How We Solved the Recurring Crash of an Embedded Android System Based on the RK3066 SoC
A Montreal-based company that distributes standalone systems came to see us two weeks ago with an urgent problem on an Android system based on the RK3066 SoC. They had noticed that the devices would sometimes freeze during the boot sequence, and that the only fix was to send a technician on site to unplug the device and plug it back in. Since the devices were installed over great distances and had to be restarted frequently, this was simply not a viable option.
Spiria first set up a test bench to reproduce the problem. The verdict was a severe failure:
- No way to access a console (adb shell) or the Android logs (adb logcat)
- Video memory also seemed to be corrupted (visual artefacts on the screen)
To get more information, we opened the case to look for direct access to the UART (Universal Asynchronous Receiver/Transmitter) used by the kernel.
Once we had located it on the board, an FT232 chip (a USB-to-UART bridge) allowed us to convert the UART’s TTL (Transistor-Transistor Logic) signals into a serial port readable from a PC.
We then saw that the kernel itself was frozen and that no error report was produced. That left several possible causes:
- Hardware bug (bus arbitration / IRQ controller…?)
- Low-level software bug (deadlock or race condition in a high-priority interrupt, FIQ type?)
- Memory corrupted at the application level
Given the gravity of the issue and the fact that the system’s source code was not available, we tried to find and activate a hardware watchdog. A watchdog is a peripheral almost always present in a SoC (System-on-Chip); however, it is often left disabled in production.
When activated, it triggers a hardware reset automatically if the software stops showing any sign of life.
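To make the principle concrete, here is a minimal software model of a watchdog counter, written as a sketch under stated assumptions. The type and function names (`wdt_model`, `wdt_enable`, `wdt_kick`, `wdt_tick`) are hypothetical and do not reflect the RK3066 register layout; the model only shows the countdown, reload, and reset behaviour described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a hardware watchdog.
 * This is an illustration only, not the RK3066 register layout. */
typedef struct {
    uint32_t timeout_ticks; /* reload value written by software */
    uint32_t remaining;     /* ticks left before the reset fires */
    bool enabled;
    bool reset_fired;
} wdt_model;

/* Arm the watchdog with a timeout expressed in clock ticks. */
void wdt_enable(wdt_model *w, uint32_t timeout_ticks) {
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->enabled = true;
    w->reset_fired = false;
}

/* The "sign of life" from software: reload the counter. */
void wdt_kick(wdt_model *w) {
    if (w->enabled) {
        w->remaining = w->timeout_ticks;
    }
}

/* One clock tick. Returns true once the reset has been triggered. */
bool wdt_tick(wdt_model *w) {
    if (!w->enabled || w->reset_fired) {
        return w->reset_fired;
    }
    if (w->remaining > 0) {
        w->remaining--;
    }
    if (w->remaining == 0) {
        w->reset_fired = true; /* real hardware would reset the SoC here */
    }
    return w->reset_fired;
}
```

As long as software calls `wdt_kick` more often than the timeout, the reset never fires; a frozen kernel stops kicking and the counter runs out.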
Implementing the Watchdog
- Studying Rockchip’s SoC RK3066 (Dual Core ARM Cortex-A9)
- Identifying and configuring the CRU/PMU/WDT hardware blocks on the APB in order to activate the Watchdog IP
- Reverse engineering, plus a document found in extremis on the web, since Rockchip was somewhat reluctant to provide this type of information
- Creating a driver for Android 4.1.1 (Linux kernel 3.0.8, per uname -r)
- Creating a generic app to access the APB’s registers from user space
- Creating a Java system service that sends the Watchdog’s keep-alive from Android, as the corrective measure
- Creating a solution for completely unpacking and repacking an Android/RK ROM image
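As an illustration of the user-space register access step, the following is a minimal sketch that reads one 32-bit register through `/dev/mem`. It assumes the kernel exposes `/dev/mem` and that the tool runs as root; the article does not say which mechanism the actual tool used. The function names `reg_split` and `reg_read32` are hypothetical, and no RK3066 address is hard-coded because the register map is not given here.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Split a physical address into a page-aligned base and the offset inside
 * that page, because mmap() only accepts page-aligned file offsets.
 * page_size must be a power of two. */
void reg_split(uint64_t phys, uint64_t page_size,
               uint64_t *base, uint64_t *offset) {
    *base = phys & ~(page_size - 1);
    *offset = phys - *base;
}

/* Read one 32-bit register at physical address phys through /dev/mem.
 * Assumption: /dev/mem is available and the caller is root.
 * Returns 0 on success, -1 on failure. */
int reg_read32(uint64_t phys, uint32_t *value) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) {
        return -1;
    }
    uint64_t base, offset;
    reg_split(phys, (uint64_t)ps, &base, &offset);

    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) {
        return -1;
    }
    void *map = mmap(NULL, (size_t)ps, PROT_READ, MAP_SHARED, fd, (off_t)base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED) {
        return -1;
    }

    /* volatile forces a real bus read instead of a cached value */
    *value = *(volatile uint32_t *)((volatile uint8_t *)map + offset);
    munmap(map, (size_t)ps);
    return 0;
}
```

Writing a register would follow the same pattern with `O_RDWR` and `PROT_READ | PROT_WRITE`. A generic tool would take the address from the command line rather than embed it.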
Please note that we did not need the Android source code here, and we did not recompile the kernel.
Using the ROM provided directly by the client, we were able to provide a new ROM containing:
- The new driver
- A tool accessible from the console
- An adjusted configuration (init.rc) for the system initialization
- The new Android system service for the keep-alive
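The keep-alive itself is simple in principle. The service described above is written in Java; the sketch below shows the same loop in C for brevity. It assumes the driver follows the standard Linux watchdog character-device convention, where opening the device arms the timer and any write counts as a keep-alive. Whether the driver built for this project exposes that interface is not stated, and the names `wdt_open`, `wdt_feed`, and `wdt_keepalive_loop` are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open the watchdog device. Assumption: standard Linux watchdog interface,
 * where opening the device starts the countdown. Returns fd or -1. */
int wdt_open(const char *path) {
    return open(path, O_WRONLY);
}

/* Any write to the device counts as a keep-alive.
 * Returns 0 on success, -1 on error. */
int wdt_feed(int fd) {
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Feed the watchdog `count` times, `interval_s` seconds apart.
 * If the optional health check fails, stop feeding so that the hardware
 * resets the device. Returns the number of successful feeds. */
int wdt_keepalive_loop(int fd, int (*healthy)(void),
                       unsigned interval_s, int count) {
    int fed = 0;
    for (int i = 0; i < count; i++) {
        if (healthy != NULL && !healthy()) {
            break; /* deliberately let the watchdog expire */
        }
        if (wdt_feed(fd) != 0) {
            break;
        }
        fed++;
        sleep(interval_s);
    }
    return fed;
}
```

The feed interval has to be comfortably shorter than the hardware timeout. Putting the loop in a high-level system service means a freeze anywhere below it, kernel included, stops the keep-alive and leads to a reset.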
A test bench was set up to compare the behaviour before and after.
These tests, run over several days, showed that with the new ROM image nobody has to intervene when a display freezes: the device reboots automatically in less than 2 minutes.