Hypervisor-assisted Debugging

Working on a hypervisor comes with its own set of challenges, and device emulation is one of particular note. This may sound like an exaggeration if you work in simulated environments such as Arm’s Fixed Virtual Platform (FVP). However, things can quickly get out of hand once you switch to actual hardware.

Departure from Simulated Environments

Power Management Integrated Circuit (PMIC) regulators are a common component of virtually any embedded device. Their basic function is relatively straightforward: managing the flow and direction of electrical power. They clamp input voltages and currents and power on the available devices. In practice, however, things are more complex. Every manufacturer uses myriad different regulators, with at least as many features heaped on top for good measure. For example, a particular device driver may implement a subset of the power regulator driver’s features and probe the regulator directly via syscon to ascertain the state of a related subsystem, such as a frequency divider that needs to be active before the device can be reset.

A Typical Problem with Power Regulators

All this becomes a problem when the integration with an SoC is too inflexible. Power regulators (or rather, their drivers) need to be developed with virtualization in mind. Say, for example, that you want to partition devices between several VMs, each VM having exclusive access to its assigned devices. Since Arm platforms describe their hardware through device trees rather than mechanisms such as ACPI device enumeration, the logical first step is to delete the unneeded device nodes from each VM’s device tree. Easy enough. But then you notice that while one VM’s driver has powered on its device, a different driver in the second VM has shut it back down. Why? Because that driver expected to find the first VM’s device node in its own device tree and interpreted its absence as a sign that the device was disabled. And if the device is disabled, why not make sure that its power supply is disabled, too? As you can imagine, problems such as these take time to track down. Moreover, hardware debugging solutions are sometimes not an option due to lackluster support for specific platforms (although OpenOCD usually interfaces better with hardware debuggers than OEM software does). And this brings us back to the topic at hand.
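To make the failure mode concrete, here is a toy model in C of the two conflicting drivers. Everything in it (`struct regulator`, `struct dt_view`, `vm1_probe`, `vm2_probe`, and the `dev-a`/`dev-b` node names) is hypothetical and heavily simplified; real drivers go through the kernel’s regulator framework and device tree APIs rather than helpers like these. It only illustrates how a driver that reads a missing node as "disabled" can switch off a supply that another VM depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one PMIC regulator output shared by two VMs. */
struct regulator {
    bool enabled;
};

/* Hypothetical per-VM device tree, reduced to a list of node names. */
struct dt_view {
    const char *nodes[4];
    size_t count;
};

static bool dt_has_node(const struct dt_view *dt, const char *name)
{
    assert(dt != NULL && name != NULL);
    for (size_t i = 0; i < dt->count; i++)
        if (strcmp(dt->nodes[i], name) == 0)
            return true;
    return false;
}

/* VM 1 owns "dev-a": its driver powers the shared supply on. */
static void vm1_probe(const struct dt_view *dt, struct regulator *reg)
{
    if (dt_has_node(dt, "dev-a"))
        reg->enabled = true;
}

/* VM 2 runs a different driver that also expects to see "dev-a".
 * It reads the node's absence as "device disabled" and, to be safe,
 * turns the shared supply off, pulling the rug out from under VM 1. */
static void vm2_probe(const struct dt_view *dt, struct regulator *reg)
{
    if (!dt_has_node(dt, "dev-a"))
        reg->enabled = false;
}
```

Each probe succeeds from its own VM’s point of view; the conflict only shows up as the device silently losing power.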

Advantages of Hypervisor-assisted Debugging

If you’re at a point where not even ‘printk’ debugging is possible, your choices may seem limited. Indeed, without a hardware debugger, the only option left (provided there is a hypervisor) is to trap the executing kernel into EL2, i.e., hypervisor space. This can be done in several ways:

Trapping memory accesses by not mapping the relevant host physical address ranges into the guest physical address space. In essence, by using the Stage-2 page tables of Arm processors (a.k.a. Extended Page Tables on Intel and Nested Page Tables on AMD), we configure the MMU to perform two translations: one from the guest virtual address space to the guest physical address space, and another from the latter to the host physical address space (i.e., the actual physical addresses). A fault during the first translation is handled in EL1 (kernel space), but a fault during the second translation falls upon the hypervisor to resolve. Resolving it involves performing the intended operation (as described by the Exception Syndrome Register, ESR_EL2), advancing the guest’s saved program counter past the faulting instruction, and returning to EL1. Between these steps, we can perform any debugging-related operation we desire.
Inserting trap instructions. This can be done either programmatically, the way software breakpoints work (i.e., replacing the first byte of the target instruction with int3 on x86, or the whole 32-bit instruction with brk on AArch64), or manually, by inserting hvc or smc instructions into the kernel code. The advantage over the previous method is that it allows more targeted debugging with less overhead.
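Hardware debug registers achieve more or less the same effect as the first method. Still, depending on your requirements, they may be useful for implementing single-stepping and for limiting traps to specific instruction fetches rather than every access to an unmapped page, which significantly speeds up the process. Note that debug exceptions raised at EL1 (including those caused by brk) are routed to EL2 only when the hypervisor sets MDCR_EL2.TDE.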
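The Stage-2 approach can be sketched as the body of an EL2 data abort handler. This is a minimal sketch, not code from any particular hypervisor: it assumes an AArch64, little-endian guest and a hypothetical emulated device occupying one unmapped 4 KiB page at IPA `DEV_BASE` (0x09000000, chosen arbitrarily). `struct vcpu`, `dev_mem`, `fault_ipa`, and `handle_dabt` are illustrative names; the bit positions follow the ESR_ELx and HPFAR_EL2 descriptions in the Arm Architecture Reference Manual. Aborts without a valid instruction syndrome (ISV = 0) are rejected rather than decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ESR_EL2 fields (see ESR_ELx in the Arm ARM). */
#define ESR_EC(esr)   (((esr) >> 26) & 0x3f)  /* exception class */
#define ESR_IL(esr)   (((esr) >> 25) & 0x1)   /* 1 = 32-bit instruction */
#define EC_DABT_LOW   0x24                    /* data abort from a lower EL */

/* Data abort ISS fields; only meaningful when ISV == 1. */
#define ISS_ISV(esr)  (((esr) >> 24) & 0x1)   /* syndrome valid */
#define ISS_SAS(esr)  (((esr) >> 22) & 0x3)   /* access size = 1 << SAS bytes */
#define ISS_SRT(esr)  (((esr) >> 16) & 0x1f)  /* transfer register Xt */
#define ISS_WNR(esr)  (((esr) >> 6) & 0x1)    /* 1 = write, 0 = read */

/* Hypothetical saved guest state (a real hypervisor saves much more). */
struct vcpu {
    uint64_t x[31];  /* x0..x30 */
    uint64_t elr;    /* ELR_EL2: guest PC at the time of the trap */
};

/* Hypothetical emulated device: one 4 KiB page left unmapped at stage 2. */
#define DEV_BASE 0x09000000ULL
static uint8_t dev_mem[0x1000];

/* HPFAR_EL2.FIPA (bits [43:4]) holds IPA[47:12]; FAR_EL2 gives the page offset. */
static uint64_t fault_ipa(uint64_t hpfar, uint64_t far)
{
    return ((hpfar & 0xFFFFFFFFFF0ULL) << 8) | (far & 0xFFF);
}

/* Returns true if the abort was emulated and the guest can be resumed. */
static bool handle_dabt(struct vcpu *v, uint64_t esr, uint64_t hpfar, uint64_t far)
{
    assert(v != NULL);
    if (ESR_EC(esr) != EC_DABT_LOW || !ISS_ISV(esr))
        return false;  /* not ours, or the instruction must be decoded by hand */

    uint64_t ipa = fault_ipa(hpfar, far);
    unsigned len = 1u << ISS_SAS(esr);
    unsigned rt  = ISS_SRT(esr);
    if (ipa < DEV_BASE || ipa + len > DEV_BASE + sizeof dev_mem)
        return false;
    uint64_t off = ipa - DEV_BASE;

    /* Debugging hook: every access to the device lands here at EL2,
     * so this is where you log `ipa`, dump v->x[], or halt. */

    if (ISS_WNR(esr)) {
        uint64_t val = (rt == 31) ? 0 : v->x[rt];  /* Rt == 31 is XZR */
        for (unsigned i = 0; i < len; i++)
            dev_mem[off + i] = (uint8_t)(val >> (8 * i));
    } else {
        uint64_t val = 0;
        for (unsigned i = 0; i < len; i++)
            val |= (uint64_t)dev_mem[off + i] << (8 * i);
        if (rt != 31)
            v->x[rt] = val;  /* SSE/SF (sign extension, W vs X) ignored here */
    }

    /* Step over the faulting instruction (4 bytes for A64) before ERET. */
    v->elr += ESR_IL(esr) ? 4 : 2;
    return true;
}
```

A real handler would read ESR_EL2, FAR_EL2, and HPFAR_EL2 with `mrs`, write the updated `elr` back to ELR_EL2, and then `eret`. Keeping the debugging hook between decode and emulation means the guest never observes the detour.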
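For the trap-instruction approach on AArch64, patching boils down to swapping one 32-bit instruction word for a trap and keeping the original around. The sketch below assumes the guest kernel text is already mapped into the hypervisor and models it as a plain byte buffer; `struct sw_bp`, `bp_insert`, and `bp_remove` are hypothetical names. The BRK, HVC, and SMC encodings (imm16 in bits [20:5]) come from the A64 instruction set; copying a `uint32_t` with `memcpy` assumes a little-endian host, matching how A64 instructions are stored in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A64 trap instruction encodings: imm16 lives in bits [20:5]. */
static uint32_t a64_brk(uint16_t imm) { return 0xD4200000u | ((uint32_t)imm << 5); }
static uint32_t a64_hvc(uint16_t imm) { return 0xD4000002u | ((uint32_t)imm << 5); }
static uint32_t a64_smc(uint16_t imm) { return 0xD4000003u | ((uint32_t)imm << 5); }

/* Hypothetical software-breakpoint record. */
struct sw_bp {
    size_t   offset;    /* byte offset of the patched instruction */
    uint32_t original;  /* instruction word that was replaced */
};

/* Replace the A64 instruction at `offset` with `trap`, saving the original.
 * `text` stands in for guest kernel text mapped into the hypervisor. */
static void bp_insert(uint8_t *text, size_t offset, uint32_t trap, struct sw_bp *bp)
{
    assert(offset % 4 == 0);  /* A64 instructions are 4-byte aligned */
    memcpy(&bp->original, text + offset, sizeof bp->original);
    memcpy(text + offset, &trap, sizeof trap);
    bp->offset = offset;
    /* On real hardware, follow with DC CVAU, DSB ISH, IC IVAU, DSB ISH, ISB
     * so the CPU fetches the patched word instead of a stale cached copy. */
}

/* Put the original instruction back. */
static void bp_remove(uint8_t *text, const struct sw_bp *bp)
{
    memcpy(text + bp->offset, &bp->original, sizeof bp->original);
}
```

When the trap fires, the handler must know where execution resumes: for an HVC, ELR_EL2 already points to the following instruction, whereas for a BRK it points at the BRK itself, so the original word has to be restored (e.g., with `bp_remove`) before resuming.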

While these building blocks allow for some rudimentary debugging, features such as VM introspection and DWARF symbol parsing are not easy to integrate, so more sophisticated tools are required. HyperDbg is an example of a hypervisor-assisted debugger that uses Intel’s VT-x and TSX extensions to provide a more robust debugging infrastructure, similar to what would be needed in an Arm ecosystem. However, research on this topic is largely shaped by the persistent challenges of reverse engineering and malware analysis, where improving portability and ease of use takes a back seat to coping with state-of-the-art code packers and protectors.

Radu Mantu
January 31, 2023
