Magic Trackpad 2 causes kernel heap corruption when passed to a Proxmox guest, GPFs

In mid-December I rebooted to upgrade my Proxmox kernel to pve-kernel-5.4.78-2-pve, and immediately started having an issue where the kernel would trigger a GPF (general protection fault) and reset about 5-20 minutes after starting my macOS VM. I suspected that the new kernel was at fault, but I rolled back to the previous kernel and the problem persisted. I hadn’t experienced this fault before, so I was baffled about which change I’d made before that reboot could have triggered it.

To track down the issue, I built a version of Proxmox’s kernel with KASAN enabled. KASAN is the Kernel Address Sanitiser. It detects kernel bugs like double-frees or out-of-bounds reads and writes by instrumenting the kernel to add checks around every memory access. This adds a fair amount of CPU and memory overhead, but the impact is bearable so long as your guest doesn’t need much service from the host kernel.

Proxmox’s pve-kernel repository can be found here:

https://git.proxmox.com/?p=pve-kernel.git;a=summary

Enabling KASAN just requires adding some kernel config parameters to the debian/rules file, like so:

https://github.com/thenickdude/pve-kernel/commit/ed67c2118a32efdcaa27c877e5115e2a08f0591b
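
The linked commit is the authoritative version of the change and isn’t reproduced here. As a rough sketch only, assuming generic-mode KASAN on an x86-64 5.4 kernel, these are the upstream kernel options involved, written the way they would end up in the generated .config:

```
# Assumed option set, not copied from the commit above
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# Inline instrumentation is faster but makes the kernel image larger;
# CONFIG_KASAN_OUTLINE=y is the smaller, slower alternative.
CONFIG_KASAN_INLINE=y
```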

Note that I had to manually run “git fetch --all” in the pve-kernel/submodules/ubuntu-focal directory, because the kernel commit that pve-kernel is based on was only found within a tag in that repo, and tags aren’t fetched by default.

The end result was a set of debs I installed on Proxmox to replace the current kernel.

And then for a month, silence… I didn’t have any memory errors detected by KASAN, and the kernel didn’t crash.

But today, I rebooted the system to boot up a Linux VM with passthrough, and finally KASAN filled the console with a log:

proxmox kernel: [  110.491997] BUG: KASAN: double-free or invalid-free in hid_free_buffers.isra.14+0x14a/0x290 [usbhid]
proxmox kernel: [  110.493893]
proxmox kernel: [  110.495737] CPU: 9 PID: 20045 Comm: task UPID:proxm Tainted: P    B      O      5.4.78-2-pve #1
proxmox kernel: [  110.499519] Call Trace:
proxmox kernel: [  110.503340]  print_address_description.constprop.6+0x20/0x220
proxmox kernel: [  110.507194]  kasan_report_invalid_free+0x69/0xb0
proxmox kernel: [  110.510997]  __kasan_slab_free+0x169/0x180
proxmox kernel: [  110.514842]  kasan_slab_free+0xe/0x10
proxmox kernel: [  110.518658]  hid_free_buffers.isra.14+0x14a/0x290 [usbhid]
proxmox kernel: [  110.522507]  hid_device_remove+0xce/0x200 [hid]
proxmox kernel: [  110.526371]  ? klist_put+0xcf/0x120
proxmox kernel: [  110.530249]  bus_remove_device+0x292/0x540
proxmox kernel: [  110.534120]  ? usb_hcd_flush_endpoint+0x70/0x3b0
proxmox kernel: [  110.538027]  ? __kasan_check_write+0x14/0x20
proxmox kernel: [  110.541934]  ? _raw_spin_lock+0xd0/0xd0
proxmox kernel: [  110.545801]  usbhid_disconnect+0xa7/0xd0 [usbhid]
proxmox kernel: [  110.549656]  ? rpm_idle+0x302/0x730
proxmox kernel: [  110.553522]  ? klist_put+0xcf/0x120
proxmox kernel: [  110.557381]  bus_remove_device+0x292/0x540
proxmox kernel: [  110.561222]  ? kobject_put+0x197/0x430
proxmox kernel: [  110.565031]  ? usb_remove_ep_devs+0x3c/0x80
proxmox kernel: [  110.568892]  usb_disable_device+0x19e/0x4d0
proxmox kernel: [  110.572741]  usb_disconnect+0x1f9/0x820
proxmox kernel: [  110.576529]  ? _raw_spin_lock+0xd0/0xd0
proxmox kernel: [  110.580601]  ? usb_hc_died+0x2d6/0x2d6
proxmox kernel: [  110.584344]  ? usb_hub_create_port_device.cold.9+0x19/0x19
proxmox kernel: [  110.588027]  ehci_pci_remove+0x1a/0x20 [ehci_pci]
proxmox kernel: [  110.591635]  ? pcibios_free_irq+0x10/0x10
proxmox kernel: [  110.595159]  device_release_driver_internal+0x1e0/0x4d0
proxmox kernel: [  110.598610]  unbind_store+0x19b/0x210
proxmox kernel: [  110.602017]  ? sysfs_kf_bin_read+0x2d0/0x2d0
proxmox kernel: [  110.605380]  ? drv_attr_show+0xa0/0xa0
proxmox kernel: [  110.608743]  kernfs_fop_write+0x223/0x410
proxmox kernel: [  110.612068]  ? _cond_resched+0x19/0x30
proxmox kernel: [  110.615311]  ksys_write+0x104/0x220
proxmox kernel: [  110.618514]  __x64_sys_write+0x73/0xb0

Searching for the faulting routine revealed this bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=210241

And now I knew exactly why these symptoms were so intermittent, and why they had suddenly started in mid-December. My Magic Trackpad 2 is normally connected via Bluetooth, and my Proxmox host doesn’t have any Bluetooth config set up, so the trackpad normally never connects to Proxmox.

But when my trackpad’s battery went flat in December and I plugged it into USB to charge, I inadvertently set up half of the trigger condition for the fault. There was no issue until the next time I rebooted Proxmox. During that boot, Proxmox loaded its driver for the Magic Trackpad, since the trackpad was now connected to the host by USB. When my guest started, it grabbed the USB controller using PCIe passthrough, so Proxmox disconnected the Magic Trackpad from its own drivers, and that teardown corrupted the heap. The corruption caused a crash several minutes later, in some unrelated routine that happened to touch that part of the heap. But KASAN was able to pinpoint the corruption at the site where it occurred rather than at the victim location that crashed, saving the day.

As long as the trackpad was only plugged in while macOS was running, no fault would occur since the Linux driver for it would never be loaded.

Workaround

To work around this, create the file /etc/modprobe.d/blacklist.conf (or append to it if it already exists) and add the line:

blacklist hid-magicmouse
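
If you’d rather script that step, here’s a sketch that appends the line only when it isn’t already present, so running it twice is harmless. The function name add_blacklist is made up for this example:

```shell
# add_blacklist FILE MODULE: hypothetical helper. Appends "blacklist MODULE"
# to FILE unless that exact line is already there.
add_blacklist() {
    conf=$1
    mod=$2
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "blacklist $mod" "$conf" 2>/dev/null; then
        printf 'blacklist %s\n' "$mod" >> "$conf"
    fi
}
```

Run as root, that would be “add_blacklist /etc/modprobe.d/blacklist.conf hid-magicmouse”.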

Then reboot Proxmox, plug in the Magic Trackpad 2, and confirm with “lsmod” that the hid-magicmouse module didn’t get loaded. Note that lsmod lists it with an underscore, as hid_magicmouse.
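
That check can be scripted too. This is a sketch under the assumption that you pipe lsmod’s output into it; module_loaded is a hypothetical name, not a standard tool:

```shell
# module_loaded NAME: hypothetical helper. Reads `lsmod` output on stdin and
# succeeds (exit 0) only if NAME is listed. lsmod prints module names with
# underscores, so hyphens in NAME are converted first.
module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    # NR > 1 skips the "Module  Size  Used by" header line
    awk -v m="$name" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

For example: “lsmod | module_loaded hid-magicmouse && echo still loaded”. If nothing is printed, the blacklist took effect.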
