
I'm trying to write a user-space PCI driver in DriverKit for educational/research purposes. I've found an example from WorthDoingBadly which has the boilerplate code for a PCI device dext (I've removed the exploit code).

I've modified it to match a Thunderbolt PCI NVMe device through the IOPCIPrimaryMatch key. I've been able to compile, sign, and load it with SIP disabled and systemextensionsctl developer on.

The problem arises when my device gets plugged in, and the Start function in my driver is called. I attempt to call ivars->pciDevice->Open(this, 0); on the device, which fails with 0xe00002cd "(iokit/common) device not open".

Meanwhile, I can see in the kernel logs that the built-in NVMe driver is already initializing when my driver is called.

If I skip the call to Open and just call RegisterService(), I can see in IORegistryExplorer.app that both the IONVMeController and my "PCICrash" are listed under the PCI device.

I suspect that my driver would work if I could keep the built-in NVMe driver from taking the device. Is this possible somehow?

For reference, my Info.plist looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>IOKitPersonalities</key>
    <dict>
        <key>PCICrash</key>
        <dict>
            <key>IOClass</key>
            <string>IOUserService</string>
            <key>IOProviderClass</key>
            <string>IOPCIDevice</string>
            <key>IOUserClass</key>
            <string>PCICrash</string>
            <key>IOUserServerName</key>
            <string>$(PRODUCT_BUNDLE_IDENTIFIER)</string>
            <key>IOPCIPrimaryMatch</key>
            <string>0x25228086</string>
            <key>IOPCITunnelCompatible</key>
            <true/>
        </dict>
    </dict>
</dict>
</plist>

This is the PCICrash.entitlements for the driver:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.developer.driverkit</key>
    <true/>
    <key>com.apple.developer.driverkit.transport.pci</key>
    <true/>
    <key>com.apple.developer.driverkit.transport.pci.bridge</key>
    <true/>
    <key>com.apple.developer.driverkit.allow-any-userclient-access</key>
    <true/>
</dict>
</plist>

This is the PCICrashApp.entitlements for the app:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.files.user-selected.read-only</key>
    <true/>
    <key>com.apple.developer.system-extension.install</key>
    <true/>
</dict>
</plist>

I build in Xcode without signing (the same as in WorthDoingBadly's setup), and then run the following:

$ codesign -s - -f --entitlements "PCICrash/PCICrash.entitlements" "[...]/PCICrashApp.app/Contents/Library/SystemExtensions/com.worthdoingbadly.PCICrashApp.PCICrash.dext"
$ codesign -s - -f --entitlements "PCICrashApp/PCICrashApp.entitlements" "[...]/PCICrashApp.app"
$ systemextensionsctl reset
$ [...]/PCICrashApp.app/Contents/MacOS/PCICrashApp
2023-05-08 18:28:17.810 PCICrashApp[3438:81755] requestNeedsUserApproval
2023-05-08 18:28:23.152 PCICrashApp[3438:81755] didFinishWithResult: 0

A view of IORegistryExplorer. It starts out with white text; upon loading the app and registering the extension, the device gets reset and is reattached using the stock NVMe driver. Maybe this could be explained by a crashing driver.

Mads Y
  • I understand from pmdj's answer here (https://stackoverflow.com/a/75470680/1796585) that the IOMatchCategory should be omitted. But then my dext isn't matched at all and is not mentioned in the kernel log when the device is plugged in. – Mads Y May 05 '23 at 16:05
  • You definitely don’t want a match category here. Are you sure your `Start()` method isn’t simply failing, so your driver gets unloaded again? What does the system log say? Do you have any logging in `Start`? – pmdj May 05 '23 at 19:25
  • I can see my log statements from `Start()` in the kernel logs. All references to "PCICrash" disappear when IOMatchCategory is removed. I've tested it several times in a row. – Mads Y May 05 '23 at 19:57
  • I do notice that every time I replace the extension, even without IOMatchCategory, I can see the device gets reset in IORegistryExplorer. This only works as long as the IOPCIPrimaryMatch is matching the device. So it's doing something to the device. Also, I found that `kernelmanagerd` still logs "Received kext load notification: com.worth..." – Mads Y May 05 '23 at 20:13
  • A few more things to check: 1. `IOResourceMatch` isn't necessary when not matching `IOResources` (or `IOUserResources`). I don't think it should cause problems, but better remove it. 2. Are you sure your extension isn't crashing? Any related crash reports in Console.log? 3. Are you on arm64 or Intel? [There are issues with dexts running with SIP disabled on arm64.](https://stackoverflow.com/q/65970643/48660) – pmdj May 06 '23 at 08:43
  • 4. How are you checking the system log for dext-related notifications? I recommend something like this: `log stream --info --debug --predicate '(processID == 0 AND sender != "IOAcceleratorFamily2") OR process == "sysextd" OR process == "taskgated-helper" OR process == "syspolicyd" OR process == "kernelmanagerd" OR process == "amfid"'` as this will include code signing and entitlements related messages. – pmdj May 06 '23 at 08:43
  • 5. Kernel and dext logging is unreliable on macOS. If you're not seeing kernel messages in `log stream`, reboot. Dexts will sometimes simply not log to the system log, and you need to re-plug the device to get log output. 6. If you see anything related to code signing and entitlements of the dext in the system log, add your dext's entitlements to the question and explain your code-signing/provisioning setup. – pmdj May 06 '23 at 08:49
  • 7. One more thing: if you remove the `IOMatchCategory` key and plug in the device, what does the I/O Registry subtree for the device look like? You say your driver doesn't end up loaded, but does it fall back to the stock NVMe driver? Or does the device just end up with no driver at all? If it's the latter, I strongly suspect your driver might be loading correctly but ends up crashing. – pmdj May 06 '23 at 10:14
  • 1. I removed it now, but with the same result. 2. Not 100%, but there is nothing in Console.app or through `log stream` either. But I agree, that's a possibility. 3. I'm on arm64 primarily. I just tried Intel too, but it acts exactly the same. I tried arm64 with amfi disabled too, but no difference. – Mads Y May 08 '23 at 16:31
  • 4. I'm running something similar, but also tried yours. The only error I see is something like this during installation: `sysextd: (Security) [com.apple.securityd:cfloadfile] failed to fetch /Users/....PCICrash.dext/_CodeSignature/CodeRequirements-1 error=-10`. But seemingly it continues the install and ends up "activated enabled". 5. I have tried rebooting and re-plugging the device, but same result. 6. As per 4. I do see some. I'll edit my question to include this. 7. It ends up with the stock NVMe driver. – Mads Y May 08 '23 at 16:32
  • I also tried a different SSD today, and an eGPU dock rather than an m.2 enclosure. Same result. Do you have a functional PCI driver that you could share? Otherwise, I might try to clean this up, and put it on GitHub for reference. Could I be missing an entrypoint in the IOService? I have init, free, Start, Stop and NewUserClient. All are triggered when IOMatchCategory is included. – Mads Y May 08 '23 at 16:39
  • @pmdj I just had a breakthrough! For IOPCIPrimaryMatch, I also added the device-id of the bridge that comes before the actual PCI Device in IORegistryExplorer, and it now loads the driver! I'll investigate further, and report back if this was the solution. I likely have to make two separate drivers (bridge+NVMe). Thank you so much for your help so far! – Mads Y May 08 '23 at 20:02
  • For reference, the idea came from here. Wonder if I can access the underlying device still https://github.com/apple-oss-distributions/IOPCIFamily/blob/0b4c82fe7eaff74091b414225e966f993bfab328/PEX8733/PCIDriverKitPEX8733/Info.plist – Mads Y May 08 '23 at 20:05
  • Just to make it clear, the driver is attaching to the bridge. It doesn't expose the NVMe device. I need to figure out if this is the right approach. – Mads Y May 08 '23 at 21:59
  • Sounds like it shouldn’t be a code signing or binary architecture issue then, so perhaps it really is a matching issue. You could try the `IOProbeScore` key in your match dictionary to give your dext a higher priority. – pmdj May 09 '23 at 05:38
  • Wait, was it really that simple? I just added the `IOProbeScore` with `65500` and it matched to the NVMe device immediately. I owe you big time! Thank you again for all your help @pmdj! – Mads Y May 09 '23 at 06:59
  • Apologies, I forgot PCI matching doesn't have the same automatic probe score ranking that USB matching has, where if your match dictionary uses a specific vendor/product ID, you automatically get priority over a driver that just matches a generic product class. So yes, it really is just that. On my system, the generic NVMe driver seems to default to a probe score of 100 (`0x64`) so it's enough to set this to 1000 or something like that. – pmdj May 09 '23 at 12:12
  • No need to apologize. I had dismissed it myself, as I thought it was a deprecated thing from IOKit, since it isn't used in any DriverKit examples. – Mads Y May 09 '23 at 14:24
  • I didn't think of it as I've never needed to use it in any of the drivers I've shipped, so I guess it's fairly rare. If I was competing with a generic driver it's always been USB, my PCI drivers have always been for devices with no built-in support in macOS. Live and learn I guess. – pmdj May 09 '23 at 14:29

1 Answer


After a lot of back and forth in the comments, we established that the problem was simply one of probe score.

I/O Kit matching uses a numeric probe score to resolve matching conflicts. This is specifically intended so that device- or vendor-specific drivers can be given a higher probe score to be prioritised over generic, vendor-independent drivers for class-compliant devices. That is exactly the situation we have here.

USB Matching has a built-in mechanism that boosts the probe score depending on the type of matching pattern, so vendor/device-specific drivers automatically get priority without having to specify an explicit probe score.

The PCI device matching logic, on the other hand, treats IOPCIClassMatch-based matching dictionaries with the same priority as IOPCIMatch, IOPCIPrimaryMatch, and IOPCISecondaryMatch patterns by default, so, as we found out here, you can end up with a tie.

The solution is to include an explicit IOProbeScore in the match dictionary which exceeds the generic driver's. The probe score gets attached to the "winning" driver as a property. In this case, the IONVMeController node has an IOProbeScore property with a value of 100 (0x64), so as long as you exceed that, your driver should win, e.g.:

            <key>IOProbeScore</key>
            <integer>1000</integer>

So for winning out against generic (AHCI, XHCI, NVMe, etc.) PCI drivers, you'll probably have to set a probe score in your device-specific driver. As I mentioned, USB has its own automatic mechanism, so I don't in general recommend setting an explicit probe score for USB drivers.
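
For reference, here is a sketch of how the personality from the question might look with the probe score added. Everything except the IOProbeScore entry is copied from the question's Info.plist. The value 1000 is only an example and assumes the stock NVMe driver's score is 100, as observed on my system; it may differ on other systems or macOS versions, so check the IOProbeScore property on the IONVMeController node first.

```xml
<key>PCICrash</key>
<dict>
    <key>IOClass</key>
    <string>IOUserService</string>
    <key>IOProviderClass</key>
    <string>IOPCIDevice</string>
    <key>IOUserClass</key>
    <string>PCICrash</string>
    <key>IOUserServerName</key>
    <string>$(PRODUCT_BUNDLE_IDENTIFIER)</string>
    <key>IOPCIPrimaryMatch</key>
    <string>0x25228086</string>
    <key>IOPCITunnelCompatible</key>
    <true/>
    <!-- Example value (assumption): anything above the generic driver's score, seen here as 100 -->
    <key>IOProbeScore</key>
    <integer>1000</integer>
</dict>
```

Note that there is no IOMatchCategory key: the dext is meant to compete with the stock driver for the device and win on probe score, not attach alongside it.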

pmdj