Bypassing I/O scheduling and Linux kernel page buffering

Question

What I am trying to accomplish:

Developing a Linux application in C that "exclusively" accesses a PATA/SATA hard disk drive (HDD) to send ATA commands (in fact, only those ATA commands that do not modify any byte on the HDD being accessed, e.g. READ_SECTOR, IDENTIFY_DEVICE, SET_FEATURES, etc.).

By "exclusively", I mean that as soon as the HDD is powered on (custom hardware, which is a simple on-off switch, ensures that the HDD is not powered on until the application is loaded and wants it on), the first and only access to that HDD belongs to my application. In other words, apart from my application, neither the Linux kernel (including the SCSI subsystem) nor any other application, process, or human user will ever be able to access that HDD unless my application instructs or permits them to do so.

There is another requirement on my application: access to the HDD is critical in our application (in terms of control, not performance), so no I/O scheduler should be involved in the transactions issued by the application (performance on this HDD is not a constraint). Likewise, data read from the HDD should not be buffered in the kernel buffer or page cache. The application will only read in blocks of 512 bytes or a multiple of that.
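For the buffering half of this requirement, one commonly used option is opening the block device with O_DIRECT, which asks the kernel to skip the page cache as long as the offset, length, and buffer are suitably aligned. The sketch below is only an illustration under assumptions: a 512-byte logical sector, a 4096-byte buffer alignment chosen to be safe, and helper names (`is_sector_aligned`, `open_direct`, `read_direct`) that are made up for this example. O_DIRECT by itself does not take the I/O scheduler out of the path; that is selected separately per device through /sys/block/<dev>/queue/scheduler.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux */
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit builds */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define SECTOR_SIZE 512u       /* assumed logical sector size */
#define BUF_ALIGN   4096u      /* assumed safe alignment for O_DIRECT buffers */

/* Returns 1 when offset and length are both whole sectors, else 0. */
int is_sector_aligned(uint64_t offset, size_t len)
{
    return len > 0 && offset % SECTOR_SIZE == 0 && len % SECTOR_SIZE == 0;
}

/* Opens a block device read-only, bypassing the page cache. */
int open_direct(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_DIRECT);
}

/*
 * Reads len bytes at byte offset `offset` from an fd opened with O_DIRECT.
 * Returns a buffer the caller must free(), or NULL on any failure.
 */
void *read_direct(int fd, uint64_t offset, size_t len)
{
    void *buf = NULL;

    if (!is_sector_aligned(offset, len))
        return NULL;
    if (posix_memalign(&buf, BUF_ALIGN, len) != 0)
        return NULL;
    if (pread(fd, buf, len, (off_t)offset) != (ssize_t)len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

A caller would do something like `int fd = open_direct("/dev/sdX");` followed by `read_direct(fd, 80 * 512, 512)` to fetch LBA 80, where `/dev/sdX` is a placeholder for the actual device node.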

Now the problem that I'm facing is:

The SCSI subsystem resides below (and is written to work with) the I/O scheduler and the kernel buffer or page cache.

Although the SCSI subsystem provides 'sg-driver' to send commands directly to the HDD (Linux SCSI subsystem commands, not ATA or SCSI commands directly, which libata then translates to actual ATA commands. Am I right here?), that is a plain I/O approach: you give input and get output, i.e. you have no control over the data transfer protocol (e.g. PIO, DMA, the ATA Status and Error registers, etc.) or over device configuration (via the SET FEATURES ATA command).

Also, the error-reporting mechanism must be sound and specific to the ATA protocol, not simply Linux SCSI subsystem error codes. In other words, my application needs access to the ATA Error register and ATA Status register on PATA/SATA HDDs.

What my application demands is exclusive control of the HDD, e.g. issuing a READ_SECTOR ATA command and then retrieving the data itself directly from the HDD, either by reading I/O ports or via 'libata', with all of the above requirements satisfied.
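One avenue that may be worth checking here is the ATA PASS-THROUGH (16) command from the SCSI/ATA Translation (SAT) standard, which can be submitted through the sg driver's SG_IO ioctl. With its CK_COND bit set, the translation layer is expected to hand back the ATA registers in an "ATA Status Return" sense descriptor. The sketch below only builds such a CDB for IDENTIFY DEVICE and pulls the Error and Status registers out of descriptor-format sense data; it does not talk to a device. The function and struct names are invented for this example, and whether a given kernel and HBA driver return the descriptor as assumed needs to be verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATA_PASS_THROUGH_16 0x85  /* SCSI opcode defined by SAT */
#define ATA_PROTO_PIO_IN    4     /* PROTOCOL field value: PIO Data-In */
#define ATA_CMD_IDENTIFY    0xEC  /* IDENTIFY DEVICE */
#define ATA_RETURN_DESC     0x09  /* ATA Status Return descriptor code */

struct ata_result {
    uint8_t error;   /* ATA Error register */
    uint8_t status;  /* ATA Status register */
};

/* Fills a 16-byte CDB requesting one 512-byte block via PIO data-in. */
void build_identify_cdb(uint8_t cdb[16])
{
    assert(cdb != NULL);
    memset(cdb, 0, 16);
    cdb[0]  = ATA_PASS_THROUGH_16;
    cdb[1]  = ATA_PROTO_PIO_IN << 1;  /* PROTOCOL lives in bits 4..1 */
    cdb[2]  = 0x20                    /* CK_COND: return the ATA registers */
            | 0x08                    /* T_DIR: device to host */
            | 0x04                    /* BYT_BLOK: length counted in blocks */
            | 0x02;                   /* T_LENGTH: length is the count field */
    cdb[6]  = 1;                      /* sector count (7:0) */
    cdb[14] = ATA_CMD_IDENTIFY;       /* ATA command register */
}

/*
 * Walks descriptor-format sense data looking for the ATA Status Return
 * descriptor. Returns 0 and fills *out on success, -1 if it is absent.
 */
int parse_ata_sense(const uint8_t *sense, size_t len, struct ata_result *out)
{
    size_t pos = 8, end;
    uint8_t code;

    assert(sense != NULL && out != NULL);
    if (len < 8)
        return -1;
    code = sense[0] & 0x7F;
    if (code != 0x72 && code != 0x73)   /* not descriptor-format sense */
        return -1;

    end = 8 + (size_t)sense[7];         /* header + additional length */
    if (end > len)
        end = len;

    while (pos + 2 <= end) {
        size_t dlen = (size_t)sense[pos + 1] + 2;
        if (sense[pos] == ATA_RETURN_DESC && sense[pos + 1] >= 0x0C &&
            pos + 14 <= end) {
            out->error  = sense[pos + 3];
            out->status = sense[pos + 13];
            return 0;
        }
        pos += dlen;
    }
    return -1;
}
```

In an actual program the CDB would go into the `cmdp` field of a `struct sg_io_hdr` (from `<scsi/sg.h>`) and the sense buffer into `sbp` before calling `ioctl(fd, SG_IO, &hdr)`. Note that this still passes through the SCSI layer and libata, so it addresses register visibility only, not the exclusivity or PIO/DMA control requirements.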

What I can't do?

I'm not going to write a PATA/SATA HBA device driver for every HBA available on the market, as those are already included in the kernel for libata.

What have I learned so far?

To accomplish the desired task, I may (or may not?) need to write a block device driver that interacts directly with the VFS layer (or is there a way to bypass even the VFS, so that my application can communicate directly with this block driver?) without involving or interfering with the kernel buffer or page cache and the I/O scheduler. This block driver would communicate directly with libata (bypassing the SCSI subsystem upper layer), which then communicates with the PATA/SATA HBA driver.

Is it possible to write such a driver in a CPU-architecture-independent way?

Is it a feasible approach? If yes, would it affect the I/O performance of other attached HDDs that my application does not access in this way? Would I need to write a system call over the VFS (or bypassing it, if possible) for my application to communicate with my block driver? Please enlighten me regarding this approach.

Or is it possible for my block device driver to communicate directly with a PATA/SATA HBA driver written for libata? Again, would this approach affect the I/O performance of other attached HDDs that my application does not access in this way? And how would my application communicate with this block device driver?

Please enlighten me.

I would also like to know about the same scenario with one difference: instead of PATA/SATA drives, what if I have SCSI hard drives and their variants, specifically SAS, Fibre Channel, and USB? Of course, this time I would not be using libata and ATA commands, but SCSI protocol commands.

Could you suggest a live CD distro to host my application, one that contains PATA/SATA HBA libata drivers for most HBAs (not drivers for the IDE subsystem, which I will not be using because it is now deprecated and so might not be kept up to date with HBA drivers)?

In short, what is the most direct way for a Linux application to access a PATA/SATA or SCSI/SAS/Fibre Channel HDD?

I hope I have provided sufficient information regarding my question, but if you want more information or clarification, please feel free to ask.

Update1 (on 27/6/2012):
After a useful discussion with Chris (see below) and my own research, I reached the following conclusions:

that a ready-made USB-to-PATA/SATA adapter will not serve my purpose, because it does not allow my application or driver to change the data transfer mode (PIO vs. DMA) on the fly, and it does not allow my application or driver to read the ATA registers.
that a custom-made USB-to-PATA/SATA adapter might help, but it would require either an embedded processor that implements the ATA protocol or an FPGA chip that implements the whole ATA protocol. The embedded-processor solution involves GPIO and is not good for SATA, as it would require specialized transceivers, and the I/O performance would be an issue for both PATA and SATA: too slow for my application.

Such an adapter would talk to my application, through my Linux kernel driver or via libusb, using a custom protocol that carries communication between my application and the ATA protocol implementation on the embedded processor. In the case of the FPGA solution, I would need to implement this protocol in the FPGA itself, along with the ATA protocol.

But at this point it is infeasible in terms of labour, time, and money for me to implement either the FPGA solution or the embedded-processor solution. So I'm stuck with a software-only solution.

Finally, it seems that I'm probably going to have to duplicate and modify everything down to the hardware interface layer to meet my requirements, as Chris said.

So, between the VFS layer and the HBA driver or libata layer, exactly how should I proceed? What needs to be implemented and what does not?

Can someone throw light on this issue? Any ideas??

Update2 (on 1/7/2012):
I'm struggling with the issue. Is there someone on SO, who can enlighten me?

Chris Stratton · Answer 1 · 2012-06-26T05:59:55.613

3

Realistically, if you want this level of detail control you are going to end up having to write your own low-level drivers.

Your constraint about avoiding I/O buffering and scheduling may be particularly challenging: you might avoid DMA, but a modern processor has its I/O rather decoupled from internal operations for performance reasons. Perhaps if you can fully disable all interrupts you might at least be able to timestamp when you do things. You will probably want the drive on its own interface adapter, certainly not sharing with a running filesystem.

Doing things from userspace would probably require working by means of proxies in the kernel; you'll need to do your timing-critical things on the kernel side, though.

A far simpler solution, if it would meet your needs, would be to use a USB-to-SATA or USB-to-PATA adapter. You can tell the existing kernel drivers to ignore the device's VID/PID using the quirks option passed via modprobe, then talk to the device via libusb from userspace. However, there will certainly be latency there.
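As a small illustration of the quirks step, the usb-storage module accepts entries of the form VID:PID:flags, where the `i` flag tells it not to bind to that device. The helper below just formats such an entry; the function name and the 1234:abcd IDs are placeholders made up for this sketch, so substitute the IDs reported for the actual bridge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats a usb-storage quirks entry "VID:PID:i" (i = ignore this device).
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_ignore_quirk(char *buf, size_t size, uint16_t vid, uint16_t pid)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, size, "%04x:%04x:i", (unsigned)vid, (unsigned)pid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string would be passed as `modprobe usb-storage quirks=1234:abcd:i`. After that, a userspace program could open the bridge with libusb, for example via `libusb_open_device_with_vid_pid()`, and claim its interface.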

For the finest level of control, you probably need to connect the drive to an embedded processor which has no I/O buffering, or perhaps even an FPGA. This was not particularly hard to do for low data rates with PATA; SATA probably requires specialized transceivers but may not be out of the realm of possibility (or perhaps you can work through one of the adapters). You would probably end up connecting this custom peripheral to a PC via USB or even a serial port, and using that to issue tasks and obtain results (it would be convenient if you set it up so the PC can download the device's firmware/bitstream, so you have flexibility).

edited Jun 26 '12 at 05:59

answered Jun 25 '12 at 19:42

Thanks for throwing light on alternative approaches. But, as I mentioned, I need to access the ATA Status and ATA Error registers of the hard drive to report the actual ATA status and error to my application, and as far as I know that wouldn't be possible over USB or any interface other than PATA/SATA. Also, implementing a custom FPGA or embedded processor is too costly for me. The data transfer (reads from the hard drive) would also be quite difficult for me, as I would have to implement the full ATA protocol inside that FPGA or embedded processor. – jacks Jun 26 '12 at 02:58

At this point I understand that I need to write a low-level driver, but what I don't know is at what level this driver should reside in the Linux kernel and whether it should communicate with libata directly or with the PATA/SATA HBA driver directly. One more thing: how should my application talk to this driver while bypassing the VFS (if possible), the kernel buffer, and the I/O scheduler? I have never written a Linux driver in my life! – jacks Jun 26 '12 at 02:59

If the project is even possible on PC-type hardware, you should write it at the lowest possible level, i.e. the one which talks directly to the processor-bus-to-PATA or SATA bridge device. – Chris Stratton Jun 26 '12 at 03:34

Would I then need to write a PATA/SATA HBA driver, or can my driver use the existing PATA/SATA HBA driver for libata? Does my driver need to communicate with libata only, or with the Linux PCI subsystem (for direct port I/O?), or is there some other approach? – jacks Jun 26 '12 at 03:55

You are probably going to have to duplicate and modify everything down to the hardware interface layer in order to meet some of your rather odd requirements; this is why you might find it easier to use a bare-metal embedded platform where the architecture is flatter, without so many layers of onion to peel. – Chris Stratton Jun 26 '12 at 04:24

So you are saying that I need to implement the whole ATA protocol on an FPGA chip or on an embedded processor (like ARM, etc.?), and that this chip is then placed on a PCI host adapter; in that case I'm creating my own PCI PATA/SATA HBA card! The other approach is to place this chip on a PCB with a USB or serial port interface to communicate with my PC application. – jacks Jun 26 '12 at 05:04

OK, this will solve the "exclusive" access requirement, but the question remains: would I then no longer need to write a Linux driver that bypasses any buffering and I/O scheduling done by the kernel and provides data to my application running in Linux user space? I don't think so. – jacks Jun 26 '12 at 05:06

Even if I write such a driver, how do I interface my application with it so that my requirements are satisfied? It seems that even with custom hardware as you suggested, I would still need to implement such a driver, just as I would if I duplicated and modified everything down to the hardware interface layer without using the custom hardware. Or am I missing something here? – jacks Jun 26 '12 at 05:06

You place the interface above the point where your detail requirements (with respect to timing, interference, etc) are satisfied, so that you don't have to worry about the timing of the interface. I think your real problem is that you don't understand your requirements well enough in the context of how the system works. – Chris Stratton Jun 26 '12 at 05:08

My requirement is simple: my application will send an ATA command to the HDD attached to the PC and retrieve the data, if any, from the disk. The constraint is that if my application wants to read, say, LBA 80, then LBA 2000, and then LBA 90, then first, the requests sent by the application must not be scheduled by the kernel, and second, the data read from the drive must bypass any buffering by the kernel, though my driver can buffer the data read. Meanwhile, this HDD must not be exposed to any other process (from the instant it is powered on) unless my application mounts it. – jacks Jun 26 '12 at 05:43

As far as my understanding goes, I'm not a Linux kernel guru. The constraint of "no scheduling" is not due to latency issues but is there to provide a one-to-one correspondence between application requests and HDD read requests. E.g. if the application issues a READ_SECTOR command on LBA 4, then on LBA 1024, and then on LBA 12, the data must be read from the HDD in the same order in which the requests were issued. – jacks Jun 26 '12 at 05:49
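The ordering requirement described here can be sketched in a few lines: if only one request is ever outstanding and the next is submitted only after the previous one completes, nothing downstream has a second request to reorder against. The code below is a minimal illustration, not a driver; `read_fn`, `issue_in_order`, and the `record_read` stand-in (which only logs the LBAs instead of touching a disk) are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: one blocking sector read, returns 0 on success. */
typedef int (*read_fn)(uint64_t lba, void *ctx);

/* Log of the LBAs in the order they actually reached the "device". */
struct trace {
    uint64_t seen[8];
    size_t count;
};

/* Stand-in for a real blocking read: only records the LBA it was given. */
int record_read(uint64_t lba, void *ctx)
{
    struct trace *t = ctx;

    if (t->count >= sizeof t->seen / sizeof t->seen[0])
        return -1;                      /* log full: report failure */
    t->seen[t->count++] = lba;
    return 0;
}

/*
 * Issues the reads strictly in array order, one at a time, stopping at the
 * first failure. Returns how many requests completed successfully.
 */
size_t issue_in_order(const uint64_t *lbas, size_t n, read_fn do_read, void *ctx)
{
    size_t done = 0;

    assert(lbas != NULL && do_read != NULL);
    for (; done < n; done++) {
        if (do_read(lbas[done], ctx) != 0)
            break;                      /* later requests are never sent */
    }
    return done;
}
```

With the LBAs 4, 1024, 12 from the comment above, the log ends up holding exactly 4, 1024, 12. The trade-off is throughput: there is no queueing or merging, which matches the stated position that performance on this drive is not a constraint.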

The constraint of "no buffering by kernel" is likewise not due to latency issues but due to the requirement that the desired LBA be read immediately, so that the application can make further decisions based on the data or error received. – jacks Jun 26 '12 at 05:49

But what do you mean by "immediate"? And does your no-buffering requirement mean only that all of the requested operations must actually happen in the requested order, or do you literally mean that no buffering may take place when moving between asynchronous buses/domains/execution contexts? If the latter, why??? – Chris Stratton Jun 26 '12 at 05:57

By "immediate" I mean that my driver receives the data from the HDD in the order in which my application issues the ATA commands to read specific LBAs, with no optimizations. It is then up to my driver to provide the data read from the HDD to my application in this same order. And yes, you got it right: the no-buffering requirement means only that all of the requested operations must actually happen on the HDD in the requested order, and that the data is received by my driver through the fewest possible intermediate layers. BTW, what did you mean by "when moving between asynchronous buses/domains/execution contexts"? – jacks Jun 26 '12 at 06:06

Seems then like you might be able to do it from userspace with a custom program talking via libusb to a USB-to-whatever bridge, where the USB storage module has been told to ignore the VID/PID of that device. That way Linux won't have any idea what the peripheral is, and your program can do whatever it wants with it. The catch would be if the USB bridge chip, or even the drive itself, implements any non-optional (or tricky-to-disable) caching. For PATA at least, if you don't care about data rate, doing it entirely by hand with some GPIO adapter is not actually all that bad. – Chris Stratton Jun 26 '12 at 06:09

Would it then be possible to retrieve data by selecting PIO or DMA as and when desired, if this USB-to-PATA bridge is implemented via ARM or an FPGA (I don't know much about FPGAs)? Does libusb support USB 2.0 high-speed (480 Mbps) devices? SATA is also a concern for me (you said that 'SATA probably requires specialized transceivers'). Cost is a consideration here too. Are there any USB-to-PATA/SATA bridges already available that can be programmed for my purpose? – jacks Jun 26 '12 at 06:23

They are available and they are cheap; if they will meet your purpose or not will require you to do some research. – Chris Stratton Jun 26 '12 at 06:25

Thanks, I'll do some research about these bridges and whether any other solutions might exist - both software/driver or hardware approaches, and then I'll come back. – jacks Jun 26 '12 at 06:31
