
Background

My board incorporates an STM32 microcontroller with an SD/MMC card on SPI and samples analogue data at 48 ksamples/s. I am using the Keil Real-time Library RTX kernel, and ELM FatFs.

I have a high-priority task that captures analogue data via DMA in blocks of 40 samples (40 x 16 bit); the data is passed via a queue of length 128 (which constitutes about 107 ms of sample buffering) to a second, low-priority task that collates sample blocks into a 2560-byte buffer (this being a multiple of both the 512-byte SD sector size and the 40-sample block size). When this buffer is full (32 blocks, or approximately 27 ms of data), it is written to the file system.
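For illustration, the structure is roughly as follows. This is a simplified sketch assuming the RL-ARM RTX mailbox API and ELM FatFs's f_write(); the task names, the DMA event handshake and the file-opening code are illustrative rather than the exact production code.

    #include <RTL.h>      /* Keil RL-ARM RTX kernel */
    #include <string.h>   /* memcpy */
    #include "ff.h"       /* ELM FatFs */

    #define SAMPLES_PER_BLOCK  40      /* one DMA block: 40 x 16-bit samples */
    #define QUEUE_DEPTH        128     /* ~107 ms of buffering at 48 ksps    */
    #define WRITE_BUF_BYTES    2560    /* 5 x 512-byte sectors = 32 blocks   */

    static U16 dma_blocks[QUEUE_DEPTH][SAMPLES_PER_BLOCK]; /* pool filled by DMA      */
    os_mbx_declare(block_mbx, QUEUE_DEPTH);                /* queue of block pointers */

    static FIL log_file;                                   /* opened elsewhere        */

    /* High-priority task: woken by the DMA-complete ISR (which signals an event
       flag), then posts a pointer to the completed block into the mailbox.       */
    __task void capture_task (void) {
        U32 idx = 0;
        for (;;) {
            os_evt_wait_or (0x0001, 0xFFFF);              /* wait for DMA complete   */
            os_mbx_send (block_mbx, dma_blocks[idx], 0xFFFF);
            idx = (idx + 1) % QUEUE_DEPTH;
        }
    }

    /* Low-priority task: collates 32 blocks (2560 bytes) and writes them in a
       single f_write() call.                                                      */
    __task void write_task (void) {
        static BYTE buf[WRITE_BUF_BYTES];
        UINT filled = 0, written;
        void *msg;
        for (;;) {
            os_mbx_wait (block_mbx, &msg, 0xFFFF);
            memcpy (&buf[filled], msg, SAMPLES_PER_BLOCK * 2);
            filled += SAMPLES_PER_BLOCK * 2;
            if (filled == WRITE_BUF_BYTES) {
                f_write (&log_file, buf, WRITE_BUF_BYTES, &written);
                filled = 0;
            }
        }
    }

    /* Started from main() via os_sys_init(init_task). */
    __task void init_task (void) {
        os_mbx_init (block_mbx, sizeof (block_mbx));
        os_tsk_create (write_task, 1);                    /* low priority  */
        os_tsk_create (capture_task, 10);                 /* high priority */
        os_tsk_delete_self ();
    }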

Observation

By instrumenting the code, I can see that the data is written every 32 blocks and that each write takes about 6 ms. This is sustained until (on FAT16) the file size reaches 1 MB, when the write operation takes 440 ms, by which time the queue fills and logging is aborted. If I format the card as FAT32, the file size before the 'long-write' event is 4 MB.
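One simple way to obtain per-write figures like these is to bracket each flush with the kernel tick count; a minimal sketch, assuming the RTX system tick read via os_time_get() is configured at 1 ms:

    #include <RTL.h>
    #include "ff.h"

    /* Timed wrapper around f_write(); returns the elapsed kernel ticks via *ticks.
       With a 1 ms tick the values read ~6 normally and ~440 at the 1 MB (FAT16)
       boundary described above.                                                   */
    static FRESULT timed_write (FIL *fp, const void *buf, UINT len, UINT *bw, U32 *ticks)
    {
        U32 t0 = os_time_get ();
        FRESULT res = f_write (fp, buf, len, bw);
        *ticks = os_time_get () - t0;
        return res;
    }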

The fact that the file size at which this occurs changes between FAT16 and FAT32 suggests to me that it is not a limitation of the card but rather something that the file system does at the 1 MB or 4 MB boundaries that takes additional time.

It also appears that my tasks are being scheduled in a timely manner, and that the time is consumed in the ELM FatFs code only at the 1 MB (or 4 MB for FAT32) boundary.

The question

Is there an explanation or a solution? Is it a FAT issue, or rather specific to ELM's FatFs code perhaps?

I have considered using multiple files, but in my experience FAT does not handle large numbers of files in a single directory very well and this would simply fail also. Not using a file system at all and writing to the card raw would be a possibility, but ideally I'd like to read the data on a PC with standard drivers and no special software.

It occurred to me to try compiler optimisations to get the write time down; this seems to have an effect, but the write times became much more variable. At -O2 I did get an 8 MB file, but the results were inconsistent. I am now not sure whether there is a direct correlation between the file size and the point at which it fails; I have seen it fail in this way at various file lengths on no particular boundary. Maybe it is a card performance issue.

I further instrumented the code and applied a divide-and-conquer approach. This observation probably renders the question obsolete; all previous observations were erroneous or red herrings.

I finally narrowed it down to an instance of a multi-sector write (CMD25) where occasionally the "wait ready" polling of the card takes 174 ms for the first three sectors out of a block of five. The timeout for wait-ready is set to 500 ms, so it will happily busy-wait for that long. Using CMD24 (single-sector write) iteratively is much slower in the general case - around 140 ms per sector rather than just occasionally.
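For reference, the polling in question follows the pattern of the wait_ready() routine in ELM's sample SPI/MMC drivers; a simplified sketch (xchg_spi() and the Timer2 millisecond countdown follow the sample driver's conventions, and exact names vary between ports):

    #include "integer.h"                  /* BYTE, UINT types from the FatFs package  */

    extern BYTE xchg_spi (BYTE dat);      /* single-byte SPI exchange (port-specific) */
    static volatile UINT Timer2;          /* decremented every 1 ms by disk_timerproc() */

    /* Busy-wait until the card releases DO (reads 0xFF) or the timeout expires.
       During a CMD25 multi-block write the card holds DO low between sectors while
       it is programming flash; the occasional 174 ms stall is spent in this loop.  */
    static int wait_ready (UINT wt)       /* wt = timeout in ms (500 in this case)   */
    {
        BYTE d;

        Timer2 = wt;
        do {
            d = xchg_spi (0xFF);          /* keep clocking; a busy card returns non-0xFF */
        } while (d != 0xFF && Timer2);

        return (d == 0xFF) ? 1 : 0;       /* 1 = ready, 0 = timed out */
    }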

So it seems to be a behaviour of the card after all. I shall endeavour to try a range of SD and MMC cards.

  • +1. I have zero clue about the topic heh, but an interesting and well written question :-) – richsage Jul 22 '10 at 10:38
  • For anyone still interested, I eventually found a Transcend 2GB SD card with sufficiently low latency to allow the data to be streamed successfully. So the answer is to get the right card - they are not all created equal. – Clifford Dec 11 '12 at 20:15

2 Answers


The first thing to try could be quite easy: increase the queue depth to 640. That would give you 535 ms of buffering and should survive at least this particular file system event.

The second thing to look at is the configuration of the ELM FatFs. Many embedded file systems are very stingy with buffer usage by default. I've seen one that used a single 512 byte block buffer for all operations and it crawled for certain file system transactions. We gave it a couple of kilobytes and the thing became orders of magnitude faster.
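In ELM FatFs the relevant knobs live in ffconf.h. For example (option names differ slightly between FatFs releases; newer ones carry an FF_ prefix, so treat this as a sketch):

    /* Excerpt from ffconf.h. _FS_TINY selects between one shared 512-byte sector
       buffer in the FATFS object (1) and a private sector buffer inside every FIL
       object (0); the latter costs more RAM but avoids contention between file data
       and FAT/directory accesses.                                                   */
    #define _FS_TINY     0      /* 0: each open file carries its own sector buffer   */
    #define _MAX_SS      512    /* fixed 512-byte sectors for SD/MMC                 */
    #define _FS_READONLY 0      /* writing required                                  */
    #define _USE_LFN     0      /* long file names cost extra RAM/stack              */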

Both of the above are dependent on whether you have more RAM available, of course.

A third option would be to preallocate a huge file and then just overwrite the data during data collection. That would eliminate a number of expensive cluster allocation and FAT manipulation operations.
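With FatFs one way to do this is to seek out to the target size immediately after opening, which forces the cluster chain to be allocated up front, then rewind and stream data over the pre-allocated area; on later runs the file is opened without truncation so the allocation is reused. A sketch (the file name and size are placeholders):

    #include "ff.h"

    #define PREALLOC_BYTES  (8UL * 1024 * 1024)   /* example size: 8 MB */

    /* Pre-allocate the log file once; subsequent f_write() calls then overwrite
       already-allocated clusters instead of extending the FAT chain.            */
    FRESULT prepare_log (FIL *fp)
    {
        FRESULT res = f_open (fp, "log.bin", FA_WRITE | FA_OPEN_ALWAYS);
        if (res != FR_OK) return res;

        res = f_lseek (fp, PREALLOC_BYTES);       /* expands the file, allocating clusters */
        if (res == FR_OK)
            res = f_lseek (fp, 0);                /* rewind; now just overwrite            */
        return res;
    }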

Since compiler optimization affected this, you must also consider the possibility that it is a multi-threading issue. Are there other threads running that could disturb the lower priority reader thread? You should also try changing the buffering there to something other than a multiple of the sample size and flash block size in case you're hitting some kind of system resonance.

  • Yes, increasing the queue depth would be a solution - if only I had sufficient RAM to allow that! The part has 64 KB of RAM; a 640 x 40 x 16-bit queue would be 51 KB, and it is not the only thing running. I have increased it to 128 for this issue, but I really need it to be much lower; a queue of 8 is sufficient until this extended write occurs. Tried option 3 already - no effect. Will look at option 2 and report. Thanks. – Clifford Jul 22 '10 at 10:43
  • Are you sure #3 was done in a way that wouldn't require cluster reallocation? In other words, did you open the file for 'modify' instead of 'write'? Opening for 'write' would zero it and start cluster allocation all over again. – Amardeep AC9MF Jul 22 '10 at 10:51
  • w.r.t. option 2 the options are to use a sector buffer per file or a shared sector buffer. I am using the former, but have only one file open in any case. – Clifford Jul 22 '10 at 11:01
  • In that case I'd look into another FAT file system implementation. That one is going to be a real system bottleneck. – Amardeep AC9MF Jul 22 '10 at 11:15
  • For #3, yes, the file was always opened for update (which is why I erroneously thought it was *always* 1 MB - that was just the 'high-tide' mark). I have performed further tests which probably render all previous observations obsolete; I have added them to the original question. – Clifford Jul 22 '10 at 13:20
  • Re edit: I have moved off this problem now, but thinking about it, there are other threads, and one of them could account for this. This is a team project, and the particular thread was written by someone else; I never really considered it to be a problem. It may be taking more time than I expected, and it does run at a priority between the two. May revisit this; thanks, I should have thought of that myself. – Clifford Jul 24 '10 at 09:21
  • Checked the 'third thread', it is never in the running state for more than 85 microseconds. So I am back to inherent card behaviour being the cause. – Clifford Jul 27 '10 at 09:26

You (or anyone else reading this question) could try this FAT library: https://github.com/fernando-rodriguez/fat32lib.

On a 40 MIPS Microchip dsPIC33 with a 10 Mbit/s SPI bus it can sample at 230 Ksps (16-bit) on any card I've tried.

  • 2
    As noted in the second update to the question, I don't think it was a software issue - the card goes into a busy state for a significant period on occasions which kills it in this application despite the *average* transfer rate typically sustaining over 300 kbyte/s. As noted in the comment, found a Transcend card that worked. However I was necessarily limited to SD (not SDHC) and an SPI interface; this code supports larger cards, so may be useful in any case, and larger SDHC cards may not exhibit this problem. There is no way I am going to put GPL3 code in my project however. – Clifford Dec 29 '13 at 19:44
  • 1
    That is usually because of a flash page erase. It may help if the file system is aligned with the SD card page size. I know the format utility on Windows 7 will do that for SDHC cards not sure about standard SD cards. On SD cards the page size can be obtained from the SECTOR_SIZE field on CSD register. If you have a way to make sure that your file is allocated at the start of a flash page it may also help. If the writes are not properly aligned a larger card may just make it worst as the page size is larger. – fernan Jan 08 '14 at 04:54
  • 1
    On SDHC and SDXC page size is the AU_SIZE field SD Status register and they're further divided into recording units (RUs) which are calculated based on the card capacity and speed class (SPEED_CLASS on SD Status register) and you need to write in chunks of RU size to avoid page erases. – fernan Jan 08 '14 at 05:03