
To be precise, I need to process files with a sample rate of 16 kHz and 4 bytes per sample that can be 8 hours long, making them around 1.8 GB in size.

All I need to do is read through the file in chunks and extract min and max values for plotting, but even reading the file with no processing at all takes over 10 seconds, with all of the time spent in the `AudioFileReader.Read` method. Fiddling with the buffer size changes this a little, but not enough to make the sampling fast enough for my use case, which is displaying the waveform in the UI in (more or less) real time.

Is there a way to read such a large file more quickly than by reading the whole thing into memory a chunk at a time? Or some entirely different way to solve this problem?
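For reference, here is a minimal sketch of the kind of chunked read described above, assuming NAudio's `AudioFileReader` (which decodes to 32-bit float samples) and a mono source; the class name and the one-second bucket size are illustrative, not anything from the question:

```csharp
// Sketch only: AudioFileReader.Read() fills a float buffer and advances
// through the file, so each loop iteration covers one bucket of samples.
using System.Collections.Generic;
using NAudio.Wave;

static class PeakScanner
{
    // Returns one (min, max) pair per bucket; 16000 samples is about 1 second at 16 kHz.
    public static (float Min, float Max)[] ScanPeaks(string path, int samplesPerBucket = 16000)
    {
        var peaks = new List<(float Min, float Max)>();
        using (var reader = new AudioFileReader(path))
        {
            var buffer = new float[samplesPerBucket];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                float min = float.MaxValue, max = float.MinValue;
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] < min) min = buffer[i];
                    if (buffer[i] > max) max = buffer[i];
                }
                peaks.Add((min, max));
            }
        }
        return peaks.ToArray();
    }
}
```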

Joshua Frank
  • A [previous question](https://stackoverflow.com/questions/2161895/reading-large-text-files-with-streams-in-c-sharp) could help you, but instead of `ReadLine()` use [`FileStream.Read`](https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream.read) – Patrick Mar 26 '19 at 15:06
  • 10 seconds for reading 1.8 GB (or eight hours) of data doesn't sound too bad. If all you're doing is peak visualization, maybe follow the lead of what some audio editors do and precalculate a smaller "sidecar" file (say `foo.wav.peaks`) with your peak data so you don't need to recalculate it over and over? – AKX Mar 26 '19 at 15:07
  • @Patrick: I'm using NAudio's `AudioFileReader.Read`, which is using a `FileStream` internally. So I am doing that, essentially. – Joshua Frank Mar 26 '19 at 15:20
  • @JoshuaFrank, then you need to use it differently. You don't have to do offset 0 to length of the file, you can do offset 0 and a length of your choice, and the stream will advance automatically for you. – Patrick Mar 26 '19 at 15:23
  • @AKX: That's exactly what I'm trying to do, but then my problem is that the file is changing all the time, so the peaks would be invalidated. I guess I could cache the peaks that haven't changed, and only compute the deltas, but it would be much easier if I could just read the whole file each time. But maybe I'm already near the fundamental speed limit here. – Joshua Frank Mar 26 '19 at 15:24
  • @JoshuaFrank By "changing all the time", I will go ahead and assume it's being appended to? :) In that case you could also just keep appending to your peaks file (or maybe better yet use a more structured format such as SQLite to make it easy to keep track of each "chunk"'s peaks)... – AKX Mar 26 '19 at 18:04
  • @AKX: You're right, it is append only, so this is a very good idea. I do still have to deal with large files, though, because someone could start recording and go for hours without a pause, and then even the chunks would be large. – Joshua Frank Mar 26 '19 at 19:34
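Picking up on the `FileStream.Read` and incremental-peaks suggestions from the comments above, here is a rough sketch that reads only the bytes appended since the last scan and appends the new peaks to a cache. It assumes the 4-byte samples are IEEE floats, that the caller supplies the offset of the start of the sample data (i.e. past the WAV header), and that the writer allows shared reads; all names are illustrative, not NAudio API:

```csharp
// Sketch only: track how far into the file peaks have already been computed,
// then seek there and process just the newly appended region on each update.
using System;
using System.Collections.Generic;
using System.IO;

class IncrementalPeakCache
{
    private readonly List<(float Min, float Max)> _peaks = new List<(float Min, float Max)>();
    private long _lastBytePosition;                 // first unprocessed byte (starts past the header)
    private const int BytesPerSample = 4;
    private const int SamplesPerBucket = 16000;     // one bucket per second at 16 kHz

    public IncrementalPeakCache(long dataStartOffset) => _lastBytePosition = dataStartOffset;

    public IReadOnlyList<(float Min, float Max)> Peaks => _peaks;

    // Call periodically while recording; only the newly appended bytes are read.
    public void Update(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(_lastBytePosition, SeekOrigin.Begin);
            var buffer = new byte[SamplesPerBucket * BytesPerSample];
            int read;
            // Only consume whole buckets; a partial bucket at the end is re-read next time.
            while ((read = fs.Read(buffer, 0, buffer.Length)) == buffer.Length)
            {
                float min = float.MaxValue, max = float.MinValue;
                for (int i = 0; i < read; i += BytesPerSample)
                {
                    float sample = BitConverter.ToSingle(buffer, i);
                    if (sample < min) min = sample;
                    if (sample > max) max = sample;
                }
                _peaks.Add((min, max));
                _lastBytePosition += read;
            }
        }
    }
}
```

Because only whole buckets are consumed, a partially written bucket at the end of the file is simply re-read on the next update, which keeps the cached peaks consistent while the recording keeps growing.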

0 Answers