6

I'm working with large files in C# (can be up to 20%-40% of available memory) and I will only need small parts of the files to be loaded into memory at a time (like 1-2% of the file). I was thinking that using a FileStream would be the best option, but idk. I will need to give a starting point (in bytes) and a length (in bytes) and copy that region into a byte[]. Access to the file might need to be shared between threads and will be at random spots in the file (non-linear access). I also need it to be fast.

The project already has unsafe methods, so feel free to suggest things from the more dangerous side of C#

joe_coolish
  • This recent post shows a working implementation. [Using a FileStream] http://stackoverflow.com/questions/5201414/having-a-problem-while-using-filestream-seek-in-c-solved/5201549#5201549 – Ritch Melton Mar 06 '11 at 04:25
  • JOC, how much are you going to be jumping around? I.E. how long are you going to hold 1-2% of the file in memory before loading a different 1-2%? FileStreams are fast, but you're still going to take a hit on disk access. You may want to consider some caching strategies, if you're able to predict what will need loaded. – James King Mar 06 '11 at 04:35
  • @Ritch Thanks! I'll take a look :) @James I'm going to be holding onto the data for varied lengths of time (I know, the worst possible situation) because everything is based on user input. Though it has nothing to do with video, I'd imagine that the timeline of a Non-Linear video editing software would be similar to what I need. I'm pretty sure that the software doesn't load the whole video in at a time, but it is able to skip to any part of the video clip relatively quickly. – joe_coolish Mar 06 '11 at 04:52

3 Answers

7

A FileStream will allow you to seek to the portion of the file you want, no problem. It's the recommended way to do it in C#, and it's fast.

Sharing between threads: You will need to create a lock to prevent other threads from changing the FileStream position while you're trying to read from it. The simplest way to do this:

//  Needs: using System; using System.IO;

//  This really needs to be a member-level variable
private static readonly object fsLock = new object();

//  Instantiate this in a static constructor or initialize() method
private static FileStream fs = new FileStream("myFile.txt", FileMode.Open, FileAccess.Read, FileShare.Read);


public byte[] ReadFile(long fileOffset, int length) {

    byte[] buffer = new byte[length];

    int arrayOffset = 0;

    lock (fsLock) {
        fs.Seek(fileOffset, SeekOrigin.Begin);

        //  Read can return fewer bytes than requested, so loop until the buffer is full
        while (arrayOffset < length) {
            int numBytesRead = fs.Read(buffer, arrayOffset, length - arrayOffset);

            if (numBytesRead == 0) break;   //  End of file

            arrayOffset += numBytesRead;
        }
    }

    //  Trim the array if the file ended before 'length' bytes were read
    if (arrayOffset < length) Array.Resize(ref buffer, arrayOffset);

    return buffer;
}

Add try..catch statements and other code as necessary. Everywhere you access this FileStream, put a lock on the member-level variable fsLock... this will keep other methods from reading/manipulating the file pointer while you're trying to read.

Speed-wise, I think you'll find you're limited by disk access speeds, not code.

You'll have to think through all the issues about multi-threaded file access... who initializes/opens the file, who closes it, etc. There's a lot of ground to cover.

James King
  • I think he should have multiple `FileStream` instances (one for each thread) rather than using locking to share it among threads. – Gabe Mar 06 '11 at 04:47
  • @Gabe, could you elaborate a little? Wouldn't you get an IOException if you try to stream the same file multiple times? – joe_coolish Mar 06 '11 at 04:58
  • @joe_coolish: As long as your file is open with shared (rather than exclusive) access, there's no reason you can't access it from multiple streams simultaneously. – Gabe Mar 06 '11 at 06:09
  • Gabe's right, so long as you're not writing to the file. Keep in mind, the more FileStream objects you have open and reading a block of the file, the more memory you'll be using, which you said was an issue. How much more depends on your pattern of use... how many threads, how heavily will you hit this file, etc. That has to be balanced against the bottleneck a lock will bring while threads wait for their turn at the file pointer. It's the speed vs. memory issue, and you have to decide on a strategy that balances your needs. Being a developer is 60% coding and 40% design decisions : ) – James King Mar 06 '11 at 17:11
  • +1 all around. very insightful, exactly what I was looking for! – joe_coolish Mar 06 '11 at 17:19
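
The one-stream-per-thread idea from the comments above could look roughly like this. It is only a sketch, not code from the thread: `RegionReader` and `ReadRegion` are hypothetical names, and it assumes nothing writes to the file while the streams are open. Each thread creates its own instance, so no lock is needed around the file pointer.

```csharp
using System;
using System.IO;

// Hypothetical helper: create one instance per thread.
// A single instance is NOT safe to share between threads.
public sealed class RegionReader : IDisposable
{
    private readonly FileStream stream;

    public RegionReader(string path)
    {
        // FileShare.Read lets other streams open the same file for reading at the same time.
        stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
    }

    // Copies up to 'length' bytes starting at 'offset' into a new array.
    // The result is shorter than 'length' if the file ends first.
    public byte[] ReadRegion(long offset, int length)
    {
        byte[] buffer = new byte[length];
        int total = 0;

        stream.Seek(offset, SeekOrigin.Begin);
        while (total < length)
        {
            int read = stream.Read(buffer, total, length - total);
            if (read == 0) break; // end of file
            total += read;
        }

        if (total < length) Array.Resize(ref buffer, total);
        return buffer;
    }

    public void Dispose()
    {
        stream.Dispose();
    }
}
```

A worker thread would then wrap its reads in something like `using (var reader = new RegionReader("myFile.txt")) { byte[] chunk = reader.ReadRegion(1024, 4096); }`. As James King notes, every open stream carries its own internal buffer, so this trades some memory for less lock contention.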
1

I know nothing about the structure of these files, but reading a portion of a file with FileStream or similar sounds like the best and fastest way to do it.

You will not need to copy the byte[] since FileStream can read directly into a byte array.

It sounds like you might know more about the structure of the file, which could bring up additional techniques as well. But if you need to read only a portion of the file, then this would probably be the way to do it.

Jonathan Wood
1

If you are using .NET 4, look into using memory-mapped files in the System.IO.MemoryMappedFiles namespace.

They are perfect for reading small chunks out of large files. There are samples in the MSDN documentation.

You can also do this in earlier versions of .NET, but then you need to wrap the Win32 API (or use http://winterdom.com/dev/net).
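
As a rough illustration of the memory-mapped approach (a sketch under assumptions, not the MSDN sample): `MappedRegionReader`, `Open` and `ReadRegion` are hypothetical names, and the code assumes `length` is greater than zero and that `offset + length` does not run past the end of the file. It maps only the requested window rather than the whole file, which keeps the address-space cost small.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper for reading small regions through a memory-mapped file.
public static class MappedRegionReader
{
    // Opens an existing file as a read-only mapping.
    // A capacity of 0 means "use the current size of the file".
    public static MemoryMappedFile Open(string path)
    {
        return MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
    }

    // Copies 'length' bytes starting at 'offset' into a new array.
    // Assumes length > 0; a view size of 0 would map from 'offset' to the end of the file.
    public static byte[] ReadRegion(MemoryMappedFile mmf, long offset, int length)
    {
        byte[] buffer = new byte[length];

        // Map only the requested window, then release it when done.
        using (MemoryMappedViewAccessor view =
               mmf.CreateViewAccessor(offset, length, MemoryMappedFileAccess.Read))
        {
            // Position 0 in the view corresponds to 'offset' in the file.
            view.ReadArray(0, buffer, 0, length);
        }

        return buffer;
    }
}
```

Typical use would be to call `Open` once, keep the `MemoryMappedFile` around for the life of the application, call `ReadRegion` whenever a chunk is needed, and dispose the mapping at shutdown.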

Mikael Svenson
  • Wouldn't that mean his process would have virtual memory allocated for the size of the entire file? – James King Mar 06 '11 at 17:15
  • True, and I jumped to the conclusion that everyone is using 64bit, where it's not an issue to allocate memory space. Of course in 32bit it is an issue, but you can still use mmf's. Just allocate smaller sections of the file which you move around. – Mikael Svenson Mar 08 '11 at 21:22