I'm reading a 50 GB file (read-only) using multiple threads, with each thread reading a sequential segment of the file. I tried two approaches:

  • Using a FileChannel
  • Using a MappedByteBuffer obtained from a FileChannel

I was expecting the MappedByteBuffer to outperform the FileChannel, but the FileChannel consistently performs about 30% better.

I'm looking for an explanation. I'm memory-mapping 1 GB at a time, and once I have read through that region I map the next 1 GB.
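For reference, the two approaches being compared can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark code: the class name `SegmentReadSketch`, the methods `sumWithChannel` and `sumWithMapping`, the byte-summing workload, and the small temp file are all assumptions made so the example is self-contained. Each method reads one sequential segment, which is what a single worker thread would do.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SegmentReadSketch {

    // Approach 1: positional FileChannel reads into a reusable direct buffer.
    // Sums the unsigned byte values in [start, start + length).
    static long sumWithChannel(Path file, long start, long length, int bufSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bufSize);
            long sum = 0, pos = start, end = start + length;
            while (pos < end) {
                buf.clear();
                buf.limit((int) Math.min(buf.capacity(), end - pos));
                int n = ch.read(buf, pos);   // positional read, does not move the channel position
                if (n < 0) break;            // reached end of file early
                buf.flip();
                while (buf.hasRemaining()) sum += buf.get() & 0xFF;
                pos += n;
            }
            return sum;
        }
    }

    // Approach 2: map the segment one window at a time (1 GB in the question)
    // and read through each MappedByteBuffer before mapping the next window.
    static long sumWithMapping(Path file, long start, long length, long windowSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long sum = 0, pos = start, end = start + length;
            while (pos < end) {
                long size = Math.min(windowSize, end - pos);  // must not exceed Integer.MAX_VALUE
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, size);
                while (map.hasRemaining()) sum += map.get() & 0xFF;  // pages fault in on first touch
                pos += size;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in file; the real test used a 50 GB file.
        Path file = Files.createTempFile("segment-read", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(file, data);

        long t0 = System.nanoTime();
        long viaChannel = sumWithChannel(file, 0, data.length, 64 * 1024);
        long t1 = System.nanoTime();
        long viaMapping = sumWithMapping(file, 0, data.length, 256 * 1024);
        long t2 = System.nanoTime();

        System.out.println("sums match: " + (viaChannel == viaMapping));
        System.out.println("FileChannel ns:      " + (t1 - t0));
        System.out.println("MappedByteBuffer ns: " + (t2 - t1));
    }
}
```

Timings from a file this small say little about the 50 GB case, since the data will almost certainly be served from the OS page cache; the sketch is only meant to make the two access patterns concrete.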

My environment: Windows 7 (64-bit), 2 Xeon processors at 2.7 GHz.

Erwin Bolwidt
Sid
  • Windows 7 platform 64 bit xeon 2.7 ghz 2 processors – Sid Nov 09 '16 at 15:43
  • Just how much RAM do you have? If you don't have well over 50gb, FileChannel should be faster (than using virtual memory). – ebyrob Nov 09 '16 at 19:11
  • What is the purpose of reading the whole 50 gigs file? Can't you process it part by part, so that it can fit your ram? – walkeros Nov 09 '16 at 19:29
  • 32 gb ram but only 20 available – Sid Nov 09 '16 at 19:34
  • @walkeros I'm benchmarking a way to read this file asap – Sid Nov 09 '16 at 19:40
  • It might be interesting to compare `FileInputStream` and/or `FileOutputStream` with the results you're seeing here. I believe "channels" in java.nio have always kind of been synonymous with buffering, whereas the older java.io Stream classes sometimes ran against the "bare metal" of OS I/O. See this thread for details: http://stackoverflow.com/questions/1605332/java-nio-filechannel-versus-fileoutputstream-performance-usefulness – ebyrob Nov 09 '16 at 19:45
  • @Sid: then you have proved to yourself that reading the whole file into memory is a poor idea from a performance perspective... in fact you should not do this unless needed. You should add to your benchmark a test where you read your file part by part using the two methods you mentioned; this could bring you to a totally different conclusion about which is fastest. – walkeros Nov 09 '16 at 19:46
  • @walkeros I've tried reducing the memory mapped file length to only 16 MB at a time and then I tried 200 MB and in both cases got the same results. – Sid Nov 09 '16 at 19:53
  • Also note that one can map at most Integer.MAX_VALUE bytes into memory at a time in Java, so I'm not mapping the whole 50 GB at once. – Sid Nov 09 '16 at 19:54
  • I see and I assume you are opening the file in READ_ONLY mode? – walkeros Nov 09 '16 at 20:02
  • Yes, correct. Although I'm wondering if it may be because I need to explicitly call the load method of MappedByteBuffer to load the contents into physical memory to prevent too many page faults. I'll try doing that next. – Sid Nov 09 '16 at 20:04
  • If, as it sounds like, you are doing sequential I/O, there is no reason a `MappedByteBuffer` should be faster than a `FileChannel` or even `FileInputStream`, and plenty of reasons why it should be slower. – user207421 Nov 09 '16 at 20:43
  • @EJP I suppose you're paying the cost of page faults upfront in a way by memory mapping the file and forcing a physical memory load as opposed to sequentially buffering via FileChannel? – Sid Nov 09 '16 at 20:52

1 Answer

Both variations have to do the same disk I/O. Both will cache pages in memory as they are read from disk. Memory mapping adds some page-fault overhead on top of that. So why do you expect it to be faster, even assuming there is plenty of physical memory to read into?

bmargulies