
In my software I have 4x 500 GB files which I write to sequentially, in a circular fashion, using Boost's memory-mapped file APIs.

I allocate regions in 32 MB blocks. When allocating a block that wraps around the end of the file, I create two memory-mapped regions: the first covers the tail end of the file, and the second covers the start of the file and is mapped at the address immediately following the end of the first region.
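
For illustration, here is a minimal sketch of what such a two-region wraparound mapping can look like with Boost.Interprocess; the function name, error handling and hard-coded 32 MB constant are assumptions for the example rather than my actual code (which is linked further down):

```cpp
// Illustrative sketch only (not the actual project code): map the tail of the
// file, then ask the OS to place a mapping of the file's head directly after
// it in virtual memory so the block appears contiguous.
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>

namespace bip = boost::interprocess;

constexpr std::size_t block_size = 32u * 1024u * 1024u;  // 32 MB allocation blocks

// Map the block that straddles the end of the file: [tail_offset, file_size)
// followed by [0, block_size - tail_bytes) at the adjacent virtual address.
// Offsets are multiples of 32 MB, so they satisfy the OS mapping granularity.
std::pair<bip::mapped_region, bip::mapped_region>
map_wraparound_block(bip::file_mapping& file,
                     std::uint64_t file_size,
                     std::uint64_t tail_offset)
{
    const std::size_t tail_bytes = static_cast<std::size_t>(file_size - tail_offset);
    const std::size_t head_bytes = block_size - tail_bytes;

    bip::mapped_region tail(file, bip::read_write,
                            static_cast<bip::offset_t>(tail_offset), tail_bytes);

    // Ask for the head mapping right after the tail mapping. The address is
    // only a hint and can fail, so verify that we actually got it.
    void* wanted = static_cast<char*>(tail.get_address()) + tail.get_size();
    bip::mapped_region head(file, bip::read_write, 0, head_bytes, wanted);

    if (head.get_address() != wanted)
        throw std::runtime_error("could not map file head adjacent to tail");

    return std::make_pair(std::move(tail), std::move(head));
}
```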

This works just fine with smaller files. However, with the big files, once writing reaches the wraparound region the disk performance drops through the floor, and I'm not sure how to avoid it.

My guess is that the disk ends up writing to both ends of the file, so the heads have to jump back and forth. That is a rather silly thing to do for what is essentially a sequential write, and I would have hoped the OS would be a bit smarter about it.

Does anyone have any ideas on how to avoid this issue?

I was thinking of upgrading to Windows 10 in the hope that it does a better job, but that is a rather risky change which I would like to avoid right now.

I should also note that the files live on a software RAID 1 made of 2x 3 TB Seagate Constellation enterprise drives. These drives have a minimum sequential write speed of 60 MB/s and an average of 120 MB/s, and across all files I am writing at a total rate of 30 MB/s.

The code can be found here.

EDIT:

It turns out that after the entire file has been written once and writing starts over from the beginning, the OS starts reading the existing pages back from disk even though they are not needed, which I believe is what is causing the issue.

ronag
  • How do you measure the disk performance, and at what file size do you start to experience the problem? – Alexander Balabin Jul 22 '15 at 09:25
  • I am currently testing with smaller and smaller files, though it takes about a day before a run reaches the end of the file. I will update as I get more results. – ronag Jul 22 '15 at 09:26
  • I measure it via the write buffer: I have 4 hot sources that send data at 4 × 7.5 MB/s, and every input packet is buffered. If the buffer starts growing it means the file is not being written fast enough, and when it reaches 4 GB it starts dropping packets, which is what is currently happening after writing reaches the region in question. – ronag Jul 22 '15 at 09:27
  • If this is a sequential write and you're not accessing the data after it has been written, why bother with mapping at all? – Alexander Balabin Jul 23 '15 at 08:25
  • @AlexanderBalabin: Because I'm doing cross-process communication where I need to perform atomic writes/reads to sections of the files. – ronag Jul 23 '15 at 08:37
  • How about a set of rolling small files instead of one circular one? You can still map them individually which will involve more pointer math but you'll never have to map existing data back in just to overwrite it. – Alexander Balabin Jul 23 '15 at 08:55
  • @AlexanderBalabin: That might work. Though it is a rather big re-write. – ronag Jul 23 '15 at 09:47
  • I reckon that might be contained behind the API you already have, and really it looks like the only feasible option. – Alexander Balabin Jul 23 '15 at 09:50
  • @AlexanderBalabin: Yeah, it can be hidden behind the API; basically I would replace each region with an actual file (a rough sketch of that idea follows below). – ronag Jul 23 '15 at 11:17
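
To make the rolling-files idea from these comments concrete, here is a rough sketch assuming Boost.Interprocess; the segment size, file naming and helper names are invented for illustration, and the segment files are assumed to be pre-created at their full size:

```cpp
// Rough sketch (not the project's code) of the rolling-files idea: the single
// 500 GB circular file becomes a ring of fixed-size segment files, and every
// aligned 32 MB block falls entirely inside one segment, so a block never
// straddles a file boundary and no adjacent-address mapping trick is needed.
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <cstdint>
#include <string>

namespace bip = boost::interprocess;

constexpr std::uint64_t block_size   = 32ull * 1024 * 1024;          // 32 MB blocks
constexpr std::uint64_t segment_size = 1024ull * 1024 * 1024;        // 1 GB per segment file (multiple of block_size)
constexpr std::uint64_t ring_size    = 500ull * 1024 * 1024 * 1024;  // 500 GB logical ring

// Translate a logical ring offset into (segment index, offset within segment).
struct segment_pos { std::uint64_t index; std::uint64_t offset; };

segment_pos locate(std::uint64_t logical_offset)
{
    const std::uint64_t wrapped = logical_offset % ring_size;
    return { wrapped / segment_size, wrapped % segment_size };
}

// Map the 32 MB block starting at the given block-aligned logical offset.
// Assumes segment files (ring_segment_0.dat, ...) already exist, each
// segment_size bytes long. The returned region stays valid even after the
// local file_mapping object is destroyed.
bip::mapped_region map_block(std::uint64_t logical_offset)
{
    const segment_pos pos = locate(logical_offset);
    const std::string path = "ring_segment_" + std::to_string(pos.index) + ".dat";

    bip::file_mapping file(path.c_str(), bip::read_write);
    return bip::mapped_region(file, bip::read_write,
                              static_cast<bip::offset_t>(pos.offset),
                              static_cast<std::size_t>(block_size));
}
```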

1 Answer


"These drive have minimum sequential write speed of 60MB/s" - which is irrelevant because you're not doing sequential writes.

Use SSD caching, or rethink the design (find a way to prevent access across the buffer wraparound).


Not related to the speed: you could just use a circular buffer directly mapped to the file, so you don't have to use (proprietary?) tricks to map "consecutive" address regions. The rough idea: boost::circular_buffer equivalent for files?
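
To sketch that rough idea, here is what a circular buffer laid directly over a mapped file might look like, using Boost.Iostreams' mapped_file; the class and member names are illustrative, and it maps the whole file in one region, which (per the comments below) does not work for files as large as the asker's:

```cpp
// Minimal sketch of the "circular buffer directly on the mapped file" idea:
// the file itself is the backing array, and writes wrap by index arithmetic,
// so no adjacent-address mapping trick is needed at the wraparound point.
#include <boost/iostreams/device/mapped_file.hpp>
#include <algorithm>
#include <cstddef>
#include <cstring>

class file_ring_buffer {
public:
    explicit file_ring_buffer(const char* path)
        : file_(path, boost::iostreams::mapped_file::readwrite),
          capacity_(file_.size()),
          write_pos_(0) {}

    // Append bytes, splitting the copy at the end of the file instead of
    // requiring a contiguous view across the boundary.
    void write(const char* data, std::size_t len)
    {
        while (len != 0) {
            const std::size_t chunk =
                std::min(len, capacity_ - write_pos_);   // bytes until the wrap point
            std::memcpy(file_.data() + write_pos_, data, chunk);
            write_pos_ = (write_pos_ + chunk) % capacity_;
            data += chunk;
            len  -= chunk;
        }
    }

private:
    boost::iostreams::mapped_file file_;   // read/write mapping of the whole file
    std::size_t capacity_;
    std::size_t write_pos_;
};
```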

sehe
  • Well, the way I'm writing in 32 MB blocks is pretty much the same as sequential; the seek time on the drives is ~50 ms, which means there isn't even a 0.01% theoretical overhead. – ronag Jul 22 '15 at 13:40
  • I can't use boost::circular_buffer for several reasons; among others, I cannot map the entire file in one go. I've tried that, and the machine runs out of memory and crashes. – ronag Jul 22 '15 at 13:40
  • Why not? Is the buffer not fixed in size? Are you in a 16-bit address space? – sehe Jul 22 '15 at 13:42
  • SSDs have way too unreliable write performance, and I'm not sure how "prevent access across the buffer wraparound" is relevant. Whether it wraps around or not, I would still have blocks at the beginning and end that cause the head to jump back and forth. – ronag Jul 22 '15 at 13:42
  • I'm in a 64-bit address space. I'm not sure why the OS does what it does, but if I map the entire thing, after a little while I get `std::bad_alloc`. Either way, even if that did work, it would not work for what I'm using it for. – ronag Jul 22 '15 at 13:44
  • Well you comprehensively beat me down there. I'm not even going to ask for arguments anymore. I don't know how I can try to help further. Good luck – sehe Jul 22 '15 at 13:46
  • The problem seems to be that once everything has been written to the file the first time around, the OS starts reading the old pages back from the file before they are overwritten with new data. I'm not sure whether it is possible to create a "write-only" view where nothing is read back. – ronag Jul 22 '15 at 22:14