
I want to compare two really big data sets byte-by-byte in a specific range.

For example, I'd have one file that is 0x80000000 bytes long and another that is 0xffffffff bytes long, and let's say I'd like to compare them byte-by-byte in the range from 0x1000 to 0x7200000.

If it were a smaller range I'd probably go for `until`, but since this range is much bigger, `until` would be pretty inefficient memory-wise.

How would one implement such a basic operation in a functional and memory-efficient way?

zarko
  • `until` has fixed memory usage, so how is it inefficient for larger ranges? – Tim Nov 22 '19 at 16:20
  • When using `until` I get a GC overhead not enough memory error – zarko Nov 22 '19 at 17:39
  • If I understand you correctly, you seem to be talking about the difficulty of loading all the data in the two files into memory at once. That's not a problem since Scala supports reading from files lazily line-by-line. I'm not sure how to read from the file byte-by-byte though. – Allen Han Nov 22 '19 at 17:40
  • This stackoverflow post may be helpful: https://stackoverflow.com/questions/7598135/how-to-read-a-file-as-a-byte-array-in-scala Now you just need to read from the file lazily. – Allen Han Nov 22 '19 at 17:43
  • You need to post your code and show how you are reading the files and how you are using `until` to compare them. – Tim Nov 22 '19 at 17:51
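
The lazy-reading idea from the comments could look roughly like the sketch below. It is not the asker's code (none was posted), and the names `RangeCompare`, `openAt` and `firstMismatch` are made up for illustration. It assumes the end of the range is exclusive, as with `until`, and that a 64 KiB buffer per file is acceptable. Only those two buffers are held in memory, regardless of the range size.

```scala
import java.io.{BufferedInputStream, FileInputStream, InputStream}
import java.nio.file.Path

object RangeCompare {

  // Opens `path` positioned at byte offset `from`.
  // The 64 KiB buffer size is an arbitrary choice for this sketch.
  private def openAt(path: Path, from: Long): InputStream = {
    val fis = new FileInputStream(path.toFile)
    fis.getChannel.position(from) // the stream shares its position with its channel
    new BufferedInputStream(fis, 1 << 16)
  }

  /** Offset of the first differing byte in [from, until), or None if the range matches.
    * If one file ends inside the range while the other continues, that offset is
    * reported as a difference; if both end at the same offset, no difference is reported.
    */
  def firstMismatch(a: Path, b: Path, from: Long, until: Long): Option[Long] = {
    val inA = openAt(a, from)
    try {
      val inB = openAt(b, from)
      try {
        Iterator
          .iterate(from)(_ + 1)        // lazy stream of Long offsets
          .takeWhile(_ < until)        // stop at the (exclusive) end of the range
          .find(_ => inA.read() != inB.read()) // read() yields -1 at end of file
      } finally inB.close()
    } finally inA.close()
  }
}
```

With hypothetical file names, the range from the question would be checked with `RangeCompare.firstMismatch(Paths.get("a.bin"), Paths.get("b.bin"), 0x1000L, 0x7200000L)`. The offsets are `Long` values, so ranges beyond `Int.MaxValue` work as well.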

0 Answers