
Here is the C# code:

```csharp
string file = File.ReadAllText(@"C:\repos\BigTextSearch\Codes.txt");
```

I am trying to load a 10 GB file into memory in .NET Core C#. I have 32 GB of RAM in my PC, of which 20 GB is free. Codes.txt contains around a billion codes. Given my problem, I don't want to chunk the file.

When I run the console application, I get the error "out of memory". Is there a way to increase the heap size? Apparently, based on some answers, the memory allocation allowed for the process is around 3 GB. I would like to know if it is possible to increase it so I can load my 10 GB file into memory.

Mikhail
  • Does this answer your question? [C# Increase Heap Size - Is It Possible](https://stackoverflow.com/questions/2325370/c-sharp-increase-heap-size-is-it-possible) – JD Davis Sep 03 '20 at 20:43
  • Do you compile a 32-bit or 64-bit application? – Olivier Samson Sep 03 '20 at 20:47
  • I am compiling 64-bit – Mikhail Sep 03 '20 at 20:50
  • Have you tried with a smaller file? Like 2GB? – Michael Sep 03 '20 at 20:52
  • What do you want to do with that file? This is critical for suggesting a workaround. For example, if you need to process this data (search, modify, rewrite), loading the file into a database system line by line could offer some advantage for further processing – Steve Sep 03 '20 at 21:06
  • Tried to load a 2.25 GB file: same problem, "out of memory". Tried to load a 1.65 GB file: same problem. Tried to load a 900 MB file: works fine. – Mikhail Sep 03 '20 at 21:07
  • Steve: Thanks. Yes, as a workaround, I understand we could read the file line by line and save it in a database for further processing. But I would like to do it in memory for now to avoid any type of IO. My process should be allowed to use the free memory available in the OS. If there is a restriction, we should be able to lift it. – Mikhail Sep 03 '20 at 21:20
  • Are you sure it's 64-bit? Did you check the EXE? The *"Prefer 32-bit"* checkbox isn't set in your project properties? What are you doing with that 10 gig string? Would it not make more sense to stream it in and do whatever it is you do with it in chunks? – Flydog57 Sep 03 '20 at 22:00
  • You are skating on the edge cases of the *Large Object Heap* https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap. You also will want to familiarize yourself with Perfmon and the .NET memory counters. It may take a while to figure out what to look for (sorry, it's been nearly 10 years since I did that kind of thing), but you should be able to figure out what's going on that way – Flydog57 Sep 03 '20 at 22:12
  • _Tried to load 900 MB file, works fine_: the limit on a single object's size is 2 GB. `string` uses UTF-16 encoding, so at least 2 bytes per character. Thus you can load less than 1 GB of UTF-8 into a single `string`. You must split the data, e.g. into `string[] lines = File.ReadAllLines(...)`, or use `List<string>`, etc. – aepot Sep 03 '20 at 22:49
  • Check out some of the posts on this topic. [Here's](https://stackoverflow.com/a/44944950/9278478) one which uses stream reader to read large files – Ceemah Four Sep 04 '20 at 00:24
  • Does this answer your question? [What is the maximum possible length of a .NET string?](https://stackoverflow.com/questions/140468/what-is-the-maximum-possible-length-of-a-net-string) – GSerg Jul 03 '21 at 21:52
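As a quick way to act on the 32-bit/64-bit question raised in the comments above, here is a minimal sketch (not from the original thread) of a runtime check; a 32-bit process is capped at a few GB of address space regardless of installed RAM:

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // If this prints "False", the process is 32-bit and large
        // allocations will fail long before physical RAM runs out.
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"64-bit OS:      {Environment.Is64BitOperatingSystem}");
    }
}
```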

2 Answers


So it seems C# uses 2 bytes per character (https://social.msdn.microsoft.com/Forums/vstudio/en-US/053aa028-774c-4a81-9586-16cb0e469177/how-to-know-the-byte-size-of-a-string?forum=csharpgeneral), which would explain why 20 GB free isn't enough for a 10 GB file: the in-memory string is roughly double the on-disk size. I think this is because it's being read in as Unicode.

Maybe .NET defaults to something like UTF-16 (not sure about this part).

Edit:

Yeah, so it is UTF-16: https://learn.microsoft.com/en-us/dotnet/api/system.char?view=netcore-3.1
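A small sketch of the point above: .NET strings are UTF-16, so every character takes 2 bytes in memory even if the file on disk is 1-byte-per-character ASCII.

```csharp
using System;
using System.Text;

class Utf16Demo
{
    static void Main()
    {
        string s = "1234567890"; // 10 ASCII characters

        Console.WriteLine(sizeof(char));                     // 2: bytes per char in memory
        Console.WriteLine(Encoding.ASCII.GetByteCount(s));   // 10: bytes as ASCII on disk
        Console.WriteLine(Encoding.Unicode.GetByteCount(s)); // 20: bytes as UTF-16, the in-memory layout
    }
}
```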

g23
  • Thanks, I understand; you have a point. But, as per Michael's recommendation, I tried to load a 2.25 GB file and got the same out-of-memory problem. Then I tried a 1.6 GB file and encountered the same problem. I was able to load a 900 MB file successfully, though. Actually, I have a little over 21.8 GB free memory to be exact. Seems like there is some restriction in place by the OS that I need help with. – Mikhail Sep 03 '20 at 21:29
  • oops just saw that comment, hmm... How much memory is your process using when you load in the 900MB file? Maybe there's something weird happening where it's blowing up in memory usage, else yeah it's probably some weird OS / .NET setting and I'm not sure about that. You could try streaming the file but I think another comment mentioned that – g23 Sep 03 '20 at 21:39
  • Moreover, to your point, when I load a smaller file like 600 MB and use the following code to count the bytes, the number of bytes printed on the console matches the size of the file on disk: `string file = File.ReadAllText(@"C:\repos\BigTextSearch\words.txt"); Console.WriteLine(ASCIIEncoding.ASCII.GetByteCount(file));` – Mikhail Sep 03 '20 at 21:40
  • What does Task Manager / `htop` say your memory usage is when you load one of those files? It should be about double the size of the file. But I think you're right, it's probably some setting limiting it – g23 Sep 03 '20 at 21:42
  • Or maybe, even though I have 21 GB of free memory, there is not enough contiguous memory, causing the problem; just thinking out loud. A string needs contiguous memory. – Mikhail Sep 03 '20 at 22:10
  • Don't use Task Manager, it's too coarse. Use Perfmon – Flydog57 Sep 04 '20 at 00:26

Scenario 1: On disk, in an ASCII text file, each character is stored as 1 byte. In memory, a C# char is stored as Unicode (UTF-16), 2 bytes per character. That said, if you load a Y MB text file from disk using C#, it will take more than 2*Y MB of memory, i.e., more than double. So make sure you have enough memory at your disposal. (But this was not my case.)

Scenario 2: Moreover, you might have enough memory overall but not enough contiguous memory. For example, you might have 20 GB of free memory, but only 1 GB available as a single block, because memory is fragmented. In that case, if you try to create a string or character array larger than 1 GB, you'll get "out of memory". (This was my case.)
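Both scenarios can be estimated up front from the file size. Here is a sketch of such a pre-flight check, assuming an ASCII file (so bytes on disk roughly equal characters) and using the path from the question:

```csharp
using System;
using System.IO;

class PreflightCheck
{
    static void Main()
    {
        var info = new FileInfo(@"C:\repos\BigTextSearch\Codes.txt");

        long chars = info.Length;       // ~1 char per byte for ASCII text
        long inMemoryBytes = chars * 2; // UTF-16: 2 bytes per char once loaded

        Console.WriteLine($"Approx. memory for one string: {inMemoryBytes / (1024 * 1024)} MB");

        // Independent of free RAM, a string can never exceed int.MaxValue
        // characters (the practical ceiling is somewhat lower), so a ~10 GB
        // ASCII file cannot fit into a single string at all.
        if (chars >= int.MaxValue)
            Console.WriteLine("Too long for a single string; split into lines or chunks.");
    }
}
```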

Solution:

  1. If you really want to work in memory, load the file in chunks or line by line and store the pieces in a data structure like a linked list, to avoid allocating one huge contiguous block (see the sketch after this list). A linked list or similar structure allocates distributed but linked memory. Data structures like string, List, Dictionary, and HashSet back their storage with fully or partially contiguous blocks, so avoid them.
  2. Depending on the problem, if it allows, stream the file into a database for further processing: searching, updating, deleting, etc. You'll have to deal with some IO latency, though, unless you use a fully in-memory DB.
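A minimal sketch of option 1, assuming one code per line in Codes.txt. File.ReadLines streams lazily (unlike File.ReadAllLines, which buffers everything), and each line becomes its own small string, so no multi-gigabyte contiguous allocation is needed:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ChunkedLoad
{
    static void Main()
    {
        var codes = new LinkedList<string>();

        // File.ReadLines yields one line at a time instead of
        // materializing the whole file as a single string or array.
        foreach (string line in File.ReadLines(@"C:\repos\BigTextSearch\Codes.txt"))
            codes.AddLast(line);

        Console.WriteLine($"Loaded {codes.Count} codes.");
    }
}
```

With around a billion short strings, per-object overhead adds up, so grouping lines into moderately sized arrays or lists inside the linked list may be gentler on memory.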
Mikhail
  • Contiguous memory was not even the problem. [Maximum string length](https://stackoverflow.com/a/140749/11683) was. – GSerg Jul 03 '21 at 21:54