I have a tricky situation. I am trying to avoid out-of-memory exceptions when writing a large CSV dataset to an H5 file via the HDFDotNet API. The first pass through my read loop works, but the second pass throws an out-of-memory exception, even though it reads the same amount of data as the first and the memory in use should be well below the ~1.2GB ceiling. I have determined the size of the chunks I want to read at a time and, because of limitations in the API, the size of the chunks I need to write at a time. The CSV file is about 105k lines long by 500 columns wide.
private void WriteDataToH5(H5Writer h5WriterUtil)
{
    int startRow = 0;
    int skipHeaders = csv.HasColumnHeaders ? 1 : 0;
    int readIntervals = (-8 * csv.NumColumns) + 55000;
    int numTaken = readIntervals;
    while (numTaken == readIntervals)
    {
        int timeStampCol = HasTimestamps ? 1 : 0;
        var readLines = File.ReadLines(this.Filepath)
            .Skip(startRow + skipHeaders).Take(readIntervals)
            .Select(s => s.Split(new char[] { ',' }).Skip(timeStampCol)
                .Select(x => Convert.ToSingle(x)).ToList()).ToList();
        // 175k is the max number of cells that can be written at one time
        // (not confirmed by the API; in testing the limit is below 200k and 175k works)
        int writeIntervals = Convert.ToInt32(175000/csv.NumColumns);
        for (int i = 0; i < readIntervals; i += writeIntervals)
        {
            long[] startAt = new long[] { startRow, 0 };
            h5WriterUtil.WriteTwoDSingleChunk(readLines.Skip(i).Take(writeIntervals).ToList()
                , DatasetsByNamePair[Tuple.Create(groupName, dataset)], startAt);
            startRow += writeIntervals;
        }
        numTaken = readLines.Count;
        GC.Collect();
    }
}
I hit the out-of-memory exception on the second pass through the ReadLines section:
var readLines = File.ReadLines(this.Filepath)
    .Skip(rowStartAt).Take(numToTake)
    .Select(s => s.Split(new char[] { ',' }).Skip(timeStampCol)
        .Select(x => Convert.ToSingle(x)).ToList()).ToList();
In this case, readIntervals comes out to 50992 and writeIntervals to about 350. Thanks!
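For reference, the interval arithmetic above can be checked with a minimal sketch. It assumes csv.NumColumns is 501 (the 500 data columns plus one timestamp column), since that is the value that reproduces the 50992 figure. IntervalMath and its method names are hypothetical helpers for illustration only; they are not part of my code or of the HDFDotNet API. The byte count covers only the raw 4-byte floats of one read pass and ignores List and string overhead, so the real footprint will be higher.

```csharp
using System;

static class IntervalMath
{
    // Same formula as in WriteDataToH5: number of rows read per pass.
    public static int ReadIntervals(int numColumns) => (-8 * numColumns) + 55000;

    // Rows per write call, given the ~175k-cell limit observed in testing.
    // Integer division truncates, matching Convert.ToInt32(175000 / csv.NumColumns).
    public static int WriteIntervals(int numColumns) => 175000 / numColumns;

    // Raw float payload of one read pass (no List or string overhead included).
    public static long RawFloatBytes(int rows, int valuesPerRow) =>
        (long)rows * valuesPerRow * sizeof(float);

    static void Main()
    {
        int numColumns = 501; // ASSUMPTION: 500 data columns + 1 timestamp column
        int rows = ReadIntervals(numColumns);

        Console.WriteLine(rows);                                 // 50992
        Console.WriteLine(WriteIntervals(numColumns));           // 349
        Console.WriteLine(RawFloatBytes(rows, numColumns - 1));  // 101984000
    }
}
```

Under that assumption, the parsed floats for one pass come to roughly 102 MB, which is why I expected to stay well below the ceiling.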