I am loading a lot of CSV files into a struct using Go. The struct is:

```go
type csvData struct {
	Index   []time.Time
	Columns map[string][]float64
}
```
I have a parser that uses:

```go
csv.NewReader(file).ReadAll()
```

Then I iterate over the rows and convert the values into their types: `time.Time` or `float64`.
The problem is that on disk these files consume 5GB of space, but once I load them into memory they consume 12GB!

I read one back with `ioutil.ReadFile(path)` and found that the raw bytes were, as expected, almost exactly the on-disk size.
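To put numbers on the in-memory side, I compared the live heap before and after a load, with a rough harness along these lines (`heapAlloc` is my own helper, not part of the parser):

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAlloc returns live heap bytes after forcing a collection,
// so the before/after comparison isn't skewed by pending garbage.
func heapAlloc() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	before := heapAlloc()
	LoadCSVIntoMemory("data.csv") // placeholder path
	after := heapAlloc()
	fmt.Printf("heap grew by %d MiB\n", (after-before)/(1<<20))
}
```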
Here is the code for my parser (error handling omitted for readability), in case you can help me troubleshoot:
```go
var inMemoryRepo = make([]csvData, 0)

func LoadCSVIntoMemory(path string) {
	parsedData := csvData{make([]time.Time, 0), make(map[string][]float64)}
	file, _ := os.Open(path)
	defer file.Close()
	reader := csv.NewReader(file)
	columnNames, _ := reader.Read() // first row holds the column names
	columnData, _ := reader.ReadAll()
	for _, row := range columnData {
		parsedData.Index = append(parsedData.Index, parseTime(row[0])) // parseTime is a simple wrapper for time.Parse
		for i, v := range row[1:] { // parse non-index numeric columns
			name := columnNames[i+1] // +1 because columnNames[0] is the index column
			parsedData.Columns[name] = append(parsedData.Columns[name], parseFloat(v)) // parseFloat is a wrapper for strconv.ParseFloat
		}
	}
	inMemoryRepo = append(inMemoryRepo, parsedData)
}
```
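For completeness, the two wrappers are roughly the following (the RFC 3339 layout here is a stand-in; my actual format differs, and errors are again ignored):

```go
func parseTime(s string) time.Time {
	t, _ := time.Parse(time.RFC3339, s) // layout is a placeholder
	return t
}

func parseFloat(s string) float64 {
	f, _ := strconv.ParseFloat(s, 64)
	return f
}
```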
I tried troubleshooting by setting `columnData` and `reader` to nil at the end of the function, but there was no change.
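Roughly, the end of the function then looked like this sketch (the explicit `runtime.GC()` just forces a collection, to rule out memory that simply hadn't been collected yet):

```go
	// ...at the end of LoadCSVIntoMemory:
	columnData = nil // drop the raw [][]string returned by ReadAll
	reader = nil     // drop the csv.Reader and its internal buffer
	runtime.GC()     // force a collection so nothing is merely pending
}
```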