
Go is pretty new to me and I have some trouble understanding its memory usage:

I want to load a file, similar to a CSV, into an array of rows, each row being a struct composed of a 22-character key and an array of string values.
My code looks like this: https://play.golang.org/p/hJ4SHjVXaG
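
In short, it does something like this (a simplified sketch; the full version is at the link):

package main

import (
	"bufio"
	"log"
	"os"
	"strings"
)

// Row holds one line of the file: a fixed 22-character key
// plus the semicolon-separated values that follow it.
type Row struct {
	Key    string
	Values []string
}

func loadRows(path string) ([]Row, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var rows []Row
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		row := Row{Key: line[:22]} // assumes every line is long enough
		// Values start after the key and its separator, at index 23.
		for _, value := range strings.Split(line[23:], ";") {
			row.Values = append(row.Values, value)
		}
		rows = append(rows, row)
	}
	return rows, scanner.Err()
}

func main() {
	rows, err := loadRows(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("loaded %d rows", len(rows))
}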

The problem is that for a 450 MB file it uses around 2.1 GB of memory.
Does anyone have a solution to reduce that memory use?

Update using SirDarius's solution: https://play.golang.org/p/DBmOFOkZdx. It still uses around 1.9 GB.

rWick

4 Answers

How many lines and fields are there in the file?

It is plausible that what you are describing is using the minimum amount of memory.

Looking at the code, I think it will use 450 MB of memory for the underlying string data.

It will then slice that up into strings. Each string header consists of a pointer and a length, which take 16 bytes on a 64-bit platform.

So roughly 1.5 GB of the 2.1 GB you observe would be string headers: 1.5 GB / 16 bytes ≈ 93 million of them.

So if there are more than 50 million fields in your file, the memory use seems reasonable.

There are other overheads, like the number of rows etc., so this isn't an exact calculation.

EDIT

Given 5 million rows, 10 columns each:

That is 50 million string headers of 16 bytes, which will take 800 MB. Plus the data itself at 450 MB, plus the Row structs themselves (5 words * 8 bytes * 5 million rows = 200 MB), which makes 1.45 GB.

So I don't think that, even with perfect memory allocation, you'll be able to reduce the usage below 1.5 GB.
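
You can check the header arithmetic yourself with unsafe.Sizeof. A quick sketch (the Row struct here is my assumption of what the question's code uses):

package main

import (
	"fmt"
	"unsafe"
)

// Row is assumed to be a string key plus a slice of string values,
// as described in the question.
type Row struct {
	Key    string
	Values []string
}

func main() {
	var s string
	var v []string
	var r Row
	// On a 64-bit platform: string header = 16 bytes (pointer + length),
	// slice header = 24 bytes (pointer + length + capacity),
	// so one Row is 16 + 24 = 40 bytes = 5 words.
	fmt.Println(unsafe.Sizeof(s), unsafe.Sizeof(v), unsafe.Sizeof(r)) // 16 24 40

	const rows, cols = 5000000, 10
	headerBytes := rows * cols * unsafe.Sizeof(s) // string headers: 800 MB
	rowBytes := rows * unsafe.Sizeof(r)           // Row structs:    200 MB
	fmt.Printf("headers: %d MB, rows: %d MB\n", headerBytes/1000000, rowBytes/1000000)
}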

Nick Craig-Wood

This seems pretty inefficient to me:

for _, value := range strings.Split(line[23:], ";") {
    row.Values = append(row.Values, value)
}

You basically obtain a []string by calling the strings.Split function, and then loop over that slice to append every string to another, initially nil, string slice.

Why not just do:

row.Values = strings.Split(line[23:], ";")

instead?

Though I can't guarantee it, it's possible that the loop causes each string to be copied, and therefore makes your program use twice as much memory as needed.
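
As a sketch, assuming the Row struct and the fixed 22-character key layout from the question, the whole per-line parse then becomes:

// parseLine builds a Row from one raw line, keeping the slice that
// strings.Split returns instead of copying it element by element.
func parseLine(line string) Row {
	return Row{
		Key:    line[:22],
		Values: strings.Split(line[23:], ";"),
	}
}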

SirDarius
  • Indeed, it's pretty useless; I had some validity checks in that loop but I can postpone them. I just tried it and it gets down to 1.9 GB. Not perfect, but already better! Thanks! – rWick Apr 26 '16 at 10:20

You are appending the values obtained on each iteration into a Row struct which, considering the huge file size, is not a good approach. Why are you not processing the file in batches?
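
Here is a rough sketch of the batch idea, assuming the Row struct from the question and a hypothetical process callback standing in for whatever work you do with the rows (imports: bufio, os, strings):

// loadInBatches reads the file batchSize rows at a time and hands each
// batch to process, so only one batch is resident in memory at once.
// process must not retain the slice after returning, because its
// backing array is reused for the next batch.
func loadInBatches(path string, batchSize int, process func([]Row) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	batch := make([]Row, 0, batchSize)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		batch = append(batch, Row{
			Key:    line[:22],
			Values: strings.Split(line[23:], ";"),
		})
		if len(batch) == batchSize {
			if err := process(batch); err != nil {
				return err
			}
			batch = batch[:0] // reuse the backing array
		}
	}
	if len(batch) > 0 {
		if err := process(batch); err != nil {
			return err
		}
	}
	return scanner.Err()
}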

Looking at the Split function, it returns a slice of substrings, so it's not necessary to range over the resulting slice and append each element to row.Values. You can assign the result directly to row.Values, then append the row to the rows slice.

func Split(s, sep string) []string

Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators. If sep is empty, Split splits after each UTF-8 sequence. It is equivalent to SplitN with a count of -1.

row.Values = strings.Split(line[23:], ";")
rows = append(rows, row)
Endre Simo

Seems to me it's about the append() function. From the language spec:

If the capacity of s is not large enough to fit the additional values, append allocates a new, sufficiently large underlying array

The size of this newly allocated array can be large enough to accommodate even more future appends, so the slice may end up holding more capacity than you ever use. To allocate precisely, you should create the slice with slice := make([]Row, 0, expectedCapacity) up front so that append() never has to grow it, or use make([]Row, expectedLen) and assign slice[n] = ... directly instead of appending. If you can't do this, you can at least try reflection to trim the excess capacity:

reflect.ValueOf(&slice).Elem().SetCap(len(slice))

It's a bit tricky, but you can see at https://play.golang.org/p/LslkOBCvII that it works.
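
For example, a minimal sketch of the precise allocation, where expectedRows is a hypothetical count you know or can estimate up front:

// Capacity is allocated once up front; every append fills the existing
// backing array and never triggers a reallocation.
rows := make([]Row, 0, expectedRows)
for scanner.Scan() {
	line := scanner.Text()
	rows = append(rows, Row{
		Key:    line[:22],
		Values: strings.Split(line[23:], ";"),
	})
}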

Uvelichitel
  • I know the size of the file I use for my test, so I tried to set the capacity of each array manually, but with no big result... So I guess it won't change much since I wasn't using append, but I'll try! – rWick Apr 26 '16 at 14:31