3

EDIT: Thanks all of you. Python solution worked lightning-fast :)

I have a file that looks like this:

132,658,165,3216,8,798,651

but it's MUCH larger (~ 600 kB). There are no newlines, except one at the end of file.

Now I have to sum all the values in it. I expect the final result to be quite large, but if I sum it in C++, I have a bignum library, so that shouldn't be a problem.

How should I do that, and in what language / program? C++, Python, Bash?

Brian Tompsett - 汤莱恩
Martin Janiczek
  • It's more a question of how / where this fits in the overall program. You don't take a dependency on a language just to do this. Try doing it in the language that the surrounding code is written in, so if you want specific help, name the language. – Mohit Chakraborty Mar 03 '09 at 19:57
  • I did it in C++, but I can't generate it again so I could sum it immediately. Only thing I have is that text-file, so I guess programming language depends on people that'll reply here. I only need it to be precise - no scientific notation... – Martin Janiczek Mar 03 '09 at 20:02

8 Answers

6

Penguin Sed, "Awk"

sed -e 's/,/\n/g' tmp.txt | awk 'BEGIN {total=0} {total += $1} END {print total}'

Assumptions

  • Your file is tmp.txt (you can edit this obviously)
  • Awk can handle numbers that large
Trampas Kirk
4

Python

sum(map(int,open('file.dat').readline().split(',')))
user21714
1

The language doesn't matter, so long as you have a bignum library. A rough pseudo-code solution would be:

str = ""
sum = 0
while input
    get character from input
    if character is not ','
        append character to back of str
    else
        convert str to number
        add number to sum
        str = ""
if str is not empty
    convert str to number
    add number to sum
output sum
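
For anyone who wants to run it, here is a minimal Python sketch of the loop above. It assumes the input is a text stream of non-negative integers separated by commas, and it also adds the last number, which has no trailing comma. The name `sum_csv_stream` is made up for this example.

```python
import io

def sum_csv_stream(stream):
    """Sum comma-separated integers, reading one character at a time."""
    total = 0      # Python ints are arbitrary precision, so no overflow
    digits = []    # characters of the number currently being read
    while True:
        ch = stream.read(1)          # "" signals end of input
        if ch == "," or ch == "":
            token = "".join(digits).strip()  # drop the trailing newline
            if token:
                total += int(token)
            digits = []
            if ch == "":
                break
        else:
            digits.append(ch)
    return total

# The sample line from the question, including the final newline.
sample = io.StringIO("132,658,165,3216,8,798,651\n")
print(sum_csv_stream(sample))  # 5628
```

Reading character by character is slow in Python; it is shown only because it mirrors the pseudo-code step for step.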
Pesto
1

If all of the numbers are smaller than (2**64)/600000 (which still has 14 digits), an 8-byte unsigned datatype like "unsigned long long" in C will be enough. The program is pretty straightforward; use the language of your choice.
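
The bound is easy to check. A small Python sketch, assuming (pessimistically) that the file holds at most 600000 values, one per byte; the helper name `safe_per_value_bound` is invented for illustration:

```python
UINT64_LIMIT = 2**64  # number of distinct values in an unsigned 64-bit integer

def safe_per_value_bound(count):
    """Values strictly below this bound cannot overflow 64 bits when `count` of them are summed."""
    return UINT64_LIMIT // count

bound = safe_per_value_bound(600_000)
print(bound)            # 30744573456182
print(len(str(bound)))  # 14
```

The real number of values is smaller than the byte count, since every value also needs a comma, so the actual margin is larger.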

schnaader
0

Since it's expensive to treat that large an input as a whole, I suggest you take a look at this post. It explains how to write a generator for string splitting. It's in C#, but it is well suited for crunching through that kind of input.
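
The same idea can be sketched in Python. This is not the code from the linked post, just a rough equivalent: a generator that reads the stream in chunks and yields one field at a time. The name `iter_fields` and the chunk size are assumptions made for this example, and it assumes there are no empty fields.

```python
import io

def iter_fields(stream, sep=",", chunk_size=65536):
    """Yield separator-delimited fields without loading the whole input."""
    tail = ""  # possibly incomplete field left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (tail + chunk).split(sep)
        tail = parts.pop()   # the last piece may continue in the next chunk
        yield from parts
    if tail.strip():         # final field, ignoring a lone trailing newline
        yield tail

# A tiny chunk size forces fields to straddle chunk boundaries.
data = io.StringIO("132,658,165,3216,8,798,651\n")
total = sum(int(field) for field in iter_fields(data, chunk_size=4))
print(total)  # 5628
```

For a 600 kB file this is overkill, but the pattern carries over to inputs that do not fit in memory.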

If you are worried that the total sum won't fit in an integer (say 32-bit), you can just as easily implement a bignum yourself, especially since you only need integers and addition. Just carry the overflow out of bit 31 into the next dword and keep adding.
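
A rough sketch of that carry scheme, written in Python only so it is easy to run (Python already has big integers, so this is purely illustrative). The sum is kept as a little-endian list of 32-bit "dwords"; the names `add_small` and `limbs_to_int` are made up here.

```python
BASE = 2**32  # each list element holds one 32-bit dword

def add_small(limbs, value):
    """Add a non-negative integer to a little-endian list of dwords, in place."""
    carry = value
    i = 0
    while carry:
        if i == len(limbs):
            limbs.append(0)          # grow the number by one dword
        total = limbs[i] + carry
        limbs[i] = total % BASE      # low 32 bits stay in this dword
        carry = total // BASE        # the overflow moves to the next dword
        i += 1
    return limbs

def limbs_to_int(limbs):
    """Recombine the dwords into a single Python int for checking."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

acc = []
for v in (2**32 - 1, 1, 5):
    add_small(acc, v)
print(acc)                # [5, 1]
print(limbs_to_int(acc))  # 4294967301
```

In C or C++ the same loop would use `uint32_t` dwords with a 64-bit temporary for `total`.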

If precision isn't important, just accumulate the result in a double. That gives you plenty of range, though integers above 2**53 are no longer represented exactly.
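
To see what "precision isn't important" means in practice, here is a small Python illustration (Python's `float` is an IEEE 754 double on the usual platforms, which is assumed here):

```python
exact = 2**53 + 1              # integer arithmetic: always exact
approx = float(2**53) + 1.0    # double arithmetic: 2**53 + 1 is not representable

print(exact)        # 9007199254740993
print(int(approx))  # 9007199254740992
```

Once a running total passes 2**53 (about 9 * 10**15), small addends can be rounded away, so the result may differ from the exact sum.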

Community
John Leidegren
0

http://www.koders.com/csharp/fid881E3E70CC37E480545A0C37C98BC8C208B06723.aspx?s=datatable#L12

A fast C# CSV parser. I've seen it crunch through a few thousand 1 MB files rather quickly; I have it running as part of a service that consumes about 6000 files a month.

No need to reinvent a fast wheel.

Eric H
-1

Python can handle big integers natively.

daustin777
-1
tr "," "\n" < file | any old script for summing

Ruby is convenient, since it automatically handles big numbers. I can't remember if Awk does arbitrary-precision arithmetic, but if so, you could use

awk 'BEGIN {RS="," ; sum = 0 }
     {sum += $1 }
     END { print sum }' < file
Charlie Martin
  • this splits the line into fields, adds the first field to 0, ignores the other n-1 fields, spits out the first number and exits – Trampas Kirk Mar 03 '09 at 20:20