I want to do some analysis on a pretty dang big file:
$ ls -lSh jq1pileup
-rw-rw-r--+ 1 balter SomeGroup 80G Nov 15 12:23 jq1pileup
$ wc jq1pileup
3099750719 30997507190 85744405658 jq1pileup
But fortunately, I'm on a cluster with some pretty beefy machines:
$ free -mhtal
                    total    used    free  shared  buffers  cached  available
Mem:                  94G     71G     22G    1.4G     592M     50G         0B
Low:                  94G     71G     22G
High:                  0B      0B      0B
-/+ buffers/cache:             20G     73G
Swap:                195G    6.1G    188G
Total:               289G     77G    211G
I'm finding that reading in my file takes an extremely long time (as in, measured in hours), and then doing something simple like getting the shape or, horrors, a histogram again takes hours. What is reasonable to expect?
Is this what I should expect for a task such as this?
EDIT:
The file is a TSV file (FWIW, a pileup of genomic abundances). Oh, and it's not apparent from the wc output, but it has 9 columns.
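For reference, here is roughly the kind of processing I have in mind, as a minimal sketch using pandas chunked reading (the column names, dtypes, and the tiny in-memory sample standing in for the real file are all hypothetical; the real pileup has 9 columns):

```python
import io
import pandas as pd

# Tiny stand-in for the 80G pileup TSV (hypothetical columns; the real file has 9).
tsv = "chr1\t100\tA\t12\nchr1\t101\tC\t7\nchr1\t102\tG\t30\n"

depth_counts = {}  # incrementally built histogram of the depth column
n_rows = 0         # running row count, i.e. the "shape" without a full load

# Stream the file in fixed-size chunks instead of loading it all at once;
# explicit dtypes also skip pandas' per-column type inference.
for chunk in pd.read_csv(
        io.StringIO(tsv), sep="\t", header=None,
        names=["chrom", "pos", "ref", "depth"],
        dtype={"chrom": "category", "pos": "int64",
               "ref": "category", "depth": "int32"},
        chunksize=2):
    n_rows += len(chunk)
    for depth, count in chunk["depth"].value_counts().items():
        depth_counts[depth] = depth_counts.get(depth, 0) + count

print(n_rows)
print(depth_counts)
```

Even with chunking like this, each pass still has to parse the whole 80G of text, so I'd like to know whether hours per pass is simply what I should expect.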