
I have a csv file of size > 10GB and need to display 100 lines on every page with pagination.

I'm using PHP with sed to extract 100 lines from the file, like this:

 sed -n '16224,16482p;16483q' filename > newfile

sed example from here

In PHP, I'm executing sed commands like the one below to get a range of lines:

 $res="sed -n '".$starting.",".$stateEnd."p;".$exitState."q' common.csv > newfile.csv 2>error.log";
 $output_result = shell_exec($res);

But this is taking a long time to get a range of 100 lines from the file.

Is there a better, faster way to read a range of lines, in any of these languages: Java, PHP, Python, shell script, or plain Linux commands?

Can someone please guide me here with an example?

Chethu
  • You realize that, in order to read an arbitrary block of 100 lines, you *have* to read everything before them (in order to find those 100 lines), so there will be a pretty hard limit on how much this can be sped up. – Scott Hunter Jul 15 '19 at 12:44
  • If you are most likely going to display block after block, it may be better to read it in PHP (using `fopen()` etc.) and remember where you read up to (using `ftell()`), then next time use `fseek()` to jump to where you left off. – Nigel Ren Jul 15 '19 at 12:47
  • @ScottHunter yea agree with you, but there will be options in languages that I don't know :( – Chethu Jul 15 '19 at 12:48
  • Assuming it's simple csv data with no fields including embedded newlines... [split](http://man7.org/linux/man-pages/man1/split.1.html) comes in handy. – Shawn Jul 15 '19 at 12:48
  • Every language is under the same constraint of having to read the file to locate the lines. – Scott Hunter Jul 15 '19 at 12:49
  • @Shawn Yea, it's a simple CSV file of large size and no embedded newlines – Chethu Jul 15 '19 at 13:00
  • Basically, the issue is that a CSV file is not the best way to hold 10 GB of data that you actually want to process quickly and sensibly. Did you consider building a database table out of the CSV? That would make it far more flexible. – RiggsFolly Jul 15 '19 at 13:03
  • @RiggsFolly No don't want to use databases – Chethu Jul 15 '19 at 13:19
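
The `ftell()`/`fseek()` idea from the comments above can be sketched in Python, one of the languages the question allows. This is only a minimal sketch under stated assumptions: the function name `read_next_page` is hypothetical, the file is assumed to be UTF-8 with no embedded newlines (as confirmed in the comments), and the caller is assumed to keep the returned offset somewhere (for example in the session) between page requests.

```python
def read_next_page(path, offset=0, page_size=100):
    """Return (lines, next_offset) for the page starting at byte `offset`.

    `read_next_page` is a hypothetical helper, not part of any library.
    """
    lines = []
    # Binary mode so that offsets are plain byte counts.
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to where the previous page ended
        for _ in range(page_size):
            line = f.readline()
            if not line:  # end of file reached
                break
            # Assumption: the CSV is UTF-8 encoded.
            lines.append(line.decode("utf-8").rstrip("\r\n"))
        return lines, f.tell()  # remember this offset for the next request


# Usage (kept as comments so the block has no side effects):
# page1, pos = read_next_page("common.csv")
# page2, pos = read_next_page("common.csv", pos)
```

This only helps when pages are requested in order; jumping to an arbitrary page still needs either a scan or a precomputed index.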

1 Answer


If the same file will be used many times relative to how often it changes, you could create an index, identifying where in the file certain lines are, which would allow you to skip ahead & start actual reading closer to the lines you want.
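
As a rough illustration of such an index (a sketch, not a definitive implementation), the Python below scans the file once and stores the byte offset at which every 100-line page begins. Any page can then be reached with a single seek. The names `build_index`, `read_page` and `PAGE_SIZE` are hypothetical, and the code assumes a UTF-8 file with no embedded newlines.

```python
PAGE_SIZE = 100  # lines per page, as in the question


def build_index(csv_path, page_size=PAGE_SIZE):
    """Scan the file once and record the byte offset where each page starts."""
    offsets = []
    pos = 0
    with open(csv_path, "rb") as f:  # binary mode: len(line) is a byte count
        for line_no, line in enumerate(f):
            if line_no % page_size == 0:
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_page(csv_path, offsets, page, page_size=PAGE_SIZE):
    """Return the lines of 0-based page `page` using the precomputed offsets."""
    if page < 0 or page >= len(offsets):
        return []
    lines = []
    with open(csv_path, "rb") as f:
        f.seek(offsets[page])  # skip directly to the first line of the page
        for _ in range(page_size):
            line = f.readline()
            if not line:  # last page may be shorter than page_size
                break
            # Assumption: the CSV is UTF-8 encoded.
            lines.append(line.decode("utf-8").rstrip("\r\n"))
    return lines
```

Building the index still costs one full pass over the file, so it pays off only if it is built once per file version (for example, after each daily update) and reused for every page request. The offsets list could be saved to disk alongside the CSV so that it survives between requests.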

Scott Hunter
  • Not sure how to do that, and the file will be updated every day. – Chethu Jul 15 '19 at 12:53
  • If the file will be updated every day, then it should be stored in a database or a format that lends itself to that requirement. – WJS Jul 15 '19 at 13:06