
I have an extremely large CSV file, roughly 50k+ lines. I use CHCSVParser to parse it line by line, which works fine.

I would like to show the user some kind of progress as the data is parsed and added to Core Data. If I know the number of lines, I can just show numLinesParsed/totalLines. Is there a fast way to count the number of lines in the CSV file without:

  1. Loading it all into memory
  2. Taking more than a few seconds

I don't know whether I should try to do this in Objective-C or whether doing it in plain C would be better.

random
  • What about reading the file in batches? – Anoop Vaidya Apr 01 '13 at 19:16
  • @AnoopVaidya it seems inefficient because if I am reading it in batches I might as well be parsing it. I didn't know if there was some super secret way to just do something like "file.numberOfLines". – random Apr 01 '13 at 19:26
  • You can always go with (bytes_processed/size_of_file)*100, maybe not exactly what you want, but at least the user gets an idea of how long it's going to take. – Jorge Núñez Apr 01 '13 at 19:29
  • You can seek through the file; check NSFileHandle / NSFileManager. – Anoop Vaidya Apr 01 '13 at 19:30
  • Curious: How long does it take to do `wc -l `? That might give a clue how much time reduction we are looking for. – Arun Apr 01 '13 at 19:51
  • It is on iOS, so I don't have that luxury :( – random Apr 01 '13 at 22:44
  • FYI, to the C standard a file is just a sequence of bytes. It doesn't "know" how many of those bytes are line breaks unless you look at them. So your only hope would be if iOS keeps some kind of additional file metadata, perhaps including the number of line breaks as part of a summary of the content. I don't know iOS well enough to rule that out, but it seems unlikely. Also note that you can probably read the file in small chunks and scan it for line breaks in less than "a few seconds", but of course that depends on how long the lines are. – Steve Jessop Apr 01 '13 at 23:27
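
The chunked scan suggested in the last comment could look roughly like the sketch below. It is only an illustration in plain C: the function name `count_newlines` and the 16 KiB buffer size are assumptions, not anything from the thread. It counts `'\n'` bytes, so a final line without a trailing newline is not counted, and a quoted CSV field that contains a line break is counted as an extra line. For a progress bar that approximation may be good enough.

```c
#include <stdio.h>
#include <string.h>

/* Counts '\n' bytes in the file at `path` without loading the whole
   file into memory: only one fixed-size chunk is held at a time.
   Returns -1 if the file cannot be opened or a read error occurs.
   (Hypothetical helper; the name and buffer size are assumptions.) */
long count_newlines(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }

    char buf[16384];
    long count = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + n;

        /* memchr jumps to the next newline in the chunk, if any. */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            count++;
            p++; /* continue searching just past the newline found */
        }
    }

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : count;
}
```

Because this is plain C, it can be called directly from Objective-C code, for example with the path obtained from an `NSString` via `fileSystemRepresentation`.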

2 Answers


Why don't you get the size of the file in bytes and either divide it by the number of characters per line (if every line has the same number of characters) to get the line count, or divide the number of characters processed so far by the file size to get a percentage?

Oops: Jorge Núñez already said this...
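
A minimal sketch of the byte-based progress idea in plain C, assuming a POSIX environment (which iOS provides). The helper names `file_size_bytes` and `progress_percent` are hypothetical, and how you obtain the number of bytes consumed so far depends on the parser's API, which is left open here.

```c
#include <sys/stat.h>

/* Returns the size of the file at `path` in bytes, or -1 on error.
   Uses stat(), so the file contents are never read.
   (Hypothetical helper name.) */
long long file_size_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;
    }
    return (long long)st.st_size;
}

/* Converts "bytes consumed so far" into a percentage in [0, 100].
   An unknown or empty total yields 0 rather than dividing by zero.
   (Hypothetical helper name.) */
double progress_percent(long long bytes_done, long long bytes_total)
{
    if (bytes_total <= 0 || bytes_done <= 0) {
        return 0.0;
    }
    if (bytes_done >= bytes_total) {
        return 100.0;
    }
    return 100.0 * (double)bytes_done / (double)bytes_total;
}
```

The total is fetched once before parsing starts; the percentage is then recomputed whenever the parser reports how far into the file it is.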

Kupto
  • I will have to go with this if there are no other suggestions as to how to accomplish this. – random Apr 01 '13 at 20:00
  • I am not aware of any way to count the occurrences of a given character in a file without reading the entire file. (There might be some HW solution, but...) Even 'wc' on Linux actually reads the entire file. – Kupto Apr 01 '13 at 20:09

I don't think C has a way of obtaining the number of lines without loading the file into memory, since that seems like an OS-level function, at least to me...

If you are on a UNIX machine, you can use the 'wc' command to obtain the number of lines in any file. You should be able to run it from C via the 'system' function and redirect the output to a temporary file, which you can then read very quickly and parse the line count from.

If you're using Windows, you can use the findstr command 'findstr /R /N "^" file.txt' to number the lines. Note that this prints each line number, a colon, and then the line itself, for every line in the file, so the count is the number on the last line. I'm sure you could pare this output down a bit, but I'm not sure how to do this off the top of my head.

Urchin
  • Unfortunately I need to do it on iOS. – random Apr 01 '13 at 22:44
  • Ah, I see. I didn't look up what CHCSVParser was, I assumed it was just some other third-party tool. Yeah, using OS level methods is probably out of the question then. – Urchin Apr 02 '13 at 13:16