
I am using Pandas to read in a Stata file with the StataReader object, as in:

import pandas as pd
# iterator=True returns a StataReader rather than a DataFrame
reader = pd.read_stata('file.dta', iterator=True)

Is there a way to get the number of rows in the file before I start using reader.get_chunk to iterate through chunks?

Mr. W.
  • isn't this a dupe of this: http://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python – EdChum Mar 22 '16 at 16:24
  • I think that would be true if each line in a Stata file corresponded to an entry, but they don't. – Mr. W. Mar 22 '16 at 16:30
    I guess the cruddy way would be to declare a counter like `count = 0` outside a loop and do `for chunk in pd.read_stata('file.dta', iterator=True): count += len(chunk)` – EdChum Mar 22 '16 at 16:35
    I guess you could just read in a single row, check its `memory_usage()`, and divide the file size by that. That ought to get you very close, as I don't think Stata datasets have any compression built in. – JohnE Mar 22 '16 at 17:14
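Both suggestions from the comments can be sketched as below, assuming a recent pandas version in which the reader returned by `read_stata` can be used in a `with` block. `count_stata_rows` and `estimate_stata_rows` are hypothetical helper names, not pandas functions. The first gives an exact count but has to read every row; the second reads only one row, so it is fast but only approximate.

```python
import os

import pandas as pd


def count_stata_rows(path, chunksize=10_000):
    """Exact row count: stream through the file one chunk at a time."""
    count = 0
    # With chunksize set, read_stata returns a StataReader that yields
    # DataFrames of at most `chunksize` rows; only one is held at a time.
    with pd.read_stata(path, chunksize=chunksize) as reader:
        for chunk in reader:
            count += len(chunk)
    return count


def estimate_stata_rows(path):
    """Approximate row count: file size divided by bytes per row."""
    with pd.read_stata(path, iterator=True) as reader:
        first_row = reader.get_chunk(1)  # one row is enough to learn the dtypes
    bytes_per_row = first_row.memory_usage(index=False).sum()
    # Only approximate: the header and variable metadata also take space,
    # and in-memory widths can differ from on-disk widths (e.g. strings).
    return int(os.path.getsize(path) // bytes_per_row)
```

A larger `chunksize` makes the exact count faster at the cost of more memory per chunk. The estimate works best for numeric-only files, where the in-memory and on-disk widths match.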

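Another option, not raised in the comments, is to read the observation count straight from the file header, which avoids touching the data at all. This is a minimal sketch based on my reading of Stata's .dta layouts, so treat the offsets as assumptions to check against the format documentation for your file's version: in the older binary formats (e.g. 113 to 115), byte 1 gives the byte order and the count is a 4-byte integer at byte offset 6; in formats 117 to 119 the count sits between the `<N>` and `</N>` tags, as 4 bytes in 117 and 8 bytes in 118 and 119. `stata_nobs` and `_tag_value` are hypothetical names, not part of pandas.

```python
import struct


def _tag_value(head, name):
    """Return the bytes between <name> and </name> in a tagged .dta header."""
    open_tag = b"<" + name + b">"
    close_tag = b"</" + name + b">"
    start = head.index(open_tag) + len(open_tag)
    return head[start:head.index(close_tag, start)]


def stata_nobs(path):
    """Return the observation (row) count stored in a .dta file header.

    Sketch only: assumes a valid .dta file in either the old binary
    layout (e.g. formats 113 to 115) or the tagged layout (117 to 119).
    Only the first few hundred bytes are read; no data rows are loaded.
    """
    with open(path, "rb") as f:
        head = f.read(512)

    if head.startswith(b"<stata_dta>"):
        # Tagged layout: <release>118</release><byteorder>LSF</byteorder>...
        release = int(_tag_value(head, b"release"))
        order = "<" if _tag_value(head, b"byteorder") == b"LSF" else ">"
        # N is a 4-byte integer in format 117 and 8 bytes in 118/119.
        fmt, size = ("I", 4) if release == 117 else ("Q", 8)
        start = head.index(b"</K><N>") + len(b"</K><N>")
        (nobs,) = struct.unpack(order + fmt, head[start:start + size])
        return nobs

    # Old binary layout: byte 1 is the byte order (1 = big-endian,
    # 2 = little-endian) and nobs is a 4-byte integer at offset 6.
    order = ">" if head[1] == 1 else "<"
    (nobs,) = struct.unpack(order + "I", head[6:10])
    return nobs
```

For example, `stata_nobs('file.dta')` could be called before creating the reader with `pd.read_stata('file.dta', iterator=True)`, e.g. to work out how many chunks to expect.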