Get length of Stata file in Pandas?

Asked Mar 22 '16 at 16:00

I am using Pandas to read in a Stata file with the StataReader object, as in:

import pandas as pd
reader = pd.read_stata('file.dta', iterator=True)

Is there a way to get the number of rows in the file before I start using reader.get_chunk to iterate through chunks?

Mr. W.

Isn't this a dupe of http://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python? – EdChum Mar 22 '16 at 16:24
I think that would be true if each line in a Stata file corresponded to an entry, but it doesn't. – Mr. W. Mar 22 '16 at 16:30
I guess the cruddy way would be to declare a counter like `count = 0` outside a loop and then do `for chunk in pd.read_stata('file.dta', iterator=True): count += len(chunk)` – EdChum Mar 22 '16 at 16:35
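
A minimal sketch of the counting approach from the comment above. The function name `count_stata_rows` and the default `chunksize` are illustrative assumptions, not something from the question. It passes an explicit `chunksize` so that each iteration yields a reasonably sized DataFrame. It still reads the whole file once, so it only helps when a separate counting pass is acceptable.

```python
import pandas as pd


def count_stata_rows(path, chunksize=100_000):
    """Count the rows of a Stata file by streaming it in chunks.

    `chunksize` is an arbitrary illustrative default; tune it to the
    memory you have available.
    """
    total = 0
    # With chunksize set, read_stata returns an iterable reader instead
    # of loading the whole file into one DataFrame.
    with pd.read_stata(path, chunksize=chunksize) as reader:
        for chunk in reader:
            total += len(chunk)
    return total
```

Usage would look like `n_rows = count_stata_rows('file.dta')`, after which the file can be reopened with `pd.read_stata('file.dta', iterator=True)` for the real pass.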
I guess you could just read in a single row and check `memory_usage()`, then divide the file size by that? It ought to get you very close, as I don't think Stata datasets have any compression built in. – JohnE Mar 22 '16 at 17:14
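
A rough sketch of the estimate described in the comment above, with a hypothetical helper name `estimate_stata_rows`. It assumes that the in-memory size of one row is close to its on-disk size. That can fail when pandas stores a column in a wider dtype than the file does, or for string columns. The file header is also counted as data, so the result tends to come out somewhat high. Treat it as an approximation only.

```python
import os

import pandas as pd


def estimate_stata_rows(path):
    """Estimate the row count as file size divided by bytes per row."""
    # Read only the first row to measure how much memory one row takes.
    with pd.read_stata(path, iterator=True) as reader:
        first_row = reader.read(nrows=1)
    # Exclude the index, which is not stored as a column in the file.
    bytes_per_row = int(first_row.memory_usage(index=False, deep=True).sum())
    # The header and metadata are included in the file size, so this
    # slightly overestimates the true number of rows.
    return os.path.getsize(path) // bytes_per_row
```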

0 Answers