For the sake of completeness, and as a summary of everything said here about speed and proper opening/closing of files, here is a solution that works FAST and doesn't need much fancy code. It is probably limited to *nix systems, but a similar technique should work on other systems too.
The code below runs a tiny bit faster than rawincount() and, unlike rawincount(), also counts a last line that doesn't end with '\n':
import glob, subprocess, pandas
files = glob.glob('files/*.csv')
# sed -n '$=' prints the number of the last line, i.e. the line count,
# and (unlike wc -l) also counts a final line without a trailing '\n'.
d = {f: subprocess.getoutput("sed -n '$=' " + f) for f in files}
print(pandas.Series(d))
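Note that concatenating the file name into a shell command string breaks on names containing spaces or shell metacharacters. A minimal, hedged variant (the helper name count_lines_sed is mine, not from the original answer) that passes the name as a separate argument and tolerates empty files:

import glob, subprocess, pandas

def count_lines_sed(path):
    # Pass the file name as its own argument so the shell never
    # parses it -- safe for spaces and metacharacters.
    result = subprocess.run(['sed', '-n', '$=', path],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)  # text=True on Python 3.7+
    # sed prints nothing for an empty file; treat that as 0 lines.
    return int(result.stdout.strip() or 0)

files = glob.glob('files/*.csv')
print(pandas.Series({f: count_lines_sed(f) for f in files}))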
P.S.: Here are some timings I ran on a set of large text files (39 files with a total size of 3.7 GByte, Linux Mint 18.1, Python 3.6). The timing of the proposed wc -l *.csv method is particularly fascinating:
Results of timing functions for getting the number of lines in a file:
----------------------------------------------------------------------
getNoOfLinesInFileUsing_bash_wc : 1.04 !!! doesn't count last non-empty line
getNoOfLinesInFileUsing_bash_grep : 1.59
getNoOfLinesInFileUsing_mmapWhileReadline : 2.75
getNoOfLinesInFileUsing_bash_sed : 3.42
getNoOfLinesInFileUsing_bytearrayCountLF_B : 3.90 !!! doesn't count last non-empty line
getNoOfLinesInFileUsing_enumerate : 4.37
getNoOfLinesInFileUsing_forLineInFile : 4.49
getNoOfLinesInFileUsing_sum1ForLineInFile : 4.82
getNoOfLinesInFileUsing_bytearrayCountLF_A : 5.30 !!! doesn't count last non-empty line
getNoOfLinesInFileUsing_lenListFileObj : 6.02
getNoOfLinesInFileUsing_bash_awk : 8.61
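The bodies of the benchmarked functions are not shown above; as a hedged reconstruction (the names match the table, the implementations are my best guess at the technique each name describes), two of the pure-Python contenders could look like this:

import mmap

def getNoOfLinesInFileUsing_sum1ForLineInFile(path):
    # Lazily iterate the file object and count lines; this also
    # counts a final line that lacks a trailing '\n'.
    with open(path, 'rb') as f:
        return sum(1 for _ in f)

def getNoOfLinesInFileUsing_mmapWhileReadline(path):
    # Memory-map the file and count readline() calls; this avoids
    # some buffer copies compared with plain iteration. Note that
    # mmap cannot map an empty file, so this assumes non-empty input.
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        count = 0
        while mm.readline():
            count += 1
        return count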