I am creating a number of pandas DataFrames from CSV files, each in excess of 50k lines. Each line has 45 fields, but occasionally I come across a line with more than 45 fields. To use the data, the only option I have found is to skip those lines with error_bad_lines=False, i.e.
devdata = pd.read_csv(devfile, sep="|", error_bad_lines=False, names=devcolnames, usecols=[0, 5, 6, 8, 25])
I am only interested in five fields, the last of which is column 25, which is not affected by the extra fields on the longer lines. Is there anything I can do with a pandas DataFrame to read in even the malformed lines, or must I resort to a list?
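To make this concrete, here is the sort of workaround I have been wondering about, though I am not sure it is safe: if I can assume the over-long lines never exceed some maximum (say 50 fields, a number I picked myself), I could pad the names list past 45 so those lines still tokenize, and usecols would then keep only the columns I care about:

import pandas as pd

devfile = "devdata.psv"   # placeholder path for the pipe-delimited file
# pad the column names out to 50 so a line with a few extra fields still parses
padded_names = ["field%d" % i for i in range(50)]

devdata = pd.read_csv(devfile, sep="|", names=padded_names,
                      usecols=[0, 5, 6, 8, 25])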
Thanks in advance!
Edit after Dan's assistance:
One thing I found after experimenting with Dan's suggestion: if you use the iterator/chunk method from the post
large persistent dataframe in pandas
but still want the effect of usecols (in my case because of memory concerns), the columns can be selected in the pd.concat line:
txtdata = pd.concat([chunk[txtcolnames] for chunk in tdata1], ignore_index=True)
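For completeness, a minimal sketch of the whole chunked read as I ended up running it; the chunksize and the example file/column names are my own placeholders rather than anything from the linked post:

import pandas as pd

devfile = "devdata.psv"                            # placeholder path
devcolnames = ["field%d" % i for i in range(45)]   # names for all 45 fields
txtcolnames = ["field0", "field5", "field6", "field8", "field25"]  # the five I keep

# read the file in chunks; lines with too many fields are still skipped
tdata1 = pd.read_csv(devfile, sep="|", error_bad_lines=False,
                     names=devcolnames, chunksize=100000)

# select the wanted columns chunk by chunk, then stitch the pieces together
txtdata = pd.concat([chunk[txtcolnames] for chunk in tdata1], ignore_index=True)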