
I have CSV files which I read from a Windows folder:

import glob
import pandas as pd

files = glob.glob(r"LBT210*.csv")
dfs = [pd.read_csv(f, sep=";", engine='c') for f in files]
df2 = pd.concat(dfs, ignore_index=True)

However the output looks like:

columnA  columnB  columnC
1        1        0
2        0        A
NaN      NaN      1
3        B        D
...

How can I skip reading the rows that contain a NaN (missing value) in columnB, so that I can save some memory and speed up processing? I don't want to read those rows at all! I want to adjust:

dfs = [pd.read_csv(f, sep=";", engine='c') for f in files] somehow

  • Is it OK to drop those after you have read them, or do you want to never read them from the CSV file? – lane Jan 27 '22 at 14:09
  • I want to drop them before I read them. – PV8 Jan 27 '22 at 14:10
  • Does this answer your question? [How can I filter lines on load in Pandas read_csv function?](https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function) – radrow Jan 27 '22 at 14:19
  • That question is from 10 years ago, so it is worth looking into. – lane Jan 27 '22 at 14:26

1 Answer


According to the selected answer of the question linked in the comments above, there isn't a way to filter out rows before the file is read into memory. Since that answer is over 10 years old, I also rechecked the current read_csv options, and it doesn't look like anything else would help here.

Inspired by that question and its selected answer, you can do something like this to reduce memory consumption:

# read the file in chunks of 1,000 rows and keep only rows where columnB is not NaN
iter_csv = pd.read_csv(f, sep=";", engine='c', iterator=True, chunksize=1000)
df = pd.concat([chunk[~chunk['columnB'].isna()] for chunk in iter_csv])
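
To combine this with the glob loop from your question, a minimal sketch could look like the following (assuming the same LBT210*.csv pattern, the ";" separator, and that columnB is the real column name; the read_filtered helper is just an illustrative name):

import glob
import pandas as pd

def read_filtered(path, chunksize=1000):
    # stream the file in chunks and keep only the rows where columnB has a value
    chunks = pd.read_csv(path, sep=";", engine="c", chunksize=chunksize)
    return pd.concat([chunk[chunk["columnB"].notna()] for chunk in chunks], ignore_index=True)

files = glob.glob(r"LBT210*.csv")
df2 = pd.concat([read_filtered(f) for f in files], ignore_index=True)

This still reads every row from disk (which, per the linked answer, can't be avoided with read_csv alone), but only one chunk at a time is held in memory before the NaN rows are discarded.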