
I have some csv files I want to read, which for whatever reason are formatted like this:

A B C
1 3 1
2 2 2
3 1 3


D
1
2
3

The problem here is that column D sits below the other columns, and this makes Pandas very unhappy: once it finishes reading the rows of A, B and C, it runs straight into D's column name string.

I can of course read it like

pd.read_csv(file, skiprows=1, nrows=rows_in_A_B_C)

Basically, nrows = length_of_A_B_C. Problem is, I don't know the number of rows before D, and I can't read the csv until I do.

How can I solve this? Can I stop reading rows based on a condition instead, such as when I hit the header for D?
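The only workaround I can think of so far is to scan the file myself for D's header and count the rows above it, roughly like the sketch below (it assumes D's block starts with a line containing nothing but "D", which might not hold for every file), but I was hoping Pandas had something built in for this:

import pandas as pd

file = "data.csv"  # placeholder path for one of the csv files described above

def rows_before_d(path):
    # Count the data rows of the A/B/C block by scanning the raw lines.
    with open(path) as fh:
        lines = fh.readlines()
    # Index of the line holding only D's header (an assumption about the format).
    d_header = next(i for i, line in enumerate(lines) if line.strip() == "D")
    # Everything between the A/B/C header (line 0) and D's header, minus blanks.
    return sum(1 for line in lines[1:d_header] if line.strip())

df = pd.read_csv(file, nrows=rows_before_d(file))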

komodovaran_

1 Answer


A possible answer was already posted in the comments to the original post, but I still felt those suggestions were needlessly hard to think up on the fly for a rather simple task (or maybe I'm just bad, ha). In my case, I figured the best method was the following:

df = pd.read_csv(file, dtype=str, names=["A", "B", "C"], skiprows=1)

Now Pandas will fill the missing fields in the bottom rows with NaN, and this happens to mark exactly the rows that column D is contained in. All these rows can then be thrown away:

df = df[df["A"].str.contains("NaN") == False]

And because we need it as a numeric dataframe:

df = df.apply(pd.to_numeric)

The length of what is left now tells us how many rows to skip when parsing D on its own (plus one for the A/B/C header line; Pandas skips the blank lines by itself):

D_only = pd.read_csv(file, skiprows=len(df) + 1)

And the two can be concatenated with pd.concat([df, D_only], axis=1).

Disclaimer: I don't know how efficient this is, computationally.
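For completeness, here is the whole thing in one place, roughly as I would write it now. It is only a sketch against the sample in the question: the file name is a placeholder, and it assumes the files really are comma separated (pass sep= otherwise).

import pandas as pd

file = "data.csv"  # placeholder path

# Read only the A/B/C block: skip its header line, keep everything as str,
# and let Pandas pad the short D-block lines with NaN.
df = pd.read_csv(file, dtype=str, names=["A", "B", "C"], skiprows=1)

# Rows containing NaN belong to the D block (including its header), so drop
# them and convert what remains back to numbers.
df = df.dropna().apply(pd.to_numeric)

# Skip the A/B/C header plus its data rows; Pandas skips the blank lines on
# its own, so "D" becomes the header of this second read.
D_only = pd.read_csv(file, skiprows=len(df) + 1)

# Glue the two blocks together side by side.
combined = pd.concat([df, D_only], axis=1)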

komodovaran_