How create a dataframe from a csv file using only certain lines?

Question

I have this csv which contains:

"""START
ABC
XYZ
START OF DATA
"COL1","COL2","COL3"
"1","2","3"
END OF DATA
LMN
NUMBER-OF-RECORDS=1
OPQ"""

I want data frame that looks like:

COL1 COL2 COL3
1     2     3

basically whatever is contained in between START OF DATA and END OF DATA.

I tried doing: df = pd.read_csv(file_name,header=4,skipfooter=4) this works but I dont like the idea of harcoding the values as the header or the footer might change. The only way I see best is to get data between those two keywords and put it into a dataframe. How can I do this?

I don't thing there's a builtin option for this. Try read the csv as pure text file and filter the rows manually? — Quang Hoang, Jan 04 '21 at 21:23
_The only way I see best is to get data between those two keywords and put it into a dataframe. How can I do this?_ Is there a specific issue? Have you tried anything, done any research? Please see [ask], [help/on-topic]. — AMC, Jan 04 '21 at 22:51
@AMC Well I wouldn't be here asking if I had not done any research would I? I have tried using the header and skipfooter properties as you can see above? I tried regex but I am not sure how to combine regex and csv files and get it into a dataframe. I am new to coding, I thought this website helps coders to connect and solve each other problems and not mock there abilities? Please see [How to be not rude](https://www.nbcnews.com/better/pop-culture/why-rudeness-so-toxic-how-stop-it-ncna876131) — Avi G, Jan 05 '21 at 04:01

score 1 · Answer 1 · answered Jan 04 '21 at 21:27

If you don't know the header and footer length beforehand, you'll have to manually parse the file and find the line numbers containing your keywords.

Here's one way to do it, modifying from https://stackoverflow.com/a/3961374/9357244:

import pandas as pd  

filename = 'csv_w_hdr_ftr.csv'

with open(filename) as myFile:
    for num, line in enumerate(myFile, 1): 
        if 'START OF DATA' in line:
            start_line = num 
        elif 'END OF DATA' in line:
            end_line = num

footer_len = num - end_line + 1

df = pd.read_csv(filename,header=start_line,skipfooter=footer_len)

Note that if you have control over how the csv files are written, if you add a comment character to the beginning of the header/footer lines you can use the comment parameter in read_csv to skip those lines (see the read_csv docs).

How create a dataframe from a csv file using only certain lines?

1 Answers1