File has two parts - 1st is text 2nd is CSV. How to parse only the CSV part with python

Question

I have a text file which contains text in the first 20 or so lines, followed by CSV data. Some of the text in the text section contains commas, so csv.reader or csv.DictReader doesn't work well on the file as a whole.

I want to skip past the text section and only then start to parse the CSV data.

Searches don't yield much other than instructions to either use csv.reader/csv.DictReader and iterate through the rows that are returned (which doesn't work because of the commas in the text), or to read the file line by line and split the lines using ',' as the delimiter.

The latter works up to a point, but it produces strings, not numbers. I could convert the strings to numbers but I'm hoping that there's a simple way to do this either with the csv or numpy libraries.

As requested - Sample data:

This is the first line. This is all just text to be skipped.
The first line doesn't always have a comma - maybe it's in the third line
Still no commas, or was there?
Yes, there was. And there it is again.
and so on
There are more lines but they finally stop when you get to 
EndOfHeader
1,2,3,4,5
8,9,10,11,12
3, 6, 9, 12, 15

Thanks for the help.

Edit #2: A suggested answer gave a link to a question entitled "Read file from line 2...". That's kind of what I'm looking for, but I want to be able to read through the lines until I find the "EndOfHeader" line and then call on the csv library to handle the remainder of the file. The reply by saimadhu.polamuri is part of what I've tried, specifically:

with open(filename, 'r') as f:
    first_line = f.readline()
    for line in f:
        # test if line equals EndOfHeader. If true then parse as CSV
        pass

But that's where it comes apart - I can't see how to have CSV work with the data from this point forward.

Please provide sample data or the original. You will need to find a definitive separation between the text and the CSV data. This could be as simple as an empty line, a period, or numbers, or you could test each line to see if it is CSV-readable and then check whether it really is what you want. — Jab, May 31 '19 at 22:41
Since "text" is a classification that includes "csv" as a subset, we'll need a distinct problem definition before we can help. — Prune, May 31 '19 at 22:57
@blorgbeard - I can identify a particular phrase, for example "EndOfHeader" — scouser, Jun 01 '19 at 19:21
5 minute timeout FTF. To blorgbeard and jab - I can identify a particular phrase, for example "EndOfHeader" that will have an entire line to itself. After that, the next line is the start of the CSV data. To Prune - Your replies are unhelpful: You don't know the first thing about my code and yet you'll suggest dishonesty? Probably wouldn't occur to you that maybe I don't have access to my code just now (it's in work) and I'm posting like this in the hope that maybe someone understands what I'm asking or has come across the same thing before and not required any code in order to reply. — scouser, Jun 01 '19 at 19:38
`csv.reader` accepts a filehandle. Therefore, you can [read until you find the end-of-data marker](https://stackoverflow.com/questions/4796764/read-file-from-line-2-or-skip-header-row) and then pass the handle. — Mike, Jun 01 '19 at 20:06
Possible duplicate of [Read file from line 2 or skip header row](https://stackoverflow.com/questions/4796764/read-file-from-line-2-or-skip-header-row) — Mike, Jun 01 '19 at 20:06
Thanks @Mike - I didn't realise that the filehandle could be passed after reading an arbitrary number of lines. I've tested this on my sample data and it does exactly what I want. Cheers! — scouser, Jun 01 '19 at 22:16

score 0 · Answer 1 · answered Jun 01 '19 at 23:49

With thanks to @Mike for the suggestion, the code is actually reasonably straightforward.

import csv

with open('data.csv', newline='') as f:    # open the file
    for _ in range(7):                     # loop over the first 7 lines (the text header)
        f.readline()                       # read and discard them; next(f) also works
    r = csv.reader(f, delimiter=',')       # now pass the file handle to a csv reader
    for row in r:                          # and loop over the remaining rows
        print(row)                         # print the row, or do something else

In my actual code, it will search for the EndOfHeader line and use that to decide where to start parsing the CSV.
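A minimal sketch of that marker-based version, for anyone who wants it spelled out. The helper name read_csv_after_marker is made up for illustration, and the sample text is a shortened copy of the data in the question, fed through io.StringIO so the snippet runs without a file on disk. It also assumes every CSV field is an unquoted number: with quoting=csv.QUOTE_NONNUMERIC the reader converts unquoted fields to float, which avoids converting the strings by hand.

```python
import csv
import io


def read_csv_after_marker(f, marker="EndOfHeader"):
    """Skip lines up to and including the marker line, then parse the rest as CSV."""
    for line in f:
        if line.strip() == marker:
            break                      # the handle now sits on the first CSV line
    else:
        # the loop ran out of lines without ever seeing the marker
        raise ValueError(f"marker line {marker!r} not found")

    # skipinitialspace drops the blank after each comma ("3, 6, 9");
    # QUOTE_NONNUMERIC makes the reader return unquoted fields as floats
    reader = csv.reader(f, skipinitialspace=True, quoting=csv.QUOTE_NONNUMERIC)
    return [row for row in reader if row]   # ignore empty lines


sample = io.StringIO(
    "This is the first line. This is all just text to be skipped.\n"
    "Yes, there was. And there it is again.\n"
    "EndOfHeader\n"
    "1,2,3,4,5\n"
    "8,9,10,11,12\n"
    "3, 6, 9, 12, 15\n"
)
rows = read_csv_after_marker(sample)
print(rows)
```

With a real file, the same call would be made inside with open('data.csv', newline='') as f:. If a field cannot be converted to a number, the reader raises ValueError, so a plain csv.reader(f) is the safer choice when the data may contain text columns.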

I'm posting this as an answer, as the question that this one supposedly duplicates doesn't explicitly consider this issue of the file handle and how it can be passed to a CSV reader, and so it may help someone else.

Thanks to all who took time to help.
