I have a buffer that looks like this:

some_random_info
another info here that i dont want to parser
column1,column2,column3
a,b,c   

I want to read this using the DictReader class from Python's built-in csv module, but the docs say:

 class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

    Create an object that operates like a regular reader but maps the information in each row to a dict whose keys are given by the optional fieldnames parameter.

    The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames. Regardless of how the fieldnames are determined, the dictionary preserves their original ordering.    

I have tried this:

import io
import csv

buffer = io.StringIO("""
some_random_info
another info here that i dont want to parser
column1,column2,column3
a,b,c   
""")


reader = csv.DictReader(buffer,fieldnames=['column1','column2','column3'])
for row in reader:
    print(row)

But it outputs this:

{'column1': 'some_random_info', 'column2': None, 'column3': None}
{'column1': 'another info here that i dont want to parser', 'column2': None, 'column3': None}
{'column1': 'column1', 'column2': 'column2', 'column3': 'column3'}
{'column1': 'a', 'column2': 'b', 'column3': 'c   '}

What I'm looking for is just {'column1': 'a', 'column2': 'b', 'column3': 'c   '}

moth

1 Answer

One option would be to call next(buffer) as many times as needed to skip the extra lines at the start.

Since the first line after that is the header, don't skip it and don't specify fieldnames; just let DictReader parse that line to get the field names automatically.

import io
import csv

buffer = io.StringIO("""
some_random_info
another info here that I don't want to parse
column1,column2,column3
a,b,c   
""")

for _ in range(3):  # skip the first 3 lines (the leading blank line + 2 info lines)
    print("skip:", next(buffer), end="")
# skip:
# skip: some_random_info
# skip: another info here that I don't want to parse

reader = csv.DictReader(buffer)  # read the 4th line to get column names
for row in reader:
    print(row)  # read all remaining lines to get column values
# {'column1': 'a', 'column2': 'b', 'column3': 'c   '}

Obviously you don't need to print() the next(buffer) calls; they are printed here only so you can see exactly what each iteration skips.
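If the number of extra lines isn't known in advance, a variation on the same idea is to keep consuming lines until the header shows up, then hand the rest of the buffer to DictReader. This is only a sketch: the helper name `rows_after_header` and the assumption that the header can be recognised by a known prefix are mine, not something from the question.

```python
import csv
import io


def rows_after_header(stream, header_prefix):
    """Hypothetical helper: skip lines until one starts with header_prefix,
    use that line as the field names, and parse everything after it."""
    for line in stream:
        if line.startswith(header_prefix):
            # Parse the header line with csv so quoting/commas are handled.
            fieldnames = next(csv.reader([line]))
            return list(csv.DictReader(stream, fieldnames=fieldnames))
    return []  # header never found


buffer = io.StringIO("""
some_random_info
another info here that I don't want to parse
column1,column2,column3
a,b,c   
""")

rows = rows_after_header(buffer, "column1,")
print(rows)
```

Here fieldnames is passed explicitly because the header line has already been consumed by the loop, so DictReader can no longer read it itself.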

Samwise
  • Hmm, why doesn't `next(buffer) * 4` work? – moth Mar 12 '23 at 15:29
  • The `*` operator isn't the same as a `for` loop. If you do that, you'll get the first line by calling `next` once, and then you'll multiply that string by 4. (Also, as explained in the answer, you don't want to skip 4 lines, just 3.) – Samwise Mar 12 '23 at 15:33
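
To make the point in the comment above concrete, here is a small sketch with a made-up three-line buffer (the contents are only for illustration). It shows that `next(buffer) * 4` reads a single line and repeats that string, leaving the buffer positioned at the second line.

```python
import io

buffer = io.StringIO("first\nsecond\nthird\n")

# next() is called once; the resulting string is then repeated 4 times.
repeated = next(buffer) * 4
print(repr(repeated))   # 'first\nfirst\nfirst\nfirst\n'

# Only one line was consumed, so the next read returns the second line.
following = next(buffer)
print(repr(following))  # 'second\n'
```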