
Say I have a .csv file that looks like this:

0,0
1,1
2,2
3,3
4,4
5,5,5,5
6,6,6,6
7,7,7,7

How could I create a dataframe starting from the row 5,5,5,5 without depending on its row number? I know I can set header=5, but I would like something more like header=#when it reaches 4 columns#, whichever row that may be.

I realise that this question was not quite as specific as I needed so I have reiterated it here: Creating a dataframe from different rows

2 Answers


You could use str.count in a comprehension to keep only the lines with more than two commas, then wrap the result in a StringIO and pass it to pd.read_csv.

import pandas as pd
from io import StringIO

# Keep only lines with at least four fields (more than two commas)
with open('test.csv') as f:
    rows = ''.join(l for l in f if l.count(',') > 2)
pd.read_csv(StringIO(rows), header=None)

   0  1  2  3
0  5  5  5  5
1  6  6  6  6
2  7  7  7  7
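
Counting commas treats a comma inside a quoted field as a separator. If the file might contain quoted fields, a variant that counts parsed fields with the standard csv module could look like the sketch below. The function name read_rows_with_min_fields and the min_fields parameter are made up for illustration, and the sketch assumes a comma-delimited file with standard double-quote quoting.

```python
import csv
import io

import pandas as pd


def read_rows_with_min_fields(path, min_fields=4):
    """Build a DataFrame from the rows of `path` that have at least `min_fields` fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    with open(path, newline='') as f:
        for row in csv.reader(f):
            # len(row) counts parsed fields, so quoted commas are not miscounted
            if len(row) >= min_fields:
                writer.writerow(row)
    # Rewind so pandas reads the filtered rows from the start of the buffer
    buffer.seek(0)
    return pd.read_csv(buffer, header=None)
```

Calling read_rows_with_min_fields('test.csv') on the example file should give the same three rows as above.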
piRSquared

My solution would be to read the csv as a plain text file, filter it line by line, and then use io.StringIO to read the filtered text in as a dataframe. Caution: the filtered lines are all held in memory, so this is not suitable for big files.

For example:

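import io
import pandas as pd

# Collect only the lines that have at least four comma-separated fields
new_csv = []
with open('csv.csv') as f:
    for line in f:
        if len(line.split(',')) >= 4:
            new_csv.append(line)
# Each line already ends with a newline, so join without a separator
file_io = io.StringIO(''.join(new_csv))
# The first kept line (5,5,5,5 in the example) becomes the header
df = pd.read_csv(file_io)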
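
For bigger files, one way to avoid holding the filtered text in memory is to scan only for the index of the first line with four fields and hand that to the skiprows parameter of pd.read_csv. This is a sketch rather than a drop-in replacement: it assumes every line after the first wide one belongs to the table (as in the example file), and the helper names first_wide_line and read_from_first_wide_line are hypothetical.

```python
import pandas as pd


def first_wide_line(path, min_fields=4):
    """Return the 0-based index of the first line with at least `min_fields` fields."""
    with open(path) as f:
        for index, line in enumerate(f):
            if len(line.split(',')) >= min_fields:
                return index
    raise ValueError(f'no line with at least {min_fields} fields in {path}')


def read_from_first_wide_line(path, min_fields=4, **read_csv_kwargs):
    """Skip everything before the first wide line and let pandas parse the rest."""
    start = first_wide_line(path, min_fields)
    # skiprows=start drops the first `start` lines of the file
    return pd.read_csv(path, skiprows=start, **read_csv_kwargs)
```

With the default header handling, read_from_first_wide_line('csv.csv') uses the first wide line as the column names; passing header=None keeps it as a data row instead.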
Noxeus