
I have a data.csv file like this:

Col1,Col2,Col3,Col4,Col5  
10,12,14,15,16  
18,20,22,24,26  
28,30,32,34,36  
38,40,42,44,46  
48,50,52,54,56

Col6,Col7  
11,12  
13,14  
...

I want to read only the data under columns Col1 to Col5; I don't need Col6 and Col7.

I tried reading this file using

df = pd.read_csv('data.csv',header=0)

but it throws this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 3: invalid start byte

Then I tried this:

df = pd.read_csv('data.csv',header=0,error_bad_lines=True)

This doesn't give the desired result either. How can I read only up to the first blank line in the CSV file?

deadvoid
Bhaskar

2 Answers


You can write a generator that reads the file line by line and stops at the first blank line. The lines it yields are then joined and passed to pandas:

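import pandas as pd
import io


def file_reader(filename):
    """Yield lines of filename up to (not including) the first blank line."""
    with open(filename) as f:
        for line in f:
            # Stop at the first empty or whitespace-only line.
            if not line.strip():
                break
            yield line


# Join the collected lines and let pandas parse them as one CSV.
data = io.StringIO(''.join(file_reader('data.csv')))
df = pd.read_csv(data)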
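As a sketch of the generator approach end to end, the block below writes the sample from the question to a temporary file and reads back only the first table. The encoding parameter and the read_first_table helper are hypothetical additions, not part of the answer above. Byte 0xb2 is not a valid UTF-8 start byte (in Latin-1/cp1252 it is "²"), so if the real file triggers the UnicodeDecodeError, passing something like encoding='latin-1' may help, but the file's actual encoding is an assumption you would need to check.

```python
import io
import os
import tempfile

import pandas as pd


def file_reader(filename, encoding="utf-8"):
    """Yield lines of filename up to (not including) the first blank line.

    The encoding parameter is a hypothetical addition; use whatever
    encoding the file was really saved in (e.g. 'latin-1' is a guess).
    """
    with open(filename, encoding=encoding) as f:
        for line in f:
            if not line.strip():  # empty or whitespace-only line ends the table
                break
            yield line


def read_first_table(filename, encoding="utf-8"):
    """Hypothetical helper: parse only the block before the first blank line."""
    return pd.read_csv(io.StringIO("".join(file_reader(filename, encoding))))


# Sample file modelled on the question (the second table is abbreviated).
sample = (
    "Col1,Col2,Col3,Col4,Col5\n"
    "10,12,14,15,16\n"
    "18,20,22,24,26\n"
    "28,30,32,34,36\n"
    "38,40,42,44,46\n"
    "48,50,52,54,56\n"
    "\n"
    "Col6,Col7\n"
    "11,12\n"
    "13,14\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    df = read_first_table(path)

print(df.shape)  # (5, 5)
print(list(df.columns))
```

Only the five rows under Col1..Col5 end up in df; Col6 and Col7 are never handed to pandas.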
Eir Nym

Pandas has no option to stop reading when a condition is met, but it can stop after n rows via the nrows parameter. So you could first read the file, count the data rows before the first blank line (not counting the header line), and then load it in pandas with

pd.read_csv('file.csv', nrows=count)

Along the lines of this:

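import pandas as pd

filename = 'data.csv'

count = 0
with open(filename) as f:
    for line in f:
        # Stop at the first empty or whitespace-only line.
        if not line.strip():
            break
        count += 1

# count includes the header line, but nrows counts data rows only.
df = pd.read_csv(filename, nrows=count - 1)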
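A self-contained sketch of the nrows approach, again using a temporary copy of the question's sample. The point it illustrates is that nrows counts data rows and does not include the header, so the line count has to be reduced by one. The function name count_data_rows and its encoding parameter are hypothetical, chosen only for this example.

```python
import os
import tempfile

import pandas as pd


def count_data_rows(filename, encoding="utf-8"):
    """Count data rows before the first blank line, excluding the header.

    count_data_rows and its encoding parameter are hypothetical names.
    """
    count = 0
    with open(filename, encoding=encoding) as f:
        for line in f:
            if not line.strip():  # first empty or whitespace-only line
                break
            count += 1
    # nrows does not include the header line, so drop it from the count.
    return max(count - 1, 0)


# Sample file modelled on the question (the second table is abbreviated).
sample = (
    "Col1,Col2,Col3,Col4,Col5\n"
    "10,12,14,15,16\n"
    "18,20,22,24,26\n"
    "28,30,32,34,36\n"
    "38,40,42,44,46\n"
    "48,50,52,54,56\n"
    "\n"
    "Col6,Col7\n"
    "11,12\n"
    "13,14\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    n = count_data_rows(path)
    df = pd.read_csv(path, nrows=n)

print(n)         # 5
print(df.shape)  # (5, 5)
```

Compared with the generator answer, this reads the head of the file twice but lets pandas open the file itself, which may be convenient if you also want to pass other read_csv options.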
Christian Sloper