Read CSV file in Pandas with Blank lines in between

Question

I have a data.csv file like this:

Col1,Col2,Col3,Col4,Col5  
10,12,14,15,16  
18,20,22,24,26  
28,30,32,34,36  
38,40,42,44,46  
48,50,52,54,56

Col6,Col7  
11,12  
13,14  
...

Now, I want to read only the data of columns Col1 to Col5 and I don't require Col6 and Col7.

I tried reading this file using

df = pd.read_csv('data.csv',header=0)

but it throws this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 3: invalid start byte

Then, I tried this

df = pd.read_csv('data.csv',header=0,error_bad_lines=True)

But this does not give the desired result either. How can I read only up to the first blank line in the CSV file?

In my opinion, the problem is with the file. This is not a valid csv file, but two csv files concatenated into one. Try splitting the file into two files. — rje, Oct 18 '18 at 21:41
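The comment above treats the file as two CSV tables joined by a blank line. A minimal sketch of that idea, assuming the file contents are already available as a string: split the text on blank lines and parse each block separately. The helper name `read_csv_blocks` and the shortened sample text are illustrative and not part of the original thread.

```python
import io
import re

import pandas as pd


def read_csv_blocks(text):
    """Split text on blank lines and parse each block as its own CSV table."""
    # One or more whitespace-only lines separate consecutive blocks.
    blocks = re.split(r'\n\s*\n', text.strip())
    return [pd.read_csv(io.StringIO(block)) for block in blocks]


# Shortened copy of the data from the question (assumed layout).
sample = (
    "Col1,Col2,Col3,Col4,Col5\n"
    "10,12,14,15,16\n"
    "18,20,22,24,26\n"
    "\n"
    "Col6,Col7\n"
    "11,12\n"
    "13,14\n"
)
first, second = read_csv_blocks(sample)
```

Here `first` holds the Col1 to Col5 table and `second` holds the Col6 and Col7 table, so the unwanted block can simply be ignored.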

score 5 · Accepted Answer · answered Oct 18 '18 at 21:39

You can create a generator which reads a file line by line. The result is passed to pandas:

import pandas as pd
import io


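def file_reader(filename):
    """Yield lines from the file, stopping at the first blank line."""
    with open(filename) as f:
        for line in f:
            # A line containing only whitespace marks the end of the first table.
            if not line.strip():
                break
            yield line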


data = io.StringIO(''.join(file_reader('data.csv')))
df = pd.read_csv(data)
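The snippet above opens the file with the platform's default encoding, so the UnicodeDecodeError from the question can still occur where that default is UTF-8. The sketch below passes an explicit encoding to `open`. The question does not state the file's real encoding, so `latin-1` is only an assumption here, and `read_first_block` and the temporary demo file are hypothetical.

```python
import io
import os
import tempfile

import pandas as pd


def read_first_block(path, encoding="latin-1"):
    """Read a CSV file up to its first blank line, decoding with `encoding`."""
    lines = []
    with open(path, encoding=encoding) as f:
        for line in f:
            if not line.strip():  # stop at the first blank line
                break
            lines.append(line)
    return pd.read_csv(io.StringIO("".join(lines)))


# Demo on a temporary file that contains the byte 0xb2 (the character
# "²" in Latin-1), which is not valid as the start of a UTF-8 sequence.
raw = b"m\xb2,Col2\n10,12\n18,20\n\nCol6,Col7\n11,12\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
demo_df = read_first_block(tmp.name, encoding="latin-1")
os.remove(tmp.name)
```

If the decoded text looks wrong, the assumed encoding is probably not the right one for the file.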

score 2 · Answer 2 · answered Oct 18 '18 at 21:40

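Pandas doesn't have an option to stop at a condition, but it does have an option to stop after n rows (nrows). So you could read the file first, count the lines up to the first blank line, and then load the file in pandas with

pd.read_csv('file.csv', nrows=count - 1)

The count includes the header line, and nrows counts data rows only, so one is subtracted. Along the lines of this:

import pandas as pd

filename = 'data.csv'
count = 0
with open(filename) as f:
    for line in f:
        if line.strip():
            count += 1
        else:
            break

# count includes the header line, so the number of data rows is count - 1
df = pd.read_csv(filename, nrows=count - 1)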

answered Oct 18 '18 at 21:40

Christian Sloper


There are many files to read, @Christian Sloper, so it would be extremely difficult to count the rows in each file. – Bhaskar, Oct 19 '18 at 08:28
That comment is a bit hard to understand: the program snippet does the counting for you, just before you load the file into pandas. – Christian Sloper, Oct 19 '18 at 08:37
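
To make the reply above concrete for many files, the counting step can be wrapped in a function and applied to each path in a loop. This is a minimal sketch under assumed names: `read_until_blank`, the temporary folder, and the `*.csv` pattern are illustrative and not from the thread.

```python
import glob
import os
import tempfile

import pandas as pd


def read_until_blank(path):
    """Load only the rows before the first blank line of a CSV file."""
    count = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                break
            count += 1
    # count includes the header line, so the data rows number count - 1.
    return pd.read_csv(path, nrows=max(count - 1, 0))


# Demo: create two small files in a temporary folder, then load them all.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.csv", "b.csv"):
        with open(os.path.join(folder, name), "w") as out:
            out.write("Col1,Col2\n1,2\n3,4\n\nCol6,Col7\n5,6\n")
    frames = {
        os.path.basename(p): read_until_blank(p)
        for p in sorted(glob.glob(os.path.join(folder, "*.csv")))
    }
```

Each file is scanned once to find the blank line and then read once by pandas, so no manual counting is needed per file.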
