Error tokenizing data while uploading CSV file into Pandas Dataframe

Question

I have an 8GB CSV file that contains information about companies created in France. When I try to load it in Python using pandas.read_csv, I get various errors. I believe the problem comes from a combination of three factors:

The size of the file (8GB)
The French characters in the cells (like “é”)
The fact that this CSV file is organized like an Excel file: the fields are separated into columns, just like in an XLS file
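The second factor can be checked on its own. The snippet below is a minimal sketch. It assumes the file was saved in Latin-1 (ISO-8859-1) rather than UTF-8, which the question does not confirm. In Latin-1, "é" is the single byte 0xe9, which is not valid UTF-8 when it is followed by an ordinary letter. Decoding with UTF-8 (the pandas default) therefore fails, while decoding with Latin-1 round-trips correctly. The company name is made up for illustration.

```python
# Hypothetical company name containing French accented characters.
text = "Société Générale"

# Assumption: the real file uses Latin-1 (ISO-8859-1), not UTF-8.
raw = text.encode("latin-1")
print(raw)  # each "é" is stored as the single byte 0xe9

try:
    raw.decode("utf-8")  # pandas.read_csv decodes as UTF-8 by default
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("UTF-8 decoding fails:", exc.reason)

print(raw.decode("latin-1"))  # decoding with the matching encoding works
```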

When I tried to import the file using:

import pandas as pd
df = pd.read_csv(r'C:\..\data.csv')

I got the following error: OSError: Initializing from file failed

Then, to rule out the size problem, I copy the file (data.csv) and keep only the first 25 rows in a much lighter file (data2.csv):

df = pd.read_csv(r'C:\..\data2.csv')

I get the same OSError: Initializing from file failed error.
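Instead of copying rows into data2.csv by hand, you can use the nrows parameter of read_csv to read only the first rows of the original file. The following is a minimal sketch that uses a small generated stand-in for data.csv. The column names, the ";" separator and the Latin-1 encoding are illustrative assumptions, not details confirmed by the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for data.csv: a header plus 100 data rows.
rows = ["siren;nom;ville"] + [f"{i};Société {i};Évry" for i in range(100)]
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", encoding="latin-1", newline="") as f:
    f.write("\n".join(rows) + "\n")

# nrows parses the header plus only the first 25 data rows,
# so no separate data2.csv needs to be created.
sample = pd.read_csv(path, sep=";", encoding="latin-1", nrows=25)
print(sample.shape)  # (25, 3)
```

Because only 25 rows are parsed, this is also a cheap way to try different sep and encoding values against the real file.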

After some research, I try the following code on data2.csv:

df = pd.read_csv(r'C:\..\data2.csv', sep="\t", encoding="latin")

This time the import succeeds, but the result is malformed: all fields end up in a single column (screenshot: https://i.stack.imgur.com/XUBQn.jpg).

So even with the size problem eliminated, pandas does not read the CSV file properly. I also still need to work with the main file, data.csv, so I try the same code on the original file:

df = pd.read_csv(r'C:\..\data.csv', sep="\t", encoding="latin")

I get: ParserError: Error tokenizing data. C error: out of memory

What is the correct code to read data.csv?

Thank you,

Can you add the first few lines of your CSV file? – Markus Jan 04 '19 at 15:58
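Excel shows columns that are already split, so it does not reveal the raw separator. To answer the comment's request, you can print the first few lines of the file as plain text. The standard-library csv.Sniffer can then guess the delimiter from them. This is a sketch: the helper name peek, the sample content and the Latin-1 encoding are hypothetical.

```python
import csv
import os
import tempfile
from itertools import islice


def peek(path, n=5, encoding="latin-1"):
    """Hypothetical helper: return the first n lines without reading the rest."""
    with open(path, encoding=encoding) as f:
        return "".join(islice(f, n))


# Stand-in for data.csv; content, separator and encoding are assumptions.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", encoding="latin-1", newline="") as f:
    f.write("siren;nom;ville\n123456789;Société A;Paris\n987654321;Société B;Lyon\n")

head = peek(path)
print(head)

# Restrict the guess to common separators: semicolon, comma, tab.
dialect = csv.Sniffer().sniff(head, delimiters=";,\t")
print(repr(dialect.delimiter))  # ';'
```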

score 1 · Answer 1 · answered Jan 04 '19 at 16:02


From your image, it looks like the file is separated by semicolons (;). Try passing ";" as the sep argument to read_csv.
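Here is a minimal sketch of this suggestion. In-memory bytes stand in for the real file, and the columns and the Latin-1 encoding are assumptions. With sep="\t", each line lands in a single column, which matches the symptom in the question. With sep=";", the fields are split into their own columns.

```python
import io

import pandas as pd

# Hypothetical Latin-1 bytes standing in for the real file.
raw = "siren;nom;ville\n123456789;Société A;Évry\n987654321;Société B;Lyon\n".encode("latin-1")

# Wrong separator: every whole line ends up in one column.
wrong = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="latin-1")
print(wrong.shape)  # (2, 1)

# Semicolon separator: one column per field.
df = pd.read_csv(io.BytesIO(raw), sep=";", encoding="latin-1")
print(df.shape)  # (2, 3)
print(df["ville"].tolist())  # ['Évry', 'Lyon']
```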

pandas reads the whole CSV into RAM, and an 8GB file could easily exhaust it. Try reading the file in chunks instead. See this answer.
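Here is a sketch of chunked reading. Passing chunksize makes read_csv return an iterator of DataFrames, so only about one chunk of rows needs to be in memory at a time, and the per-chunk results are then combined. Several details are hypothetical: the stand-in file, its columns, the chunk size and the per-town count. The real per-chunk processing depends on the analysis.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the 8GB file (";" and Latin-1 are assumptions).
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", encoding="latin-1", newline="") as f:
    f.write("siren;nom;ville\n")
    for i in range(10_000):
        f.write(f"{i};Société {i};{'Paris' if i % 2 else 'Évry'}\n")

# Iterate over 2,500-row DataFrames instead of loading everything at once.
counts = None
for chunk in pd.read_csv(path, sep=";", encoding="latin-1", chunksize=2_500):
    # Example per-chunk work: count companies per town, then accumulate.
    part = chunk["ville"].value_counts()
    counts = part if counts is None else counts.add(part, fill_value=0)

counts = counts.astype(int)
print(counts["Paris"], counts["Évry"])  # 5000 5000
```

Any result that can be combined across chunks fits this pattern, such as counts, sums, or filtered subsets written back to disk.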


N.Clarke


