I have an 8GB CSV file that contains information about companies created in France. When I try to load it in Python using pandas.read_csv, I get various errors. I believe a combination of three factors causes the problem:
- The size of the file (8GB)
- The French characters in the cells (like “é”)
- The fact that this CSV file is laid out like an Excel file: the fields appear in separate columns, just like in an XLS file
When I tried to import the file using:
import pandas as pd
df = pd.read_csv(r'C:\..\data.csv')
I got the following error: OSError: Initializing from file failed
Then, to rule out the size as the cause, I copied the file (data.csv) and kept only the first 25 rows (data2.csv). With this much lighter file:
df = pd.read_csv(r'C:\..\data2.csv')
I got the same error: OSError: Initializing from file failed
After some research, I tried the following code with data2.csv:
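One thing that may be worth ruling out for this OSError: as far as I can tell, it is often reported when pandas' C parser cannot open the path itself (for example because of non-ASCII characters in the folder names, or a permissions problem), not because of the file's size. Below is a minimal sketch of a workaround, assuming that is the cause here: open the file with Python and pass the handle to read_csv. The helper name read_via_handle and the latin-1 default are my own assumptions, not something confirmed for this file.

```python
import pandas as pd


def read_via_handle(path, encoding="latin-1", **kwargs):
    """Read a CSV by handing pandas an already-open file object.

    Hypothetical helper: Python opens the file, so pandas never has to
    resolve the path on its own. Extra keyword arguments (sep, nrows, ...)
    are forwarded to pd.read_csv unchanged.
    """
    # newline="" lets the CSV parser handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as handle:
        return pd.read_csv(handle, **kwargs)
```

It could be called as read_via_handle(r'C:\..\data2.csv', sep="\t"), with whatever separator the file really uses.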
df = pd.read_csv(r'C:\..\data2.csv', sep="\t", encoding="latin")
This time, the import works, but the result is in a weird format, like this: https://i.stack.imgur.com/XUBQn.jpg. All fields end up in the same column.
So even with the size problem eliminated, pandas doesn't read the CSV file properly. And I still need to work with the main file, data.csv. So I tried the same code on the initial file (data.csv):
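Everything landing in one column usually suggests that sep="\t" is not the delimiter the file actually uses. Here is a small sketch for checking that with the standard library's csv.Sniffer, reading only the beginning of the file so it stays cheap even on the 8GB version. The function name guess_delimiter, the candidate list and the latin-1 default are assumptions for illustration.

```python
import csv


def guess_delimiter(path, encoding="latin-1", sample_size=64 * 1024):
    """Guess the field delimiter from the first few kilobytes of a file.

    Hypothetical helper; only sample_size characters are read, so the
    size of the full file does not matter.
    """
    with open(path, "r", encoding=encoding, newline="") as handle:
        sample = handle.read(sample_size)
    # Restrict the guess to a few common candidates (an assumption).
    dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")
    return dialect.delimiter
```

The returned character can then be passed as sep to pd.read_csv. csv.Sniffer raises csv.Error when it cannot decide, in which case looking at the first lines in a text editor is probably the simplest check.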
df = pd.read_csv(r'C:\..\data.csv', sep="\t", encoding="latin")
I got: ParserError: Error tokenizing data. C error: out of memory
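For the out-of-memory error, one approach I have seen suggested is to read the file in pieces with the chunksize parameter of read_csv, so that only one block of rows is in memory at a time. A rough sketch follows, under loud assumptions: the semicolon separator, the latin-1 encoding, the chunk size and the helper names read_in_chunks and count_rows are all guesses, not values confirmed for data.csv.

```python
import pandas as pd


def read_in_chunks(path, sep=";", encoding="latin-1",
                   chunksize=100_000, usecols=None):
    """Yield the CSV as a sequence of smaller DataFrames.

    Hypothetical helper. sep and encoding are assumptions; usecols can
    limit the columns that are loaded, which also reduces memory use.
    """
    # With chunksize set, read_csv returns an iterator instead of one frame.
    reader = pd.read_csv(
        path,
        sep=sep,
        encoding=encoding,
        chunksize=chunksize,
        usecols=usecols,
        dtype=str,  # keep every field as text; skips per-column type guessing
    )
    for chunk in reader:
        yield chunk


def count_rows(path, **kwargs):
    """Example of per-chunk processing: count rows without loading everything."""
    return sum(len(chunk) for chunk in read_in_chunks(path, **kwargs))
```

Each chunk can be filtered or aggregated on its own, and only the reduced results kept, instead of building a single DataFrame for the whole 8GB file.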
What is the correct way to read data.csv?
Thank you.