We all know the question that comes up when you run into a memory error: Maximum size of pandas dataframe
I am also trying to read 4 large CSV files with the following commands:
import glob
import pandas as pd

files = glob.glob("C:/.../rawdata/*.csv")
dfs = [pd.read_csv(f, sep="\t", encoding='unicode_escape') for f in files]
df = pd.concat(dfs, ignore_index=True)
The only message I receive is:
C:..\conda\conda\envs\DataLab\lib\site-packages\IPython\core\interactiveshell.py:3214: DtypeWarning: Columns (22,25,56,60,71,74) have mixed types. Specify dtype option on import or set low_memory=False.
  if (yield from self.run_code(code, result)):
which should be no problem.
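For reference, here is a minimal sketch of the second option the warning names, `low_memory=False`, which makes pandas infer each column's dtype from the whole file instead of chunk by chunk. The helper name `read_tab_csv` is made up for illustration; the separator and encoding simply mirror my call above.

```python
import pandas as pd


def read_tab_csv(path, encoding="unicode_escape"):
    """Read one tab-separated file (hypothetical helper, not a pandas API).

    low_memory=False tells pandas to parse the file in one pass, so a
    column's dtype is inferred from all of its values at once.
    """
    return pd.read_csv(path, sep="\t", encoding=encoding, low_memory=False)
```

With this, the list comprehension would become `dfs = [read_tab_csv(f) for f in files]`. It needs more memory while parsing, and as far as I understand it does not change how many rows are read.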
My total dataframe has a size of: (6639037, 84)
Could there be a data size restriction that applies without raising a memory error? That is, could Python be silently skipping some lines without telling me? I had this with another program in the past. I don't think Python is that lazy, but you never know.
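One way I could check this myself is to count the lines in the raw files and compare the total with `len(df)`. The sketch below assumes each file has exactly one header line and that no field contains an embedded line break (otherwise the raw line count would be too high). The function names `count_data_lines` and `compare_row_counts` are my own, not part of pandas.

```python
def count_data_lines(path):
    """Count non-empty lines in a file, minus one assumed header line.

    The file is read in binary mode so the encoding does not matter.
    Blank lines are ignored because pandas skips them by default.
    """
    with open(path, "rb") as fh:
        n = sum(1 for line in fh if line.strip())
    return max(n - 1, 0)


def compare_row_counts(paths, df):
    """Return (rows expected from the raw files, rows in the dataframe)."""
    expected = sum(count_data_lines(p) for p in paths)
    return expected, len(df)
```

Calling `compare_row_counts(files, df)` with the `files` list and `df` from above returns both numbers. If they match, no lines were dropped under the stated assumptions.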
Further information:
Later I save it as an SQLite file, but I don't think this should be a problem either:
import sqlite3

conn = sqlite3.connect('C:/.../In.db')
df.to_sql(name='rawdata', con=conn, if_exists='replace', index=False)
conn.commit()
conn.close()
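To rule out the SQLite step as well, the number of rows in the table can be compared with `len(df)` after writing. This is only a sketch; `sqlite_row_count` is a hypothetical helper name, and it uses nothing beyond the standard `sqlite3` module.

```python
import sqlite3


def sqlite_row_count(db_path, table):
    """Return the number of rows in `table` of the SQLite file at `db_path`."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so the identifier
        # is quoted by hand (embedded double quotes are doubled).
        quoted = '"' + table.replace('"', '""') + '"'
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    finally:
        conn.close()
    return n
```

If `sqlite_row_count(db_path, 'rawdata')`, called with the same database path as above, equals `len(df)`, then nothing was lost while saving.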