I want to build a system that loads and analyzes large amounts of data with pandas, and later writes the results back to .parquet files.
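For context, the write-back step I have in mind is roughly this (just a sketch; the file name is a placeholder and it assumes the pyarrow package is installed):

import pandas as pd

# Sketch of the intended round trip (requires pyarrow).
df = pd.DataFrame({"Name": ["Tommy", "Karen"], "Age": [19, 20]})
df.to_parquet("people.parquet", engine="pyarrow", index=False)
df = pd.read_parquet("people.parquet")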
When I test the CSV-loading part with a simple example, it looks as if there is some kind of built-in limit on the number of rows:
import pandas as pd

# Generate a CSV with 100 000 000 data rows (two rows repeated 50 000 000 times,
# with no blank lines between repetitions).
contents = "Tommy;19\nKaren;20\n" * 50_000_000

# The context manager guarantees the file is flushed and closed before reading.
with open("person.csv", "w") as f:
    f.write("Name;Age\n" + contents)

print("Test file generated")

df = pd.read_csv("person.csv", delimiter=";")
print(len(df))
This prints 10 000 000, not the expected 100 000 000.
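For what it's worth, counting the physical lines without pandas should show whether the rows are missing on disk or being dropped by read_csv (a minimal check, assuming the file was generated as above):

# Count physical lines in the generated file without pandas.
# Expected: 1 header line + 100 000 000 data rows.
with open("person.csv") as f:
    print(sum(1 for _ in f))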