I am trying to parse a large (1.6 GB) .txt file with Pandas. You can download the file here; it is a GeoNames database dump of all countries and settlements.
To load and parse the file in Pandas, I consulted the answers here and here. This is the code I have so far:
import pandas as pd
for chunk in pd.read_csv(
    "allCountries.txt",
    header=None,
    engine="python",
    sep=r"\s{1,}",
    names=[
        "geonameid",
        "name",
        "asciiname",
        "alternatenames",
        "latitude",
        "longitude",
        "feature class",
        "feature code",
        "country code",
        "cc2",
        "admin1 code",
        "admin2 code",
        "admin3 code",
        "admin4 code",
        "population",
        "elevation",
        "dem",
        "timezone",
        "modification date",
    ],
    chunksize=1000,
):
    print(chunk.iloc[0])  # print the first row of each chunk
If I run the code above, I get the following error:
ParserError: Expected 20 fields in line 1, saw 25. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
I don't understand what is going wrong here. Can someone explain the cause of this error and how to fix it?
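One likely cause is the separator. The GeoNames readme describes the dump as tab-delimited, and place names often contain spaces, so a whitespace regex such as `\s{1,}` splits a single name into several fields and also collapses runs of empty fields. The sketch below is only an illustration under that assumption: it uses two made-up rows in the 19-column layout (not real GeoNames records) held in memory instead of `allCountries.txt`, and compares the field counts before reading the sample with `sep="\t"`.

```python
import csv
import io
import re

import pandas as pd

COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature class", "feature code", "country code", "cc2",
    "admin1 code", "admin2 code", "admin3 code", "admin4 code",
    "population", "elevation", "dem", "timezone", "modification date",
]

# Made-up sample rows (assumption: same tab-delimited layout as the dump).
rows = [
    ["1", "Example Peak", "Example Peak", "Pic Exemple,Example Mountain",
     "42.5", "1.5", "T", "PK", "AD", "", "00", "", "", "", "0", "",
     "2000", "Europe/Andorra", "2020-01-01"],
    ["2", "Sample Town", "Sample Town", "", "42.6", "1.6", "P", "PPL",
     "AD", "", "00", "", "", "", "1200", "", "1500", "Europe/Andorra",
     "2020-01-02"],
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

# Splitting on any whitespace gives a field count that does not match
# the column list; splitting on tabs gives exactly one field per column.
first_line = sample.splitlines()[0]
whitespace_fields = re.split(r"\s{1,}", first_line)
tab_fields = first_line.split("\t")
print(len(whitespace_fields), len(tab_fields), len(COLUMNS))

# Reading with a literal tab separator lets the default C engine be used.
reader = pd.read_csv(
    io.StringIO(sample),      # stand-in for "allCountries.txt"
    sep="\t",
    header=None,
    names=COLUMNS,
    dtype=str,                # keep every field as text in this sketch
    keep_default_na=False,    # do not turn strings like "NA" into NaN
    quoting=csv.QUOTE_NONE,   # treat quote characters as ordinary text
    chunksize=1,              # tiny chunks, only because the sample is tiny
)
chunks = list(reader)
print(chunks[0].iloc[0]["name"])
```

For the real file, the same call would take `"allCountries.txt"` and a larger `chunksize`. The `keep_default_na=False` setting matters for this data set because the country code for Namibia is `NA`, which Pandas would otherwise read as a missing value; `dtype=str` and `quoting=csv.QUOTE_NONE` are cautious choices for the sketch rather than requirements.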