I have a CSV file with more than 60 million rows. I am only interested in a subset of them and would like to load that subset into a DataFrame.
Here is the code I am using:
import pandas as pd
iter_csv = pd.read_csv('/Users/xxxx/Documents/globqa-pgurlbymrkt-Report.csv', iterator=True, chunksize=1000)
df = pd.concat([chunk[chunk['Site Market (evar13)'].str.contains("Canada", na=False)] for chunk in iter_csv])
I based it on the answer to this question: pandas: filter lines on load in read_csv
I get the following error:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
I can't seem to figure out what's wrong and would appreciate any guidance.
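One possible cause, offered as a guess rather than a diagnosis: pandas infers dtypes separately for each chunk, so if any 1000-row chunk has a 'Site Market (evar13)' column that is entirely empty (or entirely numeric), that chunk's column is not object dtype and .str raises this AttributeError. Below is a minimal sketch that forces the column to be read as strings through the dtype argument of read_csv. The sample data, the small chunksize, and the filter_market helper are hypothetical stand-ins used only to reproduce the situation.

```python
import io

import pandas as pd

COL = "Site Market (evar13)"

# Hypothetical stand-in for the real report: rows 3 and 4 have an empty
# market value, so with chunksize=2 the second chunk's column is all NaN.
csv_text = (
    "Site Market (evar13),hits\n"
    "Canada,1\n"
    "US,2\n"
    ",3\n"
    ",4\n"
    "Canada East,5\n"
)


def filter_market(source, needle="Canada", chunksize=2):
    """Read `source` in chunks and keep rows whose market contains `needle`."""
    # dtype={COL: str} makes every chunk read this column as strings,
    # so the .str accessor is available even for an all-empty chunk.
    chunks = pd.read_csv(source, dtype={COL: str}, chunksize=chunksize)
    kept = [chunk[chunk[COL].str.contains(needle, na=False)] for chunk in chunks]
    return pd.concat(kept)


df = filter_market(io.StringIO(csv_text))
print(df)
```

If the column is expected to hold only text, applying the same dtype argument to the original read_csv call may be enough. Whether this matches the real file depends on what the failing chunk actually contains.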