
I have a CSV file with 12 million rows and 9 columns.

I'm getting a KeyError, not a MemoryError, so this is not a duplicate of the memory-related questions.

I need to read it and get the 2nd-lowest rate for each zipcode.

I've read that to work with big CSV files, you need to read them in chunks and apply your code to each chunk.
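For reference, this is the chunked pattern I'm trying to follow, shown here on a tiny inline sample (the sample data and chunk size are just stand-ins for my real file, not my actual data). Note that a per-chunk nsmallest(2) alone isn't the global answer, so the per-chunk results have to be collected and re-aggregated at the end:

```python
import io
import pandas as pd

# Tiny inline sample standing in for the real CSV (assumed tab-separated here).
sample = "zipcode\trate\n64148\t245.20\n64148\t251.10\n64148\t243.99\n40813\t199.99\n40813\t205.50\n"

# Keep each chunk's two smallest rates per zipcode, then reduce globally.
parts = []
for chunk in pd.read_csv(io.StringIO(sample), sep='\t', chunksize=2):
    parts.append(chunk.groupby('zipcode')['rate'].nsmallest(2).reset_index(level=0))

combined = pd.concat(parts, ignore_index=True)

# Across all chunks: take the two smallest rates per zipcode; the larger of
# those two is the second-lowest rate for that zipcode.
second_lowest = (combined.groupby('zipcode')['rate']
                 .nsmallest(2)
                 .groupby(level=0)
                 .max()
                 .reset_index(name='rate'))
```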

I have this:

import pandas as pd

for df in pd.read_csv('slcsp/new_df.csv', sep='\t', iterator=True, chunksize=1000):
    df.groupby('zipcode').rate.nsmallest(2).reset_index().drop('level_1', 1) \
        .drop_duplicates(subset=['zipcode'], keep='last')

But I'm getting this error:

KeyError: 'zipcode'

I've checked, and the file does have a column named zipcode.
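One thing I'm not certain about is the delimiter. As a hypothetical reproduction (inline sample data, not my real file): if the file were actually comma-separated, reading it with sep='\t' would collapse the whole header into a single column, which would produce exactly this KeyError even though the column exists:

```python
import io
import pandas as pd

# Comma-separated data read with the wrong separator.
comma_data = "zipcode,rate\n64148,245.20\n"
df = pd.read_csv(io.StringIO(comma_data), sep='\t')
print(df.columns.tolist())  # one combined column, not ['zipcode', 'rate']
```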

Traceback (most recent call last):
  File "slcsp/slcsp.py", line 19, in <module>
    df.loc[df.groupby('zipcode').rate.rank(method='first').eq(2),['zipcode','rate']]
  File "D:\virtual_envs\web_scrapping\lib\site-packages\pandas\core\generic.py", line 7632, in groupby
    observed=observed, **kwargs)
  File "D:\virtual_envs\web_scrapping\lib\site-packages\pandas\core\groupby\groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)
  File "D:\virtual_envs\web_scrapping\lib\site-packages\pandas\core\groupby\groupby.py", line 360, in __init__
    mutated=self.mutated)
  File "D:\virtual_envs\web_scrapping\lib\site-packages\pandas\core\groupby\grouper.py", line 578, in _get_grouper
    raise KeyError(gpr)
KeyError: 'zipcode'
