
I am trying to read in data from NYC OpenData, and the dataset has about 25 million records.

import json
import pandas as pd
import requests

def JSON_to_DF(json_file):
    # Download the JSON payload and load it into a DataFrame
    json_data = requests.get(json_file)
    text_data = json.loads(json_data.text)
    pd_data = pd.DataFrame(text_data)
    return pd_data

data = JSON_to_DF('https://data.cityofnewyork.us/resource/erm2-nwe9.json')

I used the $limit parameter, but I can't seem to go over 100,000 records.

data = JSON_to_DF('https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=100000')

How do I get this done?


1 Answer


For large data sets, Socrata requires that you paginate through the request. Some data sets contain hundreds of thousands, if not millions, of rows; pagination returns the data in many "pages" so you can accumulate the entire download.
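As a concrete illustration, here is a minimal sketch that builds on the requests-based approach from your question. It walks through the dataset with the SODA $limit and $offset parameters; the paginate_to_df name and the 50,000-row page size are arbitrary choices for this example, not part of the Socrata API.

    import pandas as pd
    import requests

    def paginate_to_df(url, page_size=50000):
        frames = []
        offset = 0
        while True:
            # $order=:id gives a stable sort so pages don't overlap or skip rows
            params = {'$limit': page_size, '$offset': offset, '$order': ':id'}
            resp = requests.get(url, params=params)
            resp.raise_for_status()
            rows = resp.json()
            if not rows:  # an empty page means the dataset is exhausted
                break
            frames.append(pd.DataFrame(rows))
            offset += page_size
        return pd.concat(frames, ignore_index=True)

    data = paginate_to_df('https://data.cityofnewyork.us/resource/erm2-nwe9.json')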

For sodapy, .get() retrieves the data as you requested; in that request it returns 100,000 rows, after which Socrata likely just times out and expects you to paginate. You'll want to look at sodapy's get_all() function, which is intended to paginate the request, as in the sketch below.
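A hedged sketch of what that could look like (the dataset ID erm2-nwe9 comes from the URL in your question; passing None as the app token works but is rate-limited):

    import pandas as pd
    from sodapy import Socrata

    # A registered app token avoids throttling; None works for light use.
    client = Socrata('data.cityofnewyork.us', None)

    # get_all() returns a generator that transparently requests page after
    # page until the dataset is exhausted.
    records = client.get_all('erm2-nwe9')
    data = pd.DataFrame.from_records(records)

Bear in mind that materializing all ~25 million rows in one DataFrame takes a lot of memory; for a dataset this size you may prefer to consume the generator in chunks and process or write out each chunk as you go.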
