
I am trying to read in data from NYC OpenData, and the dataset has about 25 million records.

import json
import pandas as pd
import requests

def JSON_to_DF(json_file):
    # Download the JSON payload and load it into a DataFrame
    json_data = requests.get(json_file)
    text_data = json.loads(json_data.text)
    pd_data = pd.DataFrame(text_data)
    return pd_data

data = JSON_to_DF('https://data.cityofnewyork.us/resource/erm2-nwe9.json')

I used the $limit parameter, but I can't seem to go over 100,000 records.

data = JSON_to_DF('https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=100000')

How do I get this done?


1 Answer


For large data sets, Socrata requires that you paginate through the request. Some data sets contain hundreds of thousands, if not millions, of rows; pagination returns the data in many "pages" so you can accumulate the entire download.
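As a concrete illustration, here is a minimal sketch that builds on the requests-based approach from your question. It walks through the dataset with the SODA $limit and $offset parameters; the paginate_to_df name and the 50,000-row page size are arbitrary choices for this example, not part of the Socrata API.

    import pandas as pd
    import requests

    def paginate_to_df(url, page_size=50000):
        frames = []
        offset = 0
        while True:
            # $order=:id gives a stable sort so pages don't overlap or skip rows
            params = {'$limit': page_size, '$offset': offset, '$order': ':id'}
            resp = requests.get(url, params=params)
            resp.raise_for_status()
            rows = resp.json()
            if not rows:  # an empty page means the dataset is exhausted
                break
            frames.append(pd.DataFrame(rows))
            offset += page_size
        return pd.concat(frames, ignore_index=True)

    data = paginate_to_df('https://data.cityofnewyork.us/resource/erm2-nwe9.json')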

For sodapy, .get() retrieves the data as you requested; in that request it returns 100,000 rows, after which Socrata likely just times out and expects you to paginate. You'll want to look at sodapy's get_all() function, which is intended to paginate the request, as in the sketch below.
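A hedged sketch of what that could look like (the dataset ID erm2-nwe9 comes from the URL in your question; passing None as the app token works but is rate-limited):

    import pandas as pd
    from sodapy import Socrata

    # A registered app token avoids throttling; None works for light use.
    client = Socrata('data.cityofnewyork.us', None)

    # get_all() returns a generator that transparently requests page after
    # page until the dataset is exhausted.
    records = client.get_all('erm2-nwe9')
    data = pd.DataFrame.from_records(records)

Bear in mind that materializing all ~25 million rows in one DataFrame takes a lot of memory; for a dataset this size you may prefer to consume the generator in chunks and process or write out each chunk as you go.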
