I am writing a small script that loops through a .csv file, stores each row as a dictionary, and sends that dictionary to an API as a single-element list.
import csv
import requests

with open('csv.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    for row in reader:
        request = [dict(zip(header, map(str, row)))]
        r = requests.post(url, headers=i_headers, json=request)
        print str(reader.line_num) + "-" + str(r)
The request list looks something like this:
[
    {
        "id": "1",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    }
]
This script works, but I'm looping through an 8-million-row .csv, and this method is simply too slow. I would like to speed up the process by sending more than one row per API call. The API I'm working with allows me to send up to 100 rows per call.
How can I change this script so that it incrementally builds a list of 100 dictionaries, posts that list to the API, and then repeats? A sample of what I'd be sending to the API would look like this:
[
    {
        "id": "1",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    },
    {
        "id": "2",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    },
    ...
    ...
    ...
    {
        "id": "100",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    }
]
One thing that won't work is building a massive list and then partitioning it into n lists of size 100, because my machine cannot hold all of that data in memory at any one time.
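One possible approach, sketched below under the assumption that url and i_headers are defined exactly as in the script above, is to accumulate rows into a list and post it every time it reaches 100 entries, so that no more than 100 dictionaries are ever held in memory at once:

import csv
import requests

BATCH_SIZE = 100  # the API's stated per-call limit

with open('csv.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    batch = []
    for row in reader:
        batch.append(dict(zip(header, map(str, row))))
        if len(batch) == BATCH_SIZE:
            # post a full batch of 100 rows, then start a new one
            r = requests.post(url, headers=i_headers, json=batch)
            print str(reader.line_num) + "-" + str(r)
            batch = []
    if batch:
        # post the final, partial batch (fewer than 100 rows)
        r = requests.post(url, headers=i_headers, json=batch)
        print str(reader.line_num) + "-" + str(r)

The trailing if batch: block is only there to flush the last group when the row count is not an exact multiple of 100.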