Context
Hi all, I'm trying to split my dataset into 180 unique pieces and then run each piece through a geocoder (my n is ~180,000 and the geocoder has a batch limit of 1,000). I'm pretty new to Python, but some googling led me to shuffle within sklearn.utils. It seems to do the trick, and this code does what I want (conceptually):
from sklearn.utils import shuffle
df = shuffle(addresses)
df1 = df[0:1000]
df2 = df[1000:2000]
df3 = df[2000:3000]
However, I obviously don't want to sit down and manually construct 180 dataframes like this, so I'm looking for a way to put it in a loop. This is my basic idea:
start = 0
end = 1000
for a in range(1,180):
    print(start, end, a)
    start = start+1000
    end = end+1000
The above works fine.
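One thing worth noting about the loop above: range(1,180) runs 179 times (a goes from 1 to 179), so the last pair it prints is 178000 and 179000, and the final 1,000 rows are never reached. A minimal sketch of one way to generate the boundaries from the row count instead, assuming exactly 180,000 rows (chunk_bounds is a hypothetical helper name, not something from sklearn or pandas):

```python
def chunk_bounds(n_rows, chunk_size=1000):
    """Return (start, end) pairs that cover 0..n_rows in steps of chunk_size."""
    # range(0, n_rows, chunk_size) yields 0, 1000, 2000, ...;
    # min() keeps the last end from running past n_rows.
    return [(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]

# Assumed row count, taken from the "~180,000" mentioned in the post.
bounds = chunk_bounds(180_000)
print(len(bounds), bounds[0], bounds[-1])
```

Driving the loop from the row count means the number of pieces does not have to be hard-coded, and a final partial batch would still be included if n is not an exact multiple of 1,000.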
Code that doesn't work
However, when I try to integrate the actual splitting into the loop (not just printing), it fails. I'm pretty sure the issue is in how I'm using the loop variable a when naming the dataframes. I have no idea how to solve this, though.
from sklearn.utils import shuffle
df = shuffle(addresses)
start = 0
end = 1000
for a in range(1,180):
    df_str(a) = df[start:end]
    start = start+1000
    end = end+1000
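The line df_str(a) = df[start:end] is rejected because Python does not allow assigning to a function call, and variable names cannot be built up from a loop counter that way. A common alternative is to keep the pieces in a single container keyed by number. The sketch below is only an illustration under assumptions: it uses a plain list of made-up strings as a stand-in for the shuffled dataframe so that it runs without pandas or sklearn, and split_into_chunks is a hypothetical helper name. The same slicing (data[start:end] and len(data)) should also work on a dataframe, since that is what the snippets above already rely on.

```python
import random

def split_into_chunks(data, chunk_size=1000):
    """Return a dict mapping a 1-based chunk number to a slice of data."""
    chunks = {}
    # enumerate(..., start=1) numbers the chunks 1, 2, 3, ...
    for number, start in enumerate(range(0, len(data), chunk_size), start=1):
        chunks[number] = data[start:start + chunk_size]
    return chunks

# Stand-in data (an assumption): 180,000 fake addresses in random order.
addresses = [f"address {i}" for i in range(180_000)]
random.seed(0)
shuffled = random.sample(addresses, k=len(addresses))

chunks = split_into_chunks(shuffled)
print(len(chunks), len(chunks[1]), len(chunks[180]))
```

With this layout, chunks[1] plays the role of df1, chunks[2] of df2, and so on, and each batch can be sent to the geocoder by looping over chunks.values().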