
I have the following code:

import datetime
from tqdm import tqdm

facts = []
with tqdm(total=6022484) as pbar:
    for lat in dp.lat:
        for lon in dp.lon:
            for time in dp.time:
                fact = {
                    # timestamps are in nanoseconds, so divide by 1e9
                    'datetime': datetime.datetime.fromtimestamp(float(time.values) / 1e9),
                    'loc': [float(lon.values), float(lat.values)],
                    'temp': celsius(dp.sel(lat=lat.values, lon=lon.values, time=time.values).t2m.values),
                    'rh': round(float(dp.sel(lat=lat.values, lon=lon.values, time=time.values).rh.values), 1),
                    'rain': round(float(dp.sel(lat=lat.values, lon=lon.values, time=time.values).rain.values), 1),
                }
                facts.append(fact)
                pbar.update()

This makes approximately 100 iterations per second. Is it possible to do better?

Hugo

1 Answer


Generally this approach is going to be extremely slow. Instead of iterating through the values in Python, you should use the standard functions, which operate across the values in a vectorized way.

For example, `dp.lat = dp.lat.astype('float')`, or `dp.rain = np.round(dp.rain, 0)`.
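As a minimal sketch of what that looks like here, assuming `dp` is an xarray Dataset with variables `t2m`, `rh`, and `rain`, and that `t2m` is in Kelvin (the question's `celsius()` helper isn't shown, so that conversion is an assumption), the per-element work in the loop collapses to a few whole-array operations:

# Vectorized equivalents of the per-element work in the question's loop.
# Assumes dp.t2m is in Kelvin; adjust if celsius() does something else.
dp['t2m'] = dp['t2m'] - 273.15   # Kelvin -> Celsius across the whole array
dp['rh'] = dp['rh'].round(1)     # round every value in one call
dp['rain'] = dp['rain'].round(1)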

This is a similar discussion: What is the most efficient way to loop through dataframes with pandas?

Maximilian
  • I found out that converting to a DataFrame is quite fast. The problem now is converting the DataFrame to a dict before inserting into the Mongo collection. I have approx. 6 million rows. – Hugo Dec 30 '15 at 16:54
  • `DataFrame.to_dict()`? – Maximilian Dec 30 '15 at 22:37
  • But it's also going to be slow converting from a DataFrame to a dict - you haven't materially helped the speed by getting it into a DataFrame quickly – Maximilian May 21 '16 at 04:53
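For the DataFrame-to-dict step the comments discuss, here is a minimal sketch of one possible path, assuming pymongo as the Mongo driver; `to_dataframe()`, `to_dict('records')`, and `insert_many` are real APIs, but the `weather.facts` collection name is made up:

from pymongo import MongoClient

# Flatten the Dataset so each (time, lat, lon) combination becomes one row,
# then convert all rows to dicts in a single call.
df = dp.to_dataframe().reset_index()
records = df.to_dict('records')   # list of one dict per row

# Hypothetical collection; insert_many sends the documents in batches,
# avoiding a Python-level loop of single inserts. For ~6 million rows,
# inserting in chunks would keep memory in check.
client = MongoClient()
client.weather.facts.insert_many(records)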