Background:
I have a pandas DataFrame with 200k+ rows of data:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 212812 entries, 0 to 212811
Data columns (total 10 columns):
date 212812 non-null values
animal_id 212812 non-null values
lons 212812 non-null values
lats 212812 non-null values
depth 212812 non-null values
prey1 212812 non-null values
prey2 212812 non-null values
prey3 212812 non-null values
dist 212812 non-null values
sog 212812 non-null values
dtypes: float64(9), int64(1), object(1)
For each date, there are 1000 individuals with lon/lat positions.
I would like to calculate the daily change in distance for each individual. I did this successfully for 100 individuals using pyproj.Geod.inv, but with the larger population the calculation has slowed down massively.
Question:
Is there an efficient way of performing calculations on a pandas DataFrame using an external class method like pyproj.Geod.inv?
Example routine:
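import numpy as np
from pyproj import Geod

g = Geod(ellps='WGS84')  # assumed ellipsoid; how g was created is not shown in the original

ids = np.unique(data['animal_id'])
for animal in ids:
    id_idx = data['animal_id'] == animal
    # .values gives positional access; dates are assumed to be in chronological order
    dates = data['date'][id_idx].values
    for i in range(len(dates) - 1):
        # boolean masks for this animal on two consecutive dates
        idx1 = id_idx & (data['date'] == dates[i])
        idx2 = id_idx & (data['date'] == dates[i + 1])
        lon1 = data['lons'][idx1].values
        lat1 = data['lats'][idx1].values
        lon2 = data['lons'][idx2].values
        lat2 = data['lats'][idx2].values
        fwd_az, bck_az, dist = g.inv(lon1, lat1, lon2, lat2)
        # .loc avoids chained assignment, which may write to a copy
        data.loc[idx2, 'dist'] = dist
        data.loc[idx2, 'sog'] = dist / 24.  # dist / time (hours)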
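For comparison, here is a rough sketch of what a vectorized version of the same calculation might look like. The idea is to pair each row with the same animal's previous position using a grouped shift, then make a single call to the inverse function on whole arrays instead of one call per animal and date. The names add_daily_distance, inv and hours_per_step are hypothetical, and the sketch assumes one row per animal per date and that inv behaves like pyproj.Geod.inv when given NumPy arrays.

```python
import numpy as np
import pandas as pd


def add_daily_distance(data, inv, hours_per_step=24.0):
    """Return a sorted copy of data with 'dist' and 'sog' filled in.

    inv is assumed to be a callable with the same signature as
    pyproj.Geod.inv: inv(lon1, lat1, lon2, lat2) -> (fwd_az, bck_az, dist).
    """
    # Sort so that consecutive rows of one animal are consecutive dates.
    out = data.sort_values(['animal_id', 'date']).copy()

    # Previous position of the same animal (NaN on each animal's first date).
    prev = out.groupby('animal_id')[['lons', 'lats']].shift(1)
    valid = prev['lons'].notna().to_numpy()

    # One array-wide call instead of one call per animal and date.
    _, _, d = inv(
        prev['lons'].to_numpy()[valid],
        prev['lats'].to_numpy()[valid],
        out['lons'].to_numpy()[valid],
        out['lats'].to_numpy()[valid],
    )

    dist = np.full(len(out), np.nan)
    dist[valid] = d
    out['dist'] = dist
    out['sog'] = out['dist'] / hours_per_step  # dist / time (hours)
    return out
```

With pyproj installed, this could be called as add_daily_distance(data, g.inv), where g is a Geod instance such as Geod(ellps='WGS84'). The first date of each animal has no previous position, so its dist and sog are left as NaN in this sketch.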