1

I am a newbie in pandas datamining. I have GPS dataset that consists of timestamp, longitude and latitude values. My dataset looks like this.

In [3]:
import pandas as pd
import numpy as np
df = pd.read_csv('D:GPS.csv', index_col=None)
df
Out[3]:
    time                mLongitude  mLongitude
0   2014-06-30 00:00:00 94.500000   126.998428
1   2014-06-30 00:00:00 94.500000   126.998428
2   2014-06-30 00:00:00 94.500000   126.998428
3   2014-06-30 00:00:00 94.500000   126.998428
4   2014-06-30 00:00:00 94.500000   126.998428
5   2014-06-30 00:00:00 94.500000   126.998428
6   2014-06-30 00:00:00 94.500000   126.998428
7   2014-06-30 00:00:00 94.500000   126.998428
8   2014-06-30 00:00:00 94.500000   126.998428
9   2014-06-30 00:00:00 94.500000   126.998428
10  2014-06-30 00:00:00 94.500000   126.998428
11  2014-06-30 00:00:00 94.500000   126.998428
12  2014-06-30 00:00:00 94.500000   126.998428
13  2014-06-30 00:00:00 94.500000   126.998428
14  2014-06-30 00:00:00 94.500000   126.998428
15  2014-06-30 00:00:00 94.500000   126.998428
  ...   ... ... ...
9467    2014-08-02 00:00:00 44.299999   126.902259
9468    2014-08-02 00:00:00 44.299999   126.902259
9469    2014-08-02 00:00:00 44.299999   126.902259
9470    2014-08-02 00:00:00 44.299999   126.902259
9471    2014-08-02 00:00:00 44.299999   126.902259
9472    2014-08-02 00:00:00 44.299999   126.902259

In here, I want to calculate traveling distance for each day. And then the example of the output would be like this:

 time        distance (meter)
2014-06-30     1000 
2014-07-01     500
....           ...
2014-08-02     1500
markov zain
  • 11,987
  • 13
  • 35
  • 39
  • How are you calculating distance? – EdChum Apr 14 '15 at 11:07
  • *Ground* or altitude distance ? Straight line/[great circle distance](http://en.wikipedia.org/wiki/Great-circle_distance) ? – Sylvain Leroux Apr 14 '15 at 11:12
  • @SylvainLeroux Actually, I am not expert in this case, but after I have read some references, I think I'm looking for Ground distance. – markov zain Apr 14 '15 at 11:28
  • @EdChum I have read the code from [this](http://gis.stackexchange.com/questions/119846/calculating-distance-between-latitude-and-longitude-points-using-python), but I dont know how to modify the code so that I can implement it in my case. – markov zain Apr 14 '15 at 11:37
  • See related: http://stackoverflow.com/questions/25767596/using-haversine-formula-with-data-stored-in-a-pandas-dataframe/25767765#25767765 – EdChum Apr 14 '15 at 12:03
  • @EdChum, yes thanks for your consideration, i have read that code too, but how I can calculate distance for each day based on my dataset – markov zain Apr 14 '15 at 12:31
  • You'd have to resample your data to daily, at the moment it doesn't look like your lat and lon change at all over the course of the day, at the moment your question is very broad as you are a long way from the final code so really this question should address the distance measuring by itself and you should then check existing questions about resampling – EdChum Apr 14 '15 at 12:48

1 Answers1

3

Thw following is adapted from my answer:

In [133]:

import math
​
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(np.radians(df['mLatitude']) - math.radians(37.2175900)/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(np.radians(df['mLatitude']) * np.sin(np.radians(df['mLongitude']) - math.radians(-56.7213600)/2)**2)))
df
Out[133]:
            time  mLongitude   mLatitude      distance
index                                                 
0     2014-06-30   94.500000  126.998428  16032.604625
1     2014-06-30   94.500000  126.998428  16032.604625
2     2014-06-30   94.500000  126.998428  16032.604625
3     2014-06-30   94.500000  126.998428  16032.604625
4     2014-06-30   94.500000  126.998428  16032.604625
5     2014-06-30   94.500000  126.998428  16032.604625
6     2014-06-30   94.500000  126.998428  16032.604625
7     2014-06-30   94.500000  126.998428  16032.604625
8     2014-06-30   94.500000  126.998428  16032.604625
9     2014-06-30   94.500000  126.998428  16032.604625
10    2014-06-30   94.500000  126.998428  16032.604625
11    2014-06-30   94.500000  126.998428  16032.604625
12    2014-06-30   94.500000  126.998428  16032.604625
13    2014-06-30   94.500000  126.998428  16032.604625
14    2014-06-30   94.500000  126.998428  16032.604625
15    2014-06-30   94.500000  126.998428  16032.604625
9467  2014-08-02   44.299999  126.902259  10728.740464
9468  2014-08-02   44.299999  126.902259  10728.740464
9469  2014-08-02   44.299999  126.902259  10728.740464
9470  2014-08-02   44.299999  126.902259  10728.740464
9471  2014-08-02   44.299999  126.902259  10728.740464
9472  2014-08-02   44.299999  126.902259  10728.740464
In [137]:

df.set_index('time').resample('D', how='mean')
Out[137]:
            mLongitude   mLatitude      distance
time                                            
2014-06-30   94.500000  126.998428  16032.604625
2014-08-02   44.299999  126.902259  10728.740464

It's unclear if your time is alread a datetime or not but if not you can convert it: df['time'] = pd.to_datetime(df['time']), I also relabelled the columns as you had 2 mLongitude, I'm assuming the 2nd one should be latitude

Community
  • 1
  • 1
EdChum
  • 376,765
  • 198
  • 813
  • 562