4

I have a csv containing locations (latitude,longitude) for a given user denoted by the id field, at a given time (timestamp). I need to calculate the distance and the velocity between a point and the successive point for each user. For example, for ID 1 I need to find the distance and velocity between point 1 and point 2, point 2 and point 3, point 3 and point 4, and so on. Given I am working with coordinates on the Earth, I understand the Haversine metric will be used for distance calculations, however, I am unsure how to iterate though my file given the time and user order aspect to my problem. Given this, with python, how can I iterate through my file to sort the events by user and by time, and then calculate the distance and velocity between each?

Ideally, the output would be a second csv looking something like: ID#, start_time, start_location, end_time, end_location, distance, velocity.

Sample data below:

ID,timestamp,latitude,longitude
3,6/9/2017 22:20,38.7953326,77.0088833  
1,5/5/2017 13:10,38.8890106,77.0500613
2,2/10/2017 16:23,40.7482494,73.9841913
1,5/5/2017 12:35,38.9206015,77.2223287
3,6/10/2017 10:00,42.3662109,71.0209426
1,5/5/2017 20:00,38.8974155,77.0368333
2,2/10/2017 7:30,38.8514261,77.0422981
3,6/9/2017 10:20,38.9173461,77.2225527
2,2/10/2017 19:51,40.7828687,73.9675438
3,6/10/2017 6:42,38.9542676,77.4496951
1,5/5/2017 16:35,38.8728748,77.0077629
2,2/10/2017 10:00,40.7769311,73.8761546
andrewr
  • 784
  • 13
  • 31

1 Answers1

14

Seems like you could use the magic of pandas.

Read the data

It's easy to create a pandas dataframe from a csv file using the read_csv() function:

import pandas as pd
df = pd.read_csv(filename)

Based on your sample data, this will create the following dataframe:

    ID        timestamp   latitude  longitude
0    3   6/9/2017 22:20  38.795333  77.008883
1    1   5/5/2017 13:10  38.889011  77.050061
2    2  2/10/2017 16:23  40.748249  73.984191
3    1   5/5/2017 12:35  38.920602  77.222329
4    3  6/10/2017 10:00  42.366211  71.020943
5    1   5/5/2017 20:00  38.897416  77.036833
6    2   2/10/2017 7:30  38.851426  77.042298
7    3   6/9/2017 10:20  38.917346  77.222553
8    2  2/10/2017 19:51  40.782869  73.967544
9    3   6/10/2017 6:42  38.954268  77.449695
10   1   5/5/2017 16:35  38.872875  77.007763
11   2  2/10/2017 10:00  40.776931  73.876155

Convert the timestamp column

Pandas (and python in general) has extensive libraries for date and time operations. But first, you will need to prepare your data by converting the timestamp column (a string) into a datetime object. I am assuming your data is in the format "MM/DD/YYYY" (since you didn't specify).

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M')

Helper functions

You're going to have to define some functions to compute the distance and the velocity. The Haversine distance function is adapted from this answer.

from math import sin, cos, sqrt, atan2, radians

def getDistanceFromLatLonInKm(lat1,lon1,lat2,lon2):
    R = 6371 # Radius of the earth in km
    dLat = radians(lat2-lat1)
    dLon = radians(lon2-lon1)
    rLat1 = radians(lat1)
    rLat2 = radians(lat2)
    a = sin(dLat/2) * sin(dLat/2) + cos(rLat1) * cos(rLat2) * sin(dLon/2) * sin(dLon/2) 
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    d = R * c # Distance in km
    return d

def calc_velocity(dist_km, time_start, time_end):
    """Return 0 if time_start == time_end, avoid dividing by 0"""
    return dist_km / (time_end - time_start).seconds if time_end > time_start else 0

Make some intermediate variables

We want to compute the Haversine function on each row, but we need some information from the first row for each group. Luckily, pandas makes this easy with sort_values(), groupby() and transform().

The following code makes 3 new columns, one each for the initial latitude, longitude, and time for each ID.

# First sort by ID and timestamp:
df = df.sort_values(by=['ID', 'timestamp'])

# Group the sorted dataframe by ID, and grab the initial value for lat, lon, and time.
df['lat0'] = df.groupby('ID')['latitude'].transform(lambda x: x.iat[0])
df['lon0'] = df.groupby('ID')['longitude'].transform(lambda x: x.iat[0])
df['t0'] = df.groupby('ID')['timestamp'].transform(lambda x: x.iat[0])

Apply the functions

# create a new column for distance
df['dist_km'] = df.apply(
    lambda row: getDistanceFromLatLonInKm(
        lat1=row['latitude'],
        lon1=row['longitude'],
        lat2=row['lat0'],
        lon2=row['lon0']
    ),
    axis=1
)

# create a new column for velocity
df['velocity_kmps'] = df.apply(
    lambda row: calc_velocity(
        dist_km=row['dist_km'],
        time_start=row['t0'],
        time_end=row['timestamp']
    ),
    axis=1
)

The Result

>>> print(df[['ID', 'timestamp', 'latitude', 'longitude', 'dist_km', 'velocity_kmps']])

    ID           timestamp   latitude  longitude     dist_km  velocity_kmps
3    1 2017-05-05 12:35:00  38.920602  77.222329    0.000000       0.000000
1    1 2017-05-05 13:10:00  38.889011  77.050061   15.314742       0.007293
10   1 2017-05-05 16:35:00  38.872875  77.007763   19.312148       0.001341
5    1 2017-05-05 20:00:00  38.897416  77.036833   16.255868       0.000609
6    2 2017-02-10 07:30:00  38.851426  77.042298    0.000000       0.000000
11   2 2017-02-10 10:00:00  40.776931  73.876155  344.880549       0.038320
2    2 2017-02-10 16:23:00  40.748249  73.984191  335.727502       0.010498
8    2 2017-02-10 19:51:00  40.782869  73.967544  339.206320       0.007629
7    3 2017-06-09 10:20:00  38.917346  77.222553    0.000000       0.000000
0    3 2017-06-09 22:20:00  38.795333  77.008883   22.942974       0.000531
9    3 2017-06-10 06:42:00  38.954268  77.449695   20.070609       0.000274
4    3 2017-06-10 10:00:00  42.366211  71.020943  648.450485       0.007611

From here, I will leave it to you to figure out how to grab the last entry for each ID.

pault
  • 41,343
  • 15
  • 107
  • 149
  • This looks like it should do the trick. Couple quick questions though. First, is the distance between the current point and the initial point (occurring at the earliest time), or the immediately previous point? And second, kmps can be changed to kmph by changing `dist_km / (time_end - time_start).second` to `.hour`? – andrewr Jan 25 '18 at 03:56
  • For your first question, it's from the initial point. Second question: I think so but you should look up the docs. It wouldn't hurt for you to do some coding/debugging on your own. – pault Jan 25 '18 at 14:44
  • Okay, the main problem/question was to find the distance and velocity between not just the current point and the initial point, but the current point and the preceding (or next) point so I could find the distance and velocity between points 1/2, points 2/3, points 3/4, and so on. Do you have an idea how that could be accomplished? – andrewr Jan 25 '18 at 14:51
  • Yes I have many ideas, but it would help if you tried to solve it on your own first. Your question did not specify any of these things. If you showed some code and let me know where you are having trouble, maybe I can point you in the right direction. Also, please read [ask] and try to provide a [mcve]. – pault Jan 25 '18 at 14:53
  • Actually, my question did specify I am looking to find both distance and velocity between each point, however, I can go back and make that clearer. Also, I do not have code at this moment because I am unsure where to start, that is why I posed the question in the first place. – andrewr Jan 25 '18 at 15:06
  • 2
    SO is not a free code writing service. I have given you plenty to get started. I am confident that you can do the research and take it from here. No other user bothered to even respond to you (perhaps I shouldn't have either). Play around with this and see how far you can go. If you run into trouble, then try posting a new _specific_ question. Good luck. Related: [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/a/261593/5858851) – pault Jan 25 '18 at 15:11
  • 1
    I wasn't asking for free code writing, so you don't have to be so condescending, Paul. I just was asking for help. I understand my problem, and how I'd like to try and solve it, but implementing this in code is where I am running into problems given I'm not the most proficient programmer. I understand the solution you provided, and I thank you for it. – andrewr Jan 25 '18 at 15:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163884/discussion-between-pault-and-andrew-r). – pault Jan 25 '18 at 15:33