-2

I'm new to python and I have hit a wall with this problem. I have a df with ID's, timestamp and lat /longs. I need to find ID's that were are near each other by location and time. I've tried groupby with pandas, but the output just lumps everything under each unique ID. Is there a way to pair or group ID's by their proximity to each ID's lat/longs (<50m) and timestamp (<=24hrs). I appreciate any help. Thanks!

ID Timestamp Latitude Longitude
9269 11/23/2021 23:59 40.3291824 -105.656204
9269 11/19/2021 23:59 40.32294108 -105.8062935
9269 11/15/2021 23:59 40.13903661 -105.5979048
9269 11/15/2021 23:39 40.3291824 -105.656204
7359 11/11/2021 23:59 40.13903661 -105.5979048
7359 11/7/2021 23:59 40.32294108 -105.8062935
7359 11/7/2021 23:39 40.3291824 -105.656204
2259 12/6/2021 0:02 40.32294108 -105.8062935
2259 12/2/2021 0:02 40.3291824 -105.656204
2259 11/28/2021 0:02 40.13903661 -105.5979048
facepet
  • 21
  • 3
  • You need to start write some code to calculate the distance between the points, and show what you have done so far so we can help. – Malo Jan 01 '22 at 18:41
  • Please provide enough code so others can better understand or reproduce the problem. – Community Jan 01 '22 at 18:43

1 Answers1

0

I will show you some steps that you can use to solve your problem.

import numpy as np
import pandas as pd

# Creating a 2 dimensional numpy array
data = np.array([[1, 2,3,4,5,1, 2,3,4,5],[10, 20,30,40,50,10, 20,30,40,50],['s.40','s.50','s.40','s.50','s.40','s.50','s.40','s.50','s.40','s.50']])


# Creating pandas dataframe from numpy array
df= pd.DataFrame({'Column1': data[0,:], 'Column2': data[1,:],'Column3': data[2,:]})

I create a dataframe from a numpy array. In your dataframe in Timestamp Column you have data and time you need splint it and create a new column. You can do it as follow in my example:

def split_function(Column3):
    return Column3.split('.')[1]

df['split_column']=df.apply(lambda x: split_function(x.Column3), axis=1)

In your case u can use a blank space ' ' to do your split in return Column3.split(' ')[1] Than u need garante that ur column have the right type so u can use:

df['Column1']=pd.to_numeric(df['Column1'])
df['Column2']=pd.to_numeric(df['Column2'])

(See also to_datetime() and to_timedelta()

Than u need know how many interval you have, so you can use max and min method to help you find it:

In my example :

df['Column1'].max()
df['Column1'].min()

df['Column2'].max()
df['Column2'].min()

It will return 5 1 and 50 10

Let supose i want a interval of 25 based on my Column 2, So i will have 2 groups interval.

Finally i just need select the rows based on a column condition you can see more info about it here :

How do I select rows from a DataFrame based on column values?

list_dataframe_interval=[]

for i in range(2):
    print(i*25)
    print(type(i))

    df1=df.loc[(df['Column2'] >=i*25) & (df['Column1'] >= i*2.5)]
    df2=df1.loc[(i*25+25 >=df1['Column2']) & (i*2.5+2.5>=df1['Column1'])]
    

   
    list_dataframe_interval.append(df2)

So you can see the dataframe in this list

list_dataframe_interval[0]

list_dataframe_interval[1]

Vitor Bento
  • 384
  • 4
  • 17