Sorry for bad English for writing the title. What I would like to do is to determine the location of df
by comparing with another data frame df2
.
df
is a data frame that shows the variation of longitude and latitude in a period of time. It is a data frame with more than 40,000 of rows.
df=
Longitude Latitude
Time
2020-01-01 01:00 100.0 20.0
2020-01-01 01:01 100.2 20.1
2020-01-01 01:02 300.1 30.0
2020-01-01 01:03 200.1 40.0
2020-01-01 01:04 50.0 50.0
df2
is the dataframe that contain the location data. It is a data frame with around 30 rows.
df2=
Longitude Latitude Location
0 90.0 20.0 District A
1 210.0 60.0 District B
The method to determine where I am located will be done by Pythagoras theorem. Each row of data in df
will compare with every row in df2
to find the shortest distance, which the shortest distance is where I am located.
Take 2020-01-01 01:01
as an example.
First, compare with df2 row 0
, the calculation is [(100.2-90.0)^2+(20.1-20.0)^2]^0.5
.
Second, compare with df2 row 1
, the calculation is [(100.2-210.0)^2+(20.1-60.0)^2]^0.5
Since the distance between the location and District A
is smaller than the distance between the location and District B
, therefore the location on 2020-01-01 01:01
will be District A
. The expected output is given as follows.
df=
Longitude Latitude Location
Time
2020-01-01 01:00 100.0 20.0 District A
2020-01-01 01:01 100.2 20.1 District A
2020-01-01 01:02 300.1 30.0 District B
2020-01-01 01:03 200.1 40.0 District B
2020-01-01 01:04 50.0 50.0 District A
My approach: To get my expected result first I will create a df['Loaction']
column full of NaN
, then I will do the calculation and extract the df2['Location']
to substitute the NaN
value in df['Loaction']
.
However, what I am confused about is how to get the calculation results. Are there any related pandas function or NumPy function that can compare a row from df
with multiple rows from df2
? Is there a better approach? Thank you!