I have two data frames. First, saves coordinates and places
data = [['xx1', 1,10,'Madrid'], ['xx2',15,20,'London']]
df1 = pd.DataFrame(data, columns = ['precode', 'start', 'stop','city'])
The second
data2 = [['xx1', 3,11,'location1',], ['xx2',9,17,'location2'],['xx2',21,30,'location3'],['xx5',1,19,'location4']]
df2 = pd.DataFrame(data2, columns = ['precode', 'start', 'stop','locations'],)
I want to check if locations are in city. If so add in df2 the name of the cities
precode start stop location city
xx1 3 11 location1 Madrid
xx2 9 17 location2 Madrid;London # If more than one city affected, add as many as cities affected as found
xx2 21 30 location3 NaN
xx5 1 19 location4 NaN
With the help of @Corralien I got this perfect solution from this post
df2['city'] = (
df2.loc[df2['precode'].isin(df1['precode'])]
.reset_index().merge(df1, how='cross', suffixes=('', '_'))
.query('start.between(start_, stop_) | stop.between(start_, stop_)')
.groupby('index')['city'].apply(list).str.join(';')
)
However, In the cluster I am working, the last version they have is 1.5, so I can't use how=cross
.
I have been working around and looking at other solutions but they are not exactly what I need.