1

I have 2 different dataframes. One looks like this

arriv depart stop_id
12:35 12:40  a2b
23:00 01:00  a1e

Ther other looks like this:

stop_id  lon   lat   name
a1e      12.1  13.2  Old Church
a2b      12.2  13.1  Postal Service

now I would like to create a dataframe that looks like this with a subset of the data:

arriv depart stop_id  lon   lat   name
12:35 12:40  a2b      12.2  13.1  Postal Service
23:00 01:00  a1e      12.1  13.2  Old Church

basicly merging the two dataframes, but with loads of double entries for the list of stops. What would be a good way to do this?

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
user2741831
  • 2,120
  • 2
  • 22
  • 43

2 Answers2

5

I believe what you want is: "for every instance in the first dataframe , merge the lon lat and name columns (available in another dataframe)."

If that's the case, you could use pd.merge():

import pandas as pd
timeDF = pd.DataFrame({
          "arriv":["12:35","23:00","13:35","12:35"],
          "depart":["12:40","01:00","14:25","14:35"],
          "stop_id":["a2b","a1e","a2b","a1e"]
          })
stopDF = pd.DataFrame({
          "long":["12.1","13.1"],
          "lat":["13.0","14.3"],
          "name":["nameA","nameB"],
          "stop_id":["a2b","a1e"]
          })
print(pd.merge(timeDF,stopDF,how='left',on='stop_id'))

INPUT:

arrive_departure_dataframe:
    arriv depart stop_id
0  12:35  12:40     a2b
1  23:00  01:00     a1e
2  13:35  14:25     a2b
3  12:35  14:35     a1e

stops_information_dataframe:
    long   lat   name stop_id
0  12.1  13.0  nameA     a2b
1  13.1  14.3  nameB     a1e

OUTPUT:

   arriv depart stop_id  long   lat   name
0  12:35  12:40     a2b  12.1  13.0  nameA
1  23:00  01:00     a1e  13.1  14.3  nameB
2  13:35  14:25     a2b  12.1  13.0  nameA
3  12:35  14:35     a1e  13.1  14.3  nameB
2

You can use pandas.DataFrame.merge() function in the main pandas namespace to combine those dataframes on the common column stop_id :

import pandas as pd

df1 = pd.DataFrame(
    {'arriv': ['12:35', '23:00'],
     'depart': ['12:40', '01:00'],
     'stop_id': ['a2b', 'a1e']}
)

df2 = pd.DataFrame(
    {'stop_id': ['a2b', 'a1e'],
     'lon': ['12.1', '12.2'],
     'lat': ['12.2', '13.1'],
     'name': ['Old Church', 'Postal Service']}
)
print (pd.merge(df1, df2, on='stop_id'))

in your case no need to define how parameter as inner being the default. But sometimes you can use left, right or outer (similar to a SQL full outer join) options such as

pd.merge(df1, df2, how='outer', on='stop_id')
user2741831
  • 2,120
  • 2
  • 22
  • 43
Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55