
I have a df1 that looks like:

Out[43]:
city1   city2
0   YYZ     SFO
1   YYZ     EWR
2   YYZ     DFW
3   YYZ     LAX
4   YYZ     YYC

I have another df2 that I want to slice based on df1, i.e. the city1 and city2 values in each row of df2 must correspond to a city1/city2 pair that appears in df1.

I only want rows in df2 where the city1 and city2 columns match exactly as those in df1.

Is a merge/join of the dfs as a left join on df1 the only clean way to do this? I don't want to create another column whose value is a concatenation of city1 and city2. That would work, but there must be an easier way built into pandas that doesn't require manipulating my data.

UPDATE:

df2 has more than just 2 columns. It has a total of 20 columns. For simplicity I only mentioned city1 and city2.

In any case, I want to return all rows in df2 (df with 20 columns) where the city1 and city2 pair match what is present in df1.


2 Answers


setup

df2 = pd.DataFrame([
        ['YYZ', 'SFO', 1],
        ['YYZ', 'YYD', 1]
    ], columns=['city1', 'city2', 'val'])

cols = ['city1', 'city2']
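The setup above only defines `df2`; to make the examples below fully reproducible, `df1` from the question can be constructed like this (a sketch of the frame shown in the question, not code from the original post):

```python
import pandas as pd

# df1 as displayed in the question (reconstructed here for reproducibility)
df1 = pd.DataFrame({
    'city1': ['YYZ'] * 5,
    'city2': ['SFO', 'EWR', 'DFW', 'LAX', 'YYC'],
})
```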

option 1
numpy broadcasting

multi_isin_cond = (df2[cols].values[:, None] == df1[cols].values).all(-1).any(-1)
df2.loc[multi_isin_cond]

  city1 city2  val
0   YYZ   SFO    1

option 2
pandas merge

df2.merge(df1, on=cols)

  city1 city2  val
0   YYZ   SFO    1
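One caveat worth hedging for the 20-column case: an inner merge keeps every column of `df2`, but if `df1` happens to contain the same city pair more than once, the matching `df2` rows are duplicated in the result. A sketch that guards against this by merging only on de-duplicated key columns (the sample data here is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'city1': ['YYZ', 'YYZ'], 'city2': ['SFO', 'EWR']})
df2 = pd.DataFrame({'city1': ['YYZ', 'YYZ'],
                    'city2': ['SFO', 'YYD'],
                    'val': [1, 2]})
cols = ['city1', 'city2']

# df1[cols].drop_duplicates() ensures each key pair appears once,
# so matching df2 rows are returned exactly once by the inner merge
out = df2.merge(df1[cols].drop_duplicates(), on=cols)
print(out)
```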

option 3
I don't know what to call it, and I don't recommend it.

idx = pd.MultiIndex.from_arrays(df1.values.T, names=df1.columns)
df2[df2[cols].apply(tuple, 1).isin(idx)]

  city1 city2  val
0   YYZ   SFO    1
  • @MohammadYusufGhazi I'm getting there ;-) – piRSquared Dec 23 '16 at 16:00
  • can you do some magic here? http://stackoverflow.com/questions/41283161/creating-complex-nested-dictionaries-from-pandas-dataframe – Mohammad Yusuf Dec 23 '16 at 16:02
  • @MohammadYusufGhazi that made my head hurt. I've got ideas spinning in my head. Maybe I can work it out while doing the chores I'm supposed to do. – piRSquared Dec 23 '16 at 16:21
  • kk. A generic solution to converting df to nested json would be a good idea. Thanks for checking though :) – Mohammad Yusuf Dec 23 '16 at 16:24
  • Gentlemen, I just updated the original question. Not sure if the solutions change based on what I just wrote. I don't know whether option#2 works because df1 only has 2 common columns. df1 only has 2 columns while df2 has 20 columns. – codingknob Dec 23 '16 at 17:31
  • 1
    @codingknob If you notice, I defined `df2` with 3 columns. This should generalize for all 3 options. Try it out and let me know. – piRSquared Dec 23 '16 at 17:37

You do not have to join. Here is one way to do it.

import pandas as pd

z = {'city1' : pd.Series(['YYZ', 'YYZ','YYZ','YYZ']), 'city2' : pd.Series(['SFO', 'EWR', 'DFW', 'LAX'])}
df = pd.DataFrame(z)
x = {'city1' : pd.Series(['YYZ', 'YYX','YYZ','YYX']), 'city2' : pd.Series(['SFO', 'EWR', 'DFW', 'LAX'])}
df1 = pd.DataFrame(x)

df.head()

  city1 city2
0   YYZ   SFO
1   YYZ   EWR
2   YYZ   DFW
3   YYZ   LAX

df1.head()

  city1 city2
0   YYZ   SFO
1   YYX   EWR
2   YYZ   DFW
3   YYX   LAX

condition = df['city1'] == df1['city1']

df[(condition)]

  city1 city2
0   YYZ   SFO
2   YYZ   DFW
  • I want to return all rows in df2 where the pair of cities city1 and city2 in df2 match the conditional df1. The code above does not do that. It only addresses city1. – codingknob Dec 23 '16 at 17:23
  • Sorry about that, we can just change the condition then to be `condition = (df['city1'] == df1['city1']) & (df['city2'] == df1['city2'])` – gold_cy Dec 23 '16 at 17:56
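Putting the corrected condition from the comment into the answer's own example (note one caveat, which is my observation rather than the answerer's: this compares the frames row-for-row by index, so it only works when `df` and `df1` are aligned positionally; it does not test pair membership like the merge/broadcasting options above):

```python
import pandas as pd

df = pd.DataFrame({'city1': ['YYZ', 'YYZ', 'YYZ', 'YYZ'],
                   'city2': ['SFO', 'EWR', 'DFW', 'LAX']})
df1 = pd.DataFrame({'city1': ['YYZ', 'YYX', 'YYZ', 'YYX'],
                    'city2': ['SFO', 'EWR', 'DFW', 'LAX']})

# Both city columns must match in the same row position
condition = (df['city1'] == df1['city1']) & (df['city2'] == df1['city2'])
print(df[condition])
```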