2

I have the foll. 2 pandas dataframes:

df1 = pandas.DataFrame(data = {'col1' : [1, 2, 3, 4, 5], 'col2' : [10, 11, 12, 13, 14], 'col3' : [0,2,0,-1,0]}) 
df2 = pandas.DataFrame(data = {'col1' : [1, 2, 3], 'col2' : [10, 11, 12], 'col6' : [20, 31, 12]})

How do I change the value in col3 in df1 to 0 if both col1 and col2 have same value in df1 and df2. The result for df1 should look like this:

  col1  col2  col3
0     1    10     0
1     2    11     0
2     3    12     0
3     4    13    -1
4     5    14     0
user308827
  • 21,227
  • 87
  • 254
  • 417

2 Answers2

2

If you merge two DataFrames on col1 and col2, the resulting DataFrame will have rows where both DataFrames have the same value in those columns. However, pandas loses the index when merging. You can use reset_index prior to merging to keep the index and use that index in .loc:

df1.loc[df1.reset_index().merge(df2, on=['col1', 'col2'])['index'], 'col3'] = 0

df1
Out: 
   col1  col2  col3
0     1    10     0
1     2    11     0
2     3    12     0
3     4    13    -1
4     5    14     0
Community
  • 1
  • 1
ayhan
  • 70,170
  • 20
  • 182
  • 203
1

A quick numpy solution. i gets indices for every combination of one row from df1 and another row from df2. I use the == to determine which cells are equal. all(2) determines if all cells from one row equal all cells from the other row. If true, then the corresponding set of indices represent a match. So, i[0][matches] tells me all the rows from df1 that match the rows in df2 represented by i[1][matches]. But I only need to change values in df1 so I only use i[0][matches] to slice df1 on the 3rd column then assign 0.

def pir(df1, df2):
    i = np.indices((len(df1), len(df2)))
    matches = (df2.values[i[1], :2] == df1.values[i[0], :2]).all(2)
    df = df1.copy()
    df.iloc[i[0][matches], 2] = 0
    return df

pir(df1, df2)

enter image description here


Timing

def ayhan(df1, df2):
    df1 = df1.copy()
    df1.loc[df1.reset_index().merge(df2, on=['col1', 'col2'])['index'], 'col3'] = 0
    return df1

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • thanks @piRSquared, what does this line do: ` matches = (df2.values[i[1], :2] == df1.values[i[0], :2]).all(2)` – user308827 Aug 10 '16 at 13:59
  • It's quick for very small data frames. For larger (>1e5 rows) may be impractical because it's effectively a Cartesian product. Even for a moderate sized frames, `merge` should perform better. – ptrj Aug 10 '16 at 15:25