0

The linked post (here) does not answer this question - I'm aware of merging - there's nothing in the linked post which specifically answers the question that I've asked here though.


Given a list of values to match rows on - I would like to replace column values from data_1 with column values from data_2.

The following works - but I am drawing a complete blank as to how I should go about doing this with idiomatic pandas

replace = ["a", "c", "x"]

data_1 = pd.DataFrame(
    {
        "val_1": ["a", "b", "c", "x", "y", "z"],
        "val_2": ["one", "two", "three", "four", "five", "six"],
    }
)

data_2 = pd.DataFrame(
    {
        "val_1": [
            "a",
            "b",
            "c",
            "p",
            "q",
            "x",
            "y",
            "z",
        ],
        "val_2": ["ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT"],
    }
)

# Expected output this is what I would like the final result to be
output = pd.DataFrame(
    {
        "val_1": ["a", "b", "c", "x", "y", "z"],
        "val_2": ["ONE", "two", "THREE", "SIX", "five", "six"],
    }
)

# Possible approach

Perhaps the following is the easiest :

df_m = pd.merge(data_1, data_2, on = ['val_1'])
data_1['val_2'] = np.where(df_m['val_1'].isin(replace), 
    df_m['val_2_y'], 
    df_m['val_2_x'],
)
baxx
  • 3,956
  • 6
  • 37
  • 75

1 Answers1

0

Try:

data_1.set_index('val_1', drop=True, inplace=True)
data_2.set_index('val_1', drop=True, inplace=True)

data_1.loc[data_1.index.isin(replace), 'val_2'] = data_2['val_2']
data_1.reset_index(inplace=True)
print(data_1)

  val_1  val_2
0     a    ONE
1     b    two
2     c  THREE
3     x    SIX
4     y   five
5     z    six

# check on your `output` df:
data_1.equals(output)
# True
ouroboros1
  • 9,113
  • 3
  • 7
  • 26