How to replace all occurrences in a column with a given value based on a condition applied to a different column

Question

I have a dataframe that looks like this

Name	Code	Amount	Seller
Sildenafil 1045	VGRA	32	Rite Aid
Ibuprofen 378	ADVL	209	CVS
Paracetamol	PCTML	87	Keystone
Aspirin 9852	DISP	372	Rite Aid
Ibuprofen 1992	MOTR	87	Walgreens
Benadryl	BDRL	120	CVS
Aspirin 0541	SPRN	197	H Mart
Sildenafil 002	VIAG	12	Omnicare
Ibuprofen 378	ADVL	301	Keystone
Paracetamol	PCTML	673	Walgreens
Ibuprofen 1992	MOTR	87	Omnicare
Sildenafil 1045	VGRA	45	H Mart
Benadryl	BDRL	111	Keystone
Aspirin 9852	DISP	285	CVS
Sildenafil 002	VIAG	79	Rite Aid
Aspirin 0541	SPRN	431	Omnicare

where df.Code.unique() = [VGRA, ADVL, PCTML, DISP, MOTR, BDRL, SPRN, VIAG]

I want to identify the medicines of the same 'Name' (but different ending numbers) with the same 'Code'

Ex. 'Ibuprofen 378' and 'Ibuprofen 1992' have two different codes- 'ADVL' and 'MOTR' respectively. I want to replace all instances of 'MOTR' with 'ADVL'. Same for Sildenafil and Aspirin and any other drugs with the same coding situation. The output should look like this:

Name	Code	Amount	Seller
Sildenafil 1045	VGRA	32	Rite Aid
Ibuprofen 378	ADVL	209	CVS
Paracetamol	PCTML	87	Keystone
Aspirin 9852	DISP	372	Rite Aid
Ibuprofen 1992	ADVL	87	Walgreens
Benadryl	BDRL	120	CVS
Aspirin 0541	DISP	197	H Mart
Sildenafil 002	VGRA	12	Omnicare
Ibuprofen 378	ADVL	301	Keystone
Paracetamol	PCTML	673	Walgreens
Ibuprofen 1992	ADVL	87	Omnicare
Sildenafil 1045	VGRA	45	H Mart
Benadryl	BDRL	111	Keystone
Aspirin 9852	DISP	285	CVS
Sildenafil 002	VGRA	79	Rite Aid
Aspirin 0541	DISP	431	Omnicare

where df.Code.unique() = [VGRA, ADVL, PCTML, DISP, BDRL]

I cannot figure out how to do this, the only thing I've managed is generating a third variable using

df.Name_Code = df.Name.astype(str).str[:5]

Name	Code	Amount	Seller	Name_Code
Sildenafil 1045	VGRA	32	Rite Aid	SILDE
Ibuprofen 378	ADVL	209	CVS	IBUPR
Paracetamol	PCTML	87	Keystone	PARAC
Aspirin 9852	DISP	372	Rite Aid	ASPIR
Ibuprofen 1992	ADVL	87	Walgreens	IBUPR
Benadryl	BDRL	120	CVS	BENAD
Aspirin 0541	DISP	197	H Mart	ASPIR
Sildenafil 002	VGRA	12	Omnicare	SILDE
Ibuprofen 378	ADVL	301	Keystone	IBUPR
Paracetamol	PCTML	673	Walgreens	PARAC
Ibuprofen 1992	ADVL	87	Omnicare	IBUPR
Sildenafil 1045	VGRA	45	H Mart	SILDE
Benadryl	BDRL	111	Keystone	BENAD
Aspirin 9852	DISP	285	CVS	ASPIR
Sildenafil 002	VGRA	79	Rite Aid	SILDE
Aspirin 0541	DISP	431	Omnicare	ASPIR

And while that does uniformize the drugs, it's less helpful. Reading "VGRA" for ex. would be more helpful than reading "SILDE". How can I modify this?

score 0 · Answer 1 · answered Jul 12 '23 at 16:21

If I understand you correctly, you want to set the Code to the first observed Code value for each medicine:

df['Code'] = df.groupby(df['Name'].str.split().str[0])['Code'].transform('first')
print(df)

Prints:

               Name   Code  Amount     Seller
0   Sildenafil 1045   VGRA      32   Rite Aid
1     Ibuprofen 378   ADVL     209        CVS
2       Paracetamol  PCTML      87   Keystone
3      Aspirin 9852   DISP     372   Rite Aid
4    Ibuprofen 1992   ADVL      87  Walgreens
5          Benadryl   BDRL     120        CVS
6      Aspirin 0541   DISP     197     H Mart
7    Sildenafil 002   VGRA      12   Omnicare
8     Ibuprofen 378   ADVL     301   Keystone
9       Paracetamol  PCTML     673  Walgreens
10   Ibuprofen 1992   ADVL      87   Omnicare
11  Sildenafil 1045   VGRA      45     H Mart
12         Benadryl   BDRL     111   Keystone
13     Aspirin 9852   DISP     285        CVS
14   Sildenafil 002   VGRA      79   Rite Aid
15     Aspirin 0541   DISP     431   Omnicare

jaellis · Answer 2 · 2023-07-12T20:39:04.673

This question shares some similarities with another I found, so I'd definitely recommend searching the archives.

Order-Agnostic Approach

I say order-agnostic because the other answer is ideal if you're looking to just use the 'first' value.

It seems like you're going to need a dict/hashmap that serves as your source of truth for re-defining the Code based on the Name. Given what you wrote in the question it would look something like this:

name_to_code_mapping = {"Ibuprofen": "ADVL", "Sildenafil": "...", ...}

You can then use the dict inside of a function that you can pass to something like df.apply:

def change_code_based_on_drug_name(df_row: pd.Series, column_to_find: str, column_to_change: str, map: dict):
    val_to_find = df_row[column_to_find]
    ## You'll have to split the numbers off of the name here
    trimmed_df_value = val_to_find.split(" ")[0]
    
    if trimmed_df_value in map:
        df_row[column_to_change] = map[trimmed_df_value]
    return df_row

And feed it to that df.apply:

df = df.apply(lambda row: change_code_based_on_drug_name(row, "Name", "Code", name_to_code_mapping), axis=1)

Alternative, Less Ideal Approach

You could also iterate over the map and use .loc[] (like in the above S/O post) to change the values but you would need to create a column of the name minus numbers:

## Notice that since I'm using apply over a single column (i.e. Series) here I don't need to axis paramter
df["Name_without_numbers"] = df["Name"].apply(lambda name: name.split(" ")[0])

You'd then set that to the index so you can use the .loc method.

df.set_index("Name_without_numbers", inplace=True)

for key, val in name_to_code_mapping.items():
    df.loc[key, "Code"] = val

I don't like or recommend that second approach because:

It adds an unnecessary column, and
Creates a non-unique index

Hope that helps!

How to replace all occurrences in a column with a given value based on a condition applied to a different column

2 Answers2

Order-Agnostic Approach

I say order-agnostic because the other answer is ideal if you're looking to just use the 'first' value.

Alternative, Less Ideal Approach