0

I feel like there must be a more pythonic way (ie: easier and more straightforward) to change column values in the dataframe I am working with. Basically, I am trying to edit the values of a column match based on values of the 'ID' column.

Take this example:

data = [['tom', 10, 111], ['nick', 15, 112], ['juli', 14, 113], ['mary', 17, 114]] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Name', 'Age', 'ID']) 

I have a simple dataframe, df

Now I make several slices of the dataframe

df2 = df.loc[df['ID'] == 111]
df3 = df.loc[df['ID'] == 112]
df4 = df.loc[df['ID'] == 113]
df5 = df.loc[df['ID'] == 114]

What I want to do is make a new column in my original dataframe (called 'match'). Then I want to compare df2,df3,df4,df5 to it, based on the ID column. In the 'match' column I will record when those matches occurred. Let me step through my process.

If I do this...

df['match_checker'] = df2['ID'].isin(df['ID'])
df.loc[df['match_checker'] == True, 'match'] = 'Round 1'

df['match_checker'] = df3['ID'].isin(df['ID'])
df.loc[df['match_checker'] == True, 'match'] = 'Round 2'

df['match_checker'] = df4['ID'].isin(df['ID'])
df.loc[df['match_checker'] == True, 'match'] = 'Round 3'

df['match_checker'] = df5['ID'].isin(df['ID'])
df.loc[df['match_checker'] == True, 'match'] = 'Round 4'

The resulting dataframe looks like this. This is the desired outcome. (the match_checker column will change for each iteration).

   Name  Age   ID match_checker    match
0   tom   10  111           NaN  Round 1
1  nick   15  112           NaN  Round 2
2  juli   14  113           NaN  Round 3
3  mary   17  114          True  Round 4

I have the desired outcome, but creating a subset of the dataframe, then comparing it to the original dataframe seems like a bad way to do it.

Note I'm not looking for the following solution:

df.loc[df['ID'] == 111), 'match'] = 'Round 1'
Erich Purpur
  • 1,337
  • 3
  • 14
  • 34
  • 2
    IIUC you are looking for [`np.select`](https://het.as.utexas.edu/HET/Software/Numpy/reference/generated/numpy.select.html). See the 2nd answer from [this post](https://stackoverflow.com/questions/39109045/numpy-where-with-multiple-conditions). – Henry Yik Jun 29 '20 at 01:59

2 Answers2

2

How about this:

rounds = {
    111: 'Round 1',
    112: 'Round 2',
    113: 'Round 3',
    114: 'Round 4',
}

df['match'] = [rounds[i] for i in df.ID]


   Name  Age   ID    match
0   tom   10  111  Round 1
1  nick   15  112  Round 2
2  juli   14  113  Round 3
3  mary   17  114  Round 4
cbare
  • 12,060
  • 8
  • 56
  • 63
1

I think you just want the "match" and "match_checker" update one by one. So I did this for you.

import numpy as np
import pandas as pd


class ColumnUpdater(object):
    def __init__(self, to_update_df, prefix="Record"):
        self.to_update_df = to_update_df
        self.prefix = prefix
        self.current_index = 0
        self.match_checker = None

    def update_func(self, util_id):
        if isinstance(util_id, pd.DataFrame):
            return
        util_id = util_id.iloc[0]
        self.to_update_df.loc[:, "match_checker"] = self.to_update_df["ID"] == util_id
        current_match = f"{self.prefix} {self.current_index}"
        self.current_index += 1
        return current_match

    def process(self):
        self.to_update_df.loc[:, "match_checker"] = np.nan
        self.to_update_df.loc[:, "util_id"] = self.to_update_df["ID"]
        self.to_update_df.loc[:, "match"] = self.to_update_df.groupby("ID")[
            "util_id"
        ].transform(self.update_func)
        return self.to_update_df.drop(columns="util_id")


if __name__ == "__main__":
    data = np.repeat(
        [["tom", 10, 111], ["nick", 15, 112], ["juli", 14, 113], ["mary", 17, 114]],
        5,
        axis=0,
    )
    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=["Name", "Age", "ID"])
    result = ColumnUpdater(df).process()
    print(result)
Xu Qiushi
  • 1,111
  • 1
  • 5
  • 10