I have a dataframe called Result that comes from a SQL query:

Loc  ID   Bank
1    23   NULL
1    24   NULL
1    25   NULL
2    23   6
2    24   7
2    25   8

I am trying to set the values of Loc == 1 Bank equal to the Bank of Loc == 2 when the ID is the same, resulting in:

Loc  ID   Bank
1    23   6
1    24   7
1    25   8
2    23   6
2    24   7
2    25   8

Here is where I am at with the code. I know the ending is super simple; I just can't wrap my head around a solution that doesn't involve iterating over every row (~9,000).

result.loc[(result['Loc'] == '1'), 'bank'] = ???
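For reference, the sample frame above can be reconstructed like this (a sketch that assumes Loc and ID come back as integers and the SQL NULLs arrive as NaN; adjust the dtypes to match your actual query result):

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the query result shown above.
result = pd.DataFrame({
    "Loc":  [1, 1, 1, 2, 2, 2],
    "ID":   [23, 24, 25, 23, 24, 25],
    "Bank": [np.nan, np.nan, np.nan, 6, 7, 8],  # SQL NULL -> NaN
})
```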

PrivatePatron

3 Answers

Try using fillna() with groupby().transform():

df.fillna({'Bank':df.groupby('ID')['Bank'].transform('first')})
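A minimal runnable sketch of the one-liner, with the sample frame rebuilt from the question (the construction itself is an assumption, not part of the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Loc":  [1, 1, 1, 2, 2, 2],
    "ID":   [23, 24, 25, 23, 24, 25],
    "Bank": [np.nan, np.nan, np.nan, 6.0, 7.0, 8.0],
})

# transform('first') gives each row the first non-null Bank in its ID group;
# fillna() then applies it only where Bank is NaN, leaving Loc == 2 untouched.
filled = df.fillna({"Bank": df.groupby("ID")["Bank"].transform("first")})
print(filled)
```

Note the expression returns a new frame; assign it back (e.g. `df = df.fillna(...)`) if you want to keep the result.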

Original Answer:

You can try this. It builds a lookup dict from the Loc == 2 rows and uses map() on ID to fill in Bank.

for_map = df.loc[df['Loc'] == 2].set_index('ID')['Bank'].squeeze().to_dict()
df.loc[df['Loc'] == 1,'Bank'] = df.loc[df['Loc'] == 1,'Bank'].fillna(df['ID'].map(for_map))

Output:

   Loc  ID  Bank
0    1  23   6.0
1    1  24   7.0
2    1  25   8.0
3    2  23   6.0
4    2  24   7.0
5    2  25   8.0
rhug123

Why not use pandas.MultiIndex?

Commonalities

# Arguments,
_0th_level = 'Loc'
merge_key  = 'ID'
value_key  = 'Bank'  # or a list of colnames, or `slice(None)` to propagate all column values.
src_key    = '2'
dst_key    = '1'
# Computed once for all,
df         = result.set_index([_0th_level, merge_key])
df2        = df.xs(key=src_key, level=_0th_level, drop_level=False)
df1_       = df2.rename(level=_0th_level, index={src_key: dst_key})

First (naive) approach

df.loc[df1_.index, value_key] = df1_
# to get `result` back : df.reset_index()

Second (robust) approach

That being shown, the first approach may be illegal (since pandas version 1.0.0) if there are one or more missing labels [...].

So if you must ensure that indexes exist both at source and destination, the following does the job on shared IDs only.

df1 = df.xs(key=dst_key, level=_0th_level, drop_level=False)
idx = df1.index.intersection(df1_.index)  # <-----
df.loc[idx, value_key] = df1_.loc[idx, value_key]
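End to end, the robust variant can be sketched as follows (the frame construction is assumed from the question, and integer keys are used for `src_key`/`dst_key` since Loc is numeric in the sample data, unlike the quoted strings above):

```python
import numpy as np
import pandas as pd

result = pd.DataFrame({
    "Loc":  [1, 1, 1, 2, 2, 2],
    "ID":   [23, 24, 25, 23, 24, 25],
    "Bank": [np.nan, np.nan, np.nan, 6.0, 7.0, 8.0],
})

_0th_level, merge_key, value_key = "Loc", "ID", "Bank"
src_key, dst_key = 2, 1  # integers here, matching the sample dtypes

# Index on (Loc, ID), take the Loc == 2 cross-section, then relabel it
# as Loc == 1 so it aligns with the destination rows.
df   = result.set_index([_0th_level, merge_key])
df2  = df.xs(key=src_key, level=_0th_level, drop_level=False)
df1_ = df2.rename(level=_0th_level, index={src_key: dst_key})

# Restrict the assignment to IDs present on both sides.
df1 = df.xs(key=dst_key, level=_0th_level, drop_level=False)
idx = df1.index.intersection(df1_.index)
df.loc[idx, value_key] = df1_.loc[idx, value_key]

out = df.reset_index()
print(out)
```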
keepAlive

You could do a self-merge on the dataframe on ID, then keep the rows where Loc_y equals 2:

(
    df.merge(df, on="ID")
    .loc[lambda df: df.Loc_y == 2, ["Loc_x", "ID", "Bank_y"]]
    .rename(columns=lambda x: x.split("_")[0] if "_" in x else x)
    .astype({"Bank": "Int8"})
    .sort_values("Loc", ignore_index=True)
)

     Loc    ID  Bank
0     1     23  6
1     1     24  7
2     1     25  8
3     2     23  6
4     2     24  7
5     2     25  8

You could also stack/unstack, although this fails if you have duplicate indices:

(
    df.set_index(["Loc", "ID"])
    .unstack("Loc")
    .bfill(axis=1)
    .stack()
    .reset_index()
    .reindex(columns=df.columns)
)

    Loc     ID  Bank
0     1     23  6.0
1     2     23  6.0
2     1     24  7.0
3     2     24  7.0
4     1     25  8.0
5     2     25  8.0
sammywemmy