Populate column in `Pandas.DataFrame` based on matches in other columns

Question

I have a pd.DataFrame that contains tweets and retweets from different users. I'm trying to populate the rt_uid (retweet user id) column with the uid of the user being retweeted, so the desired output is:

Desired Output

   tw_id  tw_uid  rt_uid tw_uname rt_uname
0      0      10    12.0       u1       u3
1      1      10    12.0       u1       u3
2      2      12     NaN       u3     None
3      3      13     NaN       u4     None
4      4      14    10.0       u5       u1
5      5      15    10.0       u6       u1
6      6      16    10.0       u7       u1
7      7      16     NaN       u7     None
8      8      16     NaN       u7     None
9      9      12    13.0       u3       u4

The rt_uid column holds the user id of the user whose earlier tweet is being retweeted in that row; rows that are not retweets stay NaN.

Code 1 shows a toy version of the dataset, along with my attempted solution, which doesn't work:

Code 1

import numpy as np
import pandas as pd


tw_df = pd.DataFrame(dict(
        tw_id=np.arange(10),
        tw_uid=[10, 10, 12, 13, 14, 15, 16, 16, 16, 12],
        rt_uid=[None]*10,
        tw_uname=['u1', 'u1', 'u3', 'u4', 'u5', 'u6', 'u7', 'u7', 'u7', 'u3'],
        rt_uname=['u3', 'u3', None, None, 'u1', 'u1', 'u1', None, None, 'u4'],
    )
)
tw_df.loc[~tw_df.loc[:, 'rt_uname'].isnull(), 'rt_uid'] = tw_df.loc[tw_df.loc[:, 'tw_uname'].isin(tw_df.loc[:, 'rt_uname']), 'tw_uid']
tw_df

Wrong Output

As you can see, the rt_uid column merely mirrors the tw_uid column.

I've looked at this post, but in my case I need the data filtered for all usernames (which may change, repeat, etc.), not for a specific one, so I couldn't find the answer there.

What am I missing here? Thanks in advance.
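A likely cause of the wrong output: when a Series is assigned through `.loc[mask, 'rt_uid']`, pandas aligns it to the selected rows by index label, not by username. The right-hand side of Code 1 is indexed by the rows whose `tw_uname` appears somewhere in `rt_uname`, so each retweet row receives the `tw_uid` from its own row (or NaN if that row's `tw_uname` is never retweeted). The sketch below rebuilds Code 1 (using `notna()` in place of the equivalent `~isnull()`) so this can be checked; for example, row 9 (u3 retweeting u4) gets u3's own uid, 12, rather than u4's uid, 13.

```python
import numpy as np
import pandas as pd

tw_df = pd.DataFrame(dict(
    tw_id=np.arange(10),
    tw_uid=[10, 10, 12, 13, 14, 15, 16, 16, 16, 12],
    rt_uid=[None] * 10,
    tw_uname=['u1', 'u1', 'u3', 'u4', 'u5', 'u6', 'u7', 'u7', 'u7', 'u3'],
    rt_uname=['u3', 'u3', None, None, 'u1', 'u1', 'u1', None, None, 'u4'],
))

# Right-hand side of Code 1: tw_uid of the rows whose tw_uname is retweeted by anyone.
# Its index labels are 0, 1, 2, 3 and 9, i.e. positions in tw_df, not usernames.
rhs = tw_df.loc[tw_df['tw_uname'].isin(tw_df['rt_uname']), 'tw_uid']

# The masked rows are 0, 1, 4, 5, 6 and 9; pandas reindexes rhs to these labels,
# so rows 0, 1, 9 get their own tw_uid and rows 4, 5, 6 get NaN.
mask = tw_df['rt_uname'].notna()
tw_df.loc[mask, 'rt_uid'] = rhs

print(tw_df[['tw_uid', 'rt_uid', 'tw_uname', 'rt_uname']])
```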

Can you add the expected output for code 1, not in a picture format? — Dani Mesejo, Nov 29 '20 at 20:20
Because `u3` has `tw_uid` == 12 (see row 2). That is, when `u3` tweeted, its uid was recorded in the `tw_uid` column, so I need to put it in the `rt_uid` column whenever another user retweets `u3`. — Michael, Nov 29 '20 at 20:31
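The rule described in this comment is a plain username-to-uid lookup built from the rows where each user tweeted. A small sketch on the question's toy data, assuming each username maps to a single uid (the name `uid_by_name` is illustrative, not from the thread):

```python
import pandas as pd

tw_df = pd.DataFrame(dict(
    tw_uid=[10, 10, 12, 13, 14, 15, 16, 16, 16, 12],
    tw_uname=['u1', 'u1', 'u3', 'u4', 'u5', 'u6', 'u7', 'u7', 'u7', 'u3'],
))

# Sanity check: every username should correspond to exactly one uid,
# otherwise "the uid of u3" would be ambiguous.
assert tw_df.groupby('tw_uname')['tw_uid'].nunique().max() == 1

# Username -> uid lookup taken from the tweeting columns.
uid_by_name = dict(zip(tw_df['tw_uname'], tw_df['tw_uid']))

print(uid_by_name['u3'])  # 12, the value rt_uid should hold for retweets of u3
```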

score 2 · Accepted Answer · answered Nov 29 '20 at 20:31

Build a dictionary mapping each tw_uname to its tw_uid with dict(zip()), then map rt_uname through it:

tw_df['rt_uid'] = tw_df['rt_uname'].map(dict(zip(tw_df.tw_uname, tw_df.tw_uid)))

   tw_id  tw_uid  rt_uid tw_uname rt_uname
0      0      10    12.0       u1       u3
1      1      10    12.0       u1       u3
2      2      12     NaN       u3     None
3      3      13     NaN       u4     None
4      4      14    10.0       u5       u1
5      5      15    10.0       u6       u1
6      6      16    10.0       u7       u1
7      7      16     NaN       u7     None
8      8      16     NaN       u7     None
9      9      12    13.0       u3       u4
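For reference, here is the accepted answer as a self-contained script: it rebuilds the toy frame from Code 1 (including the numpy import) and maps rt_uname through the lookup dict. `Series.map` returns NaN both for rows with no retweet (None) and for any retweeted name that never appears in tw_uname.

```python
import numpy as np
import pandas as pd

tw_df = pd.DataFrame(dict(
    tw_id=np.arange(10),
    tw_uid=[10, 10, 12, 13, 14, 15, 16, 16, 16, 12],
    rt_uid=[None] * 10,
    tw_uname=['u1', 'u1', 'u3', 'u4', 'u5', 'u6', 'u7', 'u7', 'u7', 'u3'],
    rt_uname=['u3', 'u3', None, None, 'u1', 'u1', 'u1', None, None, 'u4'],
))

# Username -> uid, taken from the rows where each user tweeted.
# For a repeated username, dict() keeps the last uid seen.
uid_by_name = dict(zip(tw_df['tw_uname'], tw_df['tw_uid']))

# Look up each retweeted username; None or an unknown name becomes NaN.
tw_df['rt_uid'] = tw_df['rt_uname'].map(uid_by_name)

print(tw_df)
```

Because the dict keeps only the last uid per name, this relies on every repeat of a username carrying the same uid, which holds in the toy data.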

Happy it was of help. Cheers mate — wwnde, Nov 29 '20 at 20:34
