I have a pd.DataFrame object that contains tweets and retweets from different users. What I'm trying to accomplish is to populate the rt_uid column (i.e. retweet user id) with the uid of the user being retweeted. The desired output is:
Desired Output
tw_id tw_uid rt_uid tw_uname rt_uname
0 0 10 12.0 u1 u3
1 1 10 12.0 u1 u3
2 2 12 NaN u3 None
3 3 13 NaN u4 None
4 4 14 10.0 u5 u1
5 5 15 10.0 u6 u1
6 6 16 10.0 u7 u1
7 7 16 NaN u7 None
8 8 16 NaN u7 None
9 9 12 13.0 u3 u4
Here the rt_uid column contains the user ids of the users who were retweeted.
Code 1 presents a toy example of the dataset, together with my attempted solution, which didn't work:
Code 1
import numpy as np  # needed for np.arange below
import pandas as pd

tw_df = pd.DataFrame(dict(
    tw_id=np.arange(10),
    tw_uid=[10, 10, 12, 13, 14, 15, 16, 16, 16, 12],
    rt_uid=[None]*10,
    tw_uname=['u1', 'u1', 'u3', 'u4', 'u5', 'u6', 'u7', 'u7', 'u7', 'u3'],
    rt_uname=['u3', 'u3', None, None, 'u1', 'u1', 'u1', None, None, 'u4'],
))
tw_df.loc[~tw_df.loc[:, 'rt_uname'].isnull(), 'rt_uid'] = tw_df.loc[tw_df.loc[:, 'tw_uname'].isin(tw_df.loc[:, 'rt_uname']), 'tw_uid']
tw_df
Wrong Output
As you can see, the rt_uid column merely mirrors the tw_uid column.
- I've looked at this post, but in my case I need the data to be filtered for all the usernames (which may change, repeat, etc.) and not for a specific one, so I couldn't find the answer there.
What am I missing here? Thanks in advance.
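A possible explanation for the mirrored column, offered as a sketch rather than a confirmed diagnosis: when a Series is assigned through .loc, pandas aligns it to the target rows by index label, not by matching usernames. Each selected row therefore receives the tw_uid from its own row index if that index is present on the right-hand side, and NaN otherwise. One way to get the desired output is to build a username-to-uid lookup and map rt_uname through it. The name name_to_uid below is hypothetical, and the sketch assumes that every username corresponds to exactly one uid and that every retweeted user also appears as a tweet author in the frame.

```python
import numpy as np
import pandas as pd

# Same toy data as in Code 1.
tw_df = pd.DataFrame(dict(
    tw_id=np.arange(10),
    tw_uid=[10, 10, 12, 13, 14, 15, 16, 16, 16, 12],
    rt_uid=[None]*10,
    tw_uname=['u1', 'u1', 'u3', 'u4', 'u5', 'u6', 'u7', 'u7', 'u7', 'u3'],
    rt_uname=['u3', 'u3', None, None, 'u1', 'u1', 'u1', None, None, 'u4'],
))

# Hypothetical lookup: a Series indexed by username holding that user's uid.
# Assumption: each username has a single uid, so the first occurrence is kept.
name_to_uid = (
    tw_df.drop_duplicates('tw_uname')
         .set_index('tw_uname')['tw_uid']
)

# Look up each retweeted username; rows without a retweet (None) become NaN.
tw_df['rt_uid'] = tw_df['rt_uname'].map(name_to_uid)
```

Because the lookup is driven by the value in rt_uname rather than by the row index, repeated or changing usernames are handled row by row. A retweeted user who never appears in tw_uname would end up with NaN under this approach.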