Finding a count column based on 3 different columns in pandas

Question

I have a pandas data frame in the form of:

user_id referral_code referred_by
1        A              None
2        B              A
3        C              B
5        None           None
6        E              B
7        None           None

....

What I want to do is create another column, weights, for each user id that contains the total number of referrals he has made to others plus the number of times he was referred. That is, I have to check whether the referral_code of a user id is present in the referred_by column, count how often it occurs, and add 1 if the referred_by column has an entry for the user.

Expected output is:

user_id referral_code referred_by weights
1        A              None       1
2        B              A          3
3        C              B          1
5        None           None       None
6        E              B          1
7        None           None       None

The approaches I have tried use df.groupby along with size and count, but nothing gives the expected output.

@AlexandreB. Because user id 2 was referred by A, and he himself then refers others (in this case 3 and 6). So 1+2. — Imdadul Choudhury, May 16 '19 at 07:18
Because 1 was not referred by anyone but he has referred another one (B in this case). So 0+1. — Imdadul Choudhury, May 16 '19 at 07:42

score 0 · Answer 1 · answered May 16 '19 at 07:26

What you can do is compute weights = df.referred_by.value_counts()['myword'] + 1 and then add it to your df in the weights column!
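
To make the lookup concrete, here is a minimal sketch using the sample data from the question. The frame is rebuilt by hand, and the use of `.get` with a default of 0 is an assumption added here to handle codes that nobody used; it is not part of the one-liner above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 5, 6, 7],
    "referral_code": ["A", "B", "C", None, "E", None],
    "referred_by": [None, "A", "B", None, "B", None],
})

# Count how often each code appears in referred_by (missing values are dropped).
counts = df["referred_by"].value_counts()

# Number of users referred with code B.
print(counts["B"])

# C never appears in referred_by, so counts["C"] would raise a KeyError;
# .get with a default avoids that.
print(counts.get("C", 0))
```

Note that this only gives the count for one code at a time, so it still has to be applied per row to fill a whole column.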

Hanggy

What is `['my word']` here? – Imdadul Choudhury May 16 '19 at 07:30
Your letter like `df.referred_by.value_counts()['A']` or `df.referred_by.value_counts()['B']` – Hanggy May 16 '19 at 07:31

score 0 · Accepted Answer · answered May 16 '19 at 08:27

You want to build a new conditional column. If the conditions are simple enough, you can do it with np.where. I suggest you have a look at this post.

Here, it's quite complex; there should be a solution with np.where, but it is not really obvious. In this case, you can use the apply method, which gives you the opportunity to write conditions as complex as you want. Using apply is less efficient than np.where since it calls a Python function for every row. Which one to choose depends on your dataset and the complexity of your conditions.

Here is an example with apply:

import pandas as pd

df = pd.DataFrame(
    [[1, "A", None],
     [2, "B", "A"],
     [3, "C", "B"],
     [5, None, None],
     [6, "E", "B"],
     [7, None, None]],
    columns='user_id referral_code referred_by'.split(' ')
)
print(df)
#    user_id referral_code referred_by
# 0        1             A        None
# 1        2             B           A
# 2        3             C           B
# 3        5          None        None
# 4        6             E           B
# 5        7          None        None

weight_refered_by = df.referred_by.value_counts()
print(weight_refered_by)
# B    2
# A    1

def countWeight(row):
    count = 0

    # Number of users who were referred with this user's code
    if row["referral_code"] in weight_refered_by.index:
        count = weight_refered_by[row["referral_code"]]

    # Add 1 if this user was referred by someone (handles None and NaN)
    if pd.notna(row["referred_by"]):
        count += 1

    # If referral_code is missing, the result is None,
    # because every referred_by value is one of the referral codes
    if pd.isna(row["referral_code"]):
        count = None
    return count

df["weights"] = df.apply(countWeight, axis=1)
print(df)
#    user_id referral_code referred_by  weights
# 0        1             A        None      1.0
# 1        2             B           A      3.0
# 2        3             C           B      1.0
# 3        5          None        None      NaN
# 4        6             E           B      1.0
# 5        7          None        None      NaN
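
If apply turns out to be too slow, the same weights can probably be computed without a row-wise function. The following is only a sketch of one vectorized alternative (using `map` on the `value_counts` result rather than np.where), and it assumes missing entries are stored as None/NaN; the names `made` and `received` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "A", None], [2, "B", "A"], [3, "C", "B"],
     [5, None, None], [6, "E", "B"], [7, None, None]],
    columns=["user_id", "referral_code", "referred_by"],
)

# Number of times each referral code appears in referred_by.
referral_counts = df["referred_by"].value_counts()

# Referrals made: look up each user's own code; codes nobody used count as 0.
made = df["referral_code"].map(referral_counts).fillna(0)

# 1 if the user was referred by someone, else 0.
received = df["referred_by"].notna().astype(int)

# Users without a referral code get a missing weight, as in the expected output.
df["weights"] = (made + received).where(df["referral_code"].notna())
print(df)
# weights column: 1.0, 3.0, 1.0, NaN, 1.0, NaN
```

As with the apply version, the column ends up as float because it contains missing values.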

Hope that helps!

Thanks, this does the job. Although I tried converting the dataframe to a list and performing the operation there, this solution works better. — Imdadul Choudhury, May 16 '19 at 11:22
