
I have a pandas data frame in the form of:

user_id referral_code referred_by
1        A              None
2        B              A
3        C              B
5        None           None
6        E              B
7        None           None

....

What I want to do is create another column, weights, for each user id, containing the total number of referrals the user has made to others plus the number of times the user was referred. That is, I have to check whether the referral_code of a user id is present in the referred_by column and count its frequency, and also add 1 if the referred_by column has an entry for that user.

Expected output is:

user_id referral_code referred_by weights
1        A              None       1
2        B              A          3
3        C              B          1
5        None           None       None
6        E              B          1
7        None           None       None

The approaches I have tried use df.groupby along with size and count, but none of them gives the expected output.

2 Answers


What you can do is use weights = df.referred_by.value_counts()['myword'] + 1, where 'myword' is the referral code you are looking up, and then add the result to your df in the weights column.
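As a minimal sketch of this idea (the sample frame is copied from the question; the names `counts`, `weight_b` and `made` are only illustrative), the `value_counts` result can be looked up for one code or mapped over the whole `referral_code` column:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3, 5, 6, 7],
    "referral_code": ["A", "B", "C", None, "E", None],
    "referred_by": [None, "A", "B", None, "B", None],
})

# How often each code appears in referred_by (missing values are skipped).
counts = df["referred_by"].value_counts()

# Single lookup: .get avoids a KeyError for codes that nobody used.
# User 2 (code "B") made 2 referrals and was referred once.
weight_b = counts.get("B", 0) + 1

# Whole column: map each user's code to its count, 0 when it was never used.
made = df["referral_code"].map(counts).fillna(0).astype(int)
```

Note that the `+ 1` should only be applied to users whose `referred_by` is not empty, so a plain lookup like this still needs a condition on top.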

Hanggy

You want to build a new conditional column. If the conditions are simple enough, you can do it with np.where. I suggest you have a look at this post.

Here the logic is quite complex; there should be a solution with np.where, but it is not really obvious. In this case, you can use the apply method. It lets you write conditions as complex as you want. Using apply is less efficient than np.where because it calls a Python function for every row. Which one to choose depends on your dataset and the complexity of your conditions.
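For comparison, one possible vectorized version with np.where might look like the sketch below. It is only an illustration built on the question's sample data, and the intermediate names `made` and `received` are made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3, 5, 6, 7],
    "referral_code": ["A", "B", "C", None, "E", None],
    "referred_by": [None, "A", "B", None, "B", None],
})

# Referrals made: occurrences of the user's code in referred_by (0 if none).
made = df["referral_code"].map(df["referred_by"].value_counts()).fillna(0)

# Referral received: 1 if the user was referred by someone, else 0.
received = df["referred_by"].notna().astype(int)

# Users without a referral_code get NaN, as in the expected output.
df["weights"] = np.where(df["referral_code"].notna(), made + received, np.nan)
```

The result is a float column, because NaN is used to mark the users without a referral code.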

Here is an example with apply:

import pandas as pd

df = pd.DataFrame(
    [[1, "A" ,   None],
    [2 , "B" ,   "A"],
    [3 , "C" ,   "B"],
    [5 , None,   None],
    [6 , "E"  ,  "B"],
    [7 , None ,  None]],
    columns = 'user_id referral_code referred_by'.split(' ')
)
print(df)
#    user_id referral_code referred_by
# 0        1             A        None
# 1        2             B           A
# 2        3             C           B
# 3        5          None        None
# 4        6             E           B
# 5        7          None        None

weight_refered_by = df.referred_by.value_counts()
print(weight_refered_by)
# B    2
# A    1

def countWeight(row):
    # A user without a referral_code gets no weight,
    # matching the expected output in the question
    if pd.isna(row["referral_code"]):
        return None

    count = 0

    # Referrals made: how often this user's code appears in referred_by
    if row["referral_code"] in weight_refered_by.index:
        count = weight_refered_by[row["referral_code"]]

    # Referral received: add 1 if this user was referred by someone
    if pd.notna(row["referred_by"]):
        count += 1

    return count

df["weights"] = df.apply(countWeight, axis=1)
print(df)
#    user_id referral_code referred_by  weights
# 0        1             A        None      1.0
# 1        2             B           A      3.0
# 2        3             C           B      1.0
# 3        5          None        None      NaN
# 4        6             E           B      1.0
# 5        7          None        None      NaN

Hope that helps!

Alexandre B.