
The essence of the question is creating a new column in a DataFrame, based on the existing column 'user_id' and a dictionary whose keys are the values of 'user_id' and whose values are the corresponding types.

I have the following DataFrame df.

    df = pd.DataFrame({"user_id" : [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5], 
                  "value" : [0, 100, 50, 0, 25, 50, 100, 0, 7, 8, 20]})
    print(df)
     | user_id | value 
     _________________

0    |     1   |    0  
1    |     2   |  100  
2    |     2   |   50  
3    |     3   |    0  
4    |     3   |   25  
5    |     3   |   50  
6    |     4   |  100  
7    |     4   |    0  
8    |     4   |    7  
9    |     4   |    8  
10   |     5   |   20  

Also, I have a dictionary, which is

    dict = {1 : 'type_a', 2: 'type_b', 3: 'type_a', 4: 'type_b', 5: 'type_a'}

My idea is to create a third column in my DataFrame df, called tariffs, so that, for example, every row with user_id 3 gets a tariff of type_a.

I have found one solution, but I don't quite understand how it works.

    df['tariffs'] = df.apply(lambda x: dict[x.user_id], axis=1)
    print(df)
     | user_id | value | tariffs
     _________________________

0    |     1   |    0  |type_a
1    |     2   |  100  |type_b
2    |     2   |   50  |type_b
3    |     3   |    0  |type_a
4    |     3   |   25  |type_a
5    |     3   |   50  |type_a
6    |     4   |  100  |type_b
7    |     4   |    0  |type_b
8    |     4   |    7  |type_b
9    |     4   |    8  |type_b
10   |     5   |   20  |type_a

The result I get after this line of code is exactly what I want.

In particular, I do not understand the part dict[x.user_id]. What is the logic behind it, and are there any alternatives to the method I used? Thanks in advance.

vnikonov_63

1 Answer

Is it clearer when written like this:

    df['tariffs'] = df.apply(lambda row: dict[row['user_id']], axis=1)

The lambda function is applied to each row of the DataFrame (because axis=1). For each row, row['user_id'] is used as a key to look up the matching value in dict, and the results are collected and assigned to the new column df['tariffs'].
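
To make the lookup logic concrete, and since the question also asks about alternatives, here is a minimal sketch. It first spells out the row-by-row lookup as an explicit loop, then shows one possible alternative using Series.map. The names user_types, looked_up and tariffs_map are my own and do not appear in the question; I renamed the dictionary only so that it does not shadow Python's built-in dict.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5],
                   "value": [0, 100, 50, 0, 25, 50, 100, 0, 7, 8, 20]})

# Hypothetical name: same mapping as `dict` in the question, renamed so
# the built-in `dict` type is not shadowed.
user_types = {1: 'type_a', 2: 'type_b', 3: 'type_a', 4: 'type_b', 5: 'type_a'}

# The apply(..., axis=1) call, spelled out: visit each row, read its
# user_id, and use that value as the key of an ordinary dictionary lookup.
looked_up = []
for _, row in df.iterrows():
    looked_up.append(user_types[row["user_id"]])
df["tariffs"] = looked_up

# One alternative: Series.map accepts a dictionary and performs the same
# key -> value lookup for every element of the column.
df["tariffs_map"] = df["user_id"].map(user_types)

print(df)
```

One difference worth knowing: if a user_id is missing from the dictionary, the dict[...] lookup raises a KeyError, while map leaves NaN in that row.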

Kokli