I am trying to accomplish a simple task: creating a new column in a Pandas dataframe based on the conditions of other columns. I have consulted other posts (e.g., this very popular one, but also others that took different approaches) but have been unsuccessful.
The problem I am having now is that only the last value defined in my function is returned in the new column.
For example:
I have the following column:
x
1
2
3
I want to add a new column of labels thusly:
x size
1 Small
2 Medium
3 Large
Here is the code from my most recent attempt:
import pandas as pd
import numpy as np

df = pd.read_csv('blah.csv')

def size(row):
    if row['rQ7'] == 1:
        return 'Small'
    if row['rQ7'] == 2:
        return 'Medium'
    if row['rQ7'] == 3:
        return 'Large'
    return -99

'''
I have also tried breaking this into
    else:
        return -99
but it doesn't work. '''

df['size'] = df.apply(lambda row: size(row), axis=1)
Now, while I do not get any errors, when I apply the function to the dataframe, it only returns the last value, i.e., -99:
x size
1 -99
2 -99
3 -99
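One thing that may be worth checking is what the column actually holds. The sketch below is a guess, not a confirmed diagnosis: it assumes the values in rQ7 were read from the CSV as strings ('1' rather than 1), in which case every comparison against an integer is False and the function falls through to -99. The inline DataFrame is a hypothetical stand-in for blah.csv, whose contents are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for blah.csv: the values are strings, not integers.
df = pd.DataFrame({'rQ7': ['1', '2', '3']})

def size(row):
    if row['rQ7'] == 1:
        return 'Small'
    if row['rQ7'] == 2:
        return 'Medium'
    if row['rQ7'] == 3:
        return 'Large'
    return -99

# Inspect the column: a non-numeric dtype suggests the values are strings.
print(df['rQ7'].dtype)
print(df['rQ7'].map(type).unique())

# With string values, no comparison matches, so every row gets -99.
before = df.apply(size, axis=1).tolist()

# Convert to numbers; errors='coerce' turns unparseable entries into NaN.
df['rQ7'] = pd.to_numeric(df['rQ7'], errors='coerce')

# After the conversion the same function produces the labels.
after = df.apply(size, axis=1).tolist()
print(before, after)
```

If the dtype check already reports an integer type, this assumption does not hold and the cause lies elsewhere.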
This is also true for other functions I have tried. When I tried to use df.loc[], Python would not copy any of the values to the new column, although no errors were raised.
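For comparison, a df.loc version of the same labelling might look like the sketch below. It assumes rQ7 already holds integers. The inline data and the `labels` dictionary are illustrative names, not part of the code above. The new column is created with an object dtype so that it can hold both the string labels and the -99 fallback.

```python
import pandas as pd

# Hypothetical data; 7 stands for a value with no matching label.
df = pd.DataFrame({'rQ7': [1, 2, 3, 7]})

# Illustrative mapping from value to label.
labels = {1: 'Small', 2: 'Medium', 3: 'Large'}

# Start every row at the fallback value, in an object column so that
# strings can be assigned into it afterwards.
df['size'] = pd.Series([-99] * len(df), index=df.index, dtype=object)

# Assign each label to the rows whose rQ7 equals the corresponding value.
for value, label in labels.items():
    df.loc[df['rQ7'] == value, 'size'] = label

print(df)
```

If the boolean masks never match (for example because the column holds strings), this version also leaves every row at -99 without raising an error.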
I am confused and at a loss: to me, and based on the other examples I have tried, it appears the code should work.
Any help is greatly appreciated.