
I am trying to accomplish a simple task: creating a new column in a Pandas dataframe based on the conditions of other columns. I have consulted other posts (e.g., this very popular one, but also others that took different approaches) but have been unsuccessful.

The problem I am having now is that only the last value defined in my function is returned in the new column.

For example:

I have the following column:

x
1
2
3

I want to add a new column of labels thusly:

x     size
1     Small
2     Medium
3     Large

Here is the code from my most recent attempt:

import pandas as pd
import numpy as np

df = pd.read_csv('blah.csv')

def size (row):
    if row['rQ7'] == 1:
        return 'Small'
    if row['rQ7'] == 2:
        return 'Medium'
    if row['rQ7'] == 3:
        return 'Large'
    return -99 

'''
I have also tried breaking this into 
else: 
    return -99 
but it doesn't work. '''

df['size'] = df.apply(size, axis=1)

Now, while I do not get any errors, when I apply the function to the dataframe, it only returns the last value, i.e., -99:

x    size
1    -99
2    -99
3    -99

This is also true for other functions I have tried, and when I tried to use df.loc[], Python would not copy any of the values to the new column, although no errors were present.

I am confused and at a loss: to me, and based on the other examples I have tried, it appears the code should work.

Any help is greatly appreciated.

Machavity
n0ro

2 Answers


You can use numpy.select():

df['size'] = np.select([df.x.eq(1), df.x.eq(2), df.x.eq(3)],
                       ['small', 'medium', 'large'],
                       default='something')

You can replace 'something' with the value that should appear when none of the conditions is met.

print(df)

   x    size
0  1   small
1  2  medium
2  3   large
anky
  • Thanks for your suggestion! I am still having the issue where only 'something' is returned. All the other conditions are ignored. I am using Spyder to program, but I tried it in a different editor and was able to replicate the problem. – n0ro Mar 04 '19 at 16:48
  • @n0ro That should not be the case if you have correctly demonstrated the issue in the question. Check the data types, etc. :) – anky Mar 04 '19 at 16:53
  • The numbers were being treated as strings for some reason by Spyder! Thanks so much for the obvious solution! It works now xD – n0ro Mar 04 '19 at 16:58
  • @n0ro no problem. :) Happy coding. :) – anky Mar 04 '19 at 16:59
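The comments above trace the problem to the column's data type. Here is a minimal sketch of how one might confirm and fix that, assuming the values were read in as strings. The small frame is made up for illustration and stands in for the data from blah.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data: the digits are stored as strings.
df = pd.DataFrame({"x": pd.Series(["1", "2", "3"], dtype=object)})

# Inspect the dtype first; an integer column would report an integer dtype here.
print(df["x"].dtype)

labels = ["small", "medium", "large"]

# The string "1" is not equal to the integer 1, so no condition matches
# and every row falls through to the default.
conditions = [df["x"].eq(1), df["x"].eq(2), df["x"].eq(3)]
before = np.select(conditions, labels, default="something")

# Convert the column to numbers, then rebuild the conditions.
df["x"] = pd.to_numeric(df["x"])
conditions = [df["x"].eq(1), df["x"].eq(2), df["x"].eq(3)]
df["size"] = np.select(conditions, labels, default="something")
print(df)
```

Checking df.dtypes right after pd.read_csv is usually the quickest way to spot this kind of mismatch.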

You can try a simpler version:

import pandas as pd
import numpy as np

df = pd.read_csv('blah.csv')

def size(x):
    if x == 1:
        return 'Small'
    if x == 2:
        return 'Medium'
    if x == 3:
        return 'Large'
    return -99 

# The values in 'rQ7' may have been read as strings, so convert each one to int before comparing
df['size'] = df['rQ7'].apply(lambda x: size(int(x)))
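One caveat with int(x) is that it raises a ValueError if a cell holds something that is not a number. A possible variation, sketched here on made-up data in place of blah.csv, is to convert the whole column with pd.to_numeric first. The 'n/a' entry is an assumption added only to show the fallback:

```python
import pandas as pd

def size(x):
    if x == 1:
        return 'Small'
    if x == 2:
        return 'Medium'
    if x == 3:
        return 'Large'
    return -99

# Hypothetical stand-in for blah.csv: 'rQ7' holds strings, one of them not a number.
df = pd.DataFrame({'rQ7': pd.Series(['1', '2', '3', 'n/a'], dtype=object)})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising.
codes = pd.to_numeric(df['rQ7'], errors='coerce')

# NaN matches none of the comparisons in size(), so it gets the -99 fallback.
df['size'] = codes.apply(size)
print(df)
```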
Alex Glinsky