def age_range(age):
    if  age <= 18:
        return 'Minors'
    elif age >= 19 & age < 63:
        return 'Adults'
    elif age >= 63 & age < 101:
        return 'Senior Citizen'
    else:
        return 'Age Unknown'

titanic_data_df["PassengerType"] = titanic_data_df[['Age']].apply(age_range, axis = 1)

titanic_data_df.head()

I get the following error when I try to add a new column to an existing dataframe (titanic_data_df):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-466-741f5646101e> in <module>()
      1 #create a new df with just age and distinguish each passenger as minor, adult or senior citizen
----> 2 titanic_data_df["PassengerType"] =     titanic_data_df[['Age']].apply(age_range, axis = 1)
      3 
      4 titanic_data_df.head()

C:\Users\test\Anaconda2\envs\py27DAND\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4161                     if reduce is None:
   4162                         reduce = True
-> 4163                     return self._apply_standard(f, axis, reduce=reduce)
   4164             else:
   4165                 return self._apply_broadcast(f, axis)

C:\Users\test\Anaconda2\envs\py27DAND\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   4257             try:
   4258                 for i, v in enumerate(series_gen):
  -> 4259                     results[i] = func(v)
   4260                     keys.append(v.name)
   4261             except Exception as e:

 <ipython-input-465-e62ccbeee80e> in age_range(age)
      1 def age_range(age):
----> 2     if  age <= 18:
      3         return 'Minors'
      4     elif age >= 19 & age < 63:
      5         return 'Adults'

C:\Users\test\Anaconda2\envs\py27DAND\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    915         raise ValueError("The truth value of a {0} is ambiguous. "
    916                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 917                          .format(self.__class__.__name__))
    918 
    919     __bool__ = __nonzero__

 ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0')

From what I have read so far, it has something to do with the if...else statement in the function above, but I can't figure out what it is. Any help is appreciated. Thank you.

  • Could you add an [mcve] (including traceback) to your question? It's hard to figure out what's happening if we can't reproduce the error. – MSeifert Mar 07 '17 at 01:00
  • Is this a pandas question? Question tags seem incomplete. – Andrea Reina Mar 07 '17 at 01:02
  • I don't know much about Pandas, but I do know about the bitwise operator `&` being different from the logical operator `and`, so there's a good chance that's what's causing the problem. Actually, never mind - that would create incorrect results, not an error. – TigerhawkT3 Mar 07 '17 at 01:02
  • @TigerhawkT3 The problem is that any `bool` call on a Series is a ValueError. That applies to `if` but also to `and`! I extensively covered the options for operating on Series in http://stackoverflow.com/a/36922103/5393381. Maybe the solution can be found there. – MSeifert Mar 07 '17 at 01:05
  • @MSeifert. My apologies. I am new to programming. I have now added the full error message. Thank you. – R_and_Python_noob Mar 07 '17 at 01:53
  • @AndreaReina. Yes I am trying to solve this using Pandas. Pandas tag did not come to my mind at the time of posting for some reason. – R_and_Python_noob Mar 07 '17 at 01:55

2 Answers


When you select a column as titanic_data_df[['Age']] (note the double square brackets), you actually get back a DataFrame containing a single column. With axis = 1, apply() then passes each row to age_range as a single-element Series, and comparing a Series inside an if statement raises the ValueError you see.

Try this instead:

titanic_data_df["PassengerType"] = titanic_data_df['Age'].apply(age_range)
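To make the difference concrete, here is a minimal sketch. The small `df` is a hypothetical stand-in for titanic_data_df (the real data is not shown in the question), and it assumes Age is stored as floats with some missing values. It also replaces `&` with chained comparisons: `&` binds more tightly than the comparison operators, so `age >= 19 & age < 63` is evaluated as `age >= (19 & age) < 63`, and `19 & age` raises a TypeError when age is a float.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic_data_df; values chosen only for illustration.
df = pd.DataFrame({"Age": [4.0, 22.0, 63.0, 71.0, np.nan]})

def age_range(age):
    # Works on one scalar at a time; chained comparisons replace the bitwise `&`.
    if age <= 18:
        return 'Minors'
    elif 19 <= age < 63:
        return 'Adults'
    elif 63 <= age < 101:
        return 'Senior Citizen'
    else:
        # NaN fails every comparison above, so missing ages land here.
        return 'Age Unknown'

# Single brackets select a Series, so apply() hands age_range one scalar per call.
assert isinstance(df['Age'], pd.Series)
# Double brackets select a one-column DataFrame; apply(axis=1) hands over a Series per row.
assert isinstance(df[['Age']], pd.DataFrame)

df["PassengerType"] = df['Age'].apply(age_range)
print(df)
```

The thresholds are kept as in the question, so a fractional age between 18 and 19 would still fall through to 'Age Unknown'; adjust the first branch if that is not what you want.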
foglerit
  • Thank you for the explanation. That makes sense. It also seems I can use applymap() if I want to keep working with a DataFrame instead of a Series. – R_and_Python_noob Mar 07 '17 at 05:14

The pandas cut function will make this much easier for you. First, I will construct a small data frame to demonstrate it.

import pandas as pd

titanic_data_df = pd.DataFrame(data=[[13, 'Male'], [14, 'Female'], [38, 'Female'], [72, 'Male'], [33, 'Female'], [80, 'Male'], [34, 'Male'], [15, 'Female'], [27, 'Female'],[23, 'Male'], [64, 'Female'], [38, 'Female'], [12, 'Male'], [32, 'Female'], [21, 'Male'], [66, 'Male'], [73, 'Female'], [22, 'Female']], columns=['Age', 'Sex'])
print(titanic_data_df)
     Age     Sex
0    13    Male
1    14  Female
2    38  Female
3    72    Male
4    33  Female
5    80    Male
6    34    Male
7    15  Female
8    27  Female
9    23    Male
10   64  Female
11   38  Female
12   12    Male
13   32  Female
14   21    Male
15   66    Male
16   73  Female
17   22  Female

Then, I simply apply the cut function:

labels = ['Minors', 'Adults', 'Senior Citizen']
titanic_data_df["PassengerType"] = pd.cut(titanic_data_df.Age, [0, 18, 63, 101], labels=labels)
print(titanic_data_df)
     Age     Sex       PassengerType
0    13    Male          Minors
1    14  Female          Minors
2    38  Female          Adults
3    72    Male  Senior Citizen
4    33  Female          Adults
5    80    Male  Senior Citizen
6    34    Male          Adults
7    15  Female          Minors
8    27  Female          Adults
9    23    Male          Adults
10   64  Female  Senior Citizen
11   38  Female          Adults
12   12    Male          Minors
13   32  Female          Adults
14   21    Male          Adults
15   66    Male  Senior Citizen
16   73  Female  Senior Citizen
17   22  Female          Adults
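One caveat: by default pd.cut closes each bin on the right, so the edges [0, 18, 63, 101] put age 63 with the adults, whereas the function in the question treats 63 as a senior citizen. The sketch below is one possible way to match the question's boundaries more closely, using left-closed bins and an explicit 'Age Unknown' label for missing ages. The sample `ages` Series and the `passenger_type` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on the bin boundaries, plus one missing value.
ages = pd.Series([13, 18, 19, 62, 63, 100, np.nan], name="Age")
labels = ['Minors', 'Adults', 'Senior Citizen']

# right=False closes each bin on the left: [0, 19), [19, 63), [63, 101).
passenger_type = pd.cut(ages, bins=[0, 19, 63, 101], right=False, labels=labels)

# Missing or out-of-range ages come back as NaN; the new label has to be
# registered as a category before fillna can use it.
passenger_type = passenger_type.cat.add_categories('Age Unknown').fillna('Age Unknown')
print(passenger_type.tolist())
```

With these edges, fractional ages between 18 and 19 count as minors, which differs slightly from the original if/elif chain.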
Joe T. Boka