
I don't understand why different indexing methods behave inconsistently when I use apply on a column.

The data frame is:

import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')])
df.columns = ['A', 'B']

I want to apply a lambda to the second column, but I get an error saying the 'Series' object has no attribute 'upper':

print df.iloc[:, 1:2].apply(lambda x: x.upper()).head()
 **AttributeError**:("'Series' object has no attribute 'upper'", u'occurred at index B')
print df.loc[:, ['B']].apply(lambda x: x.upper()).head()
 **AttributeError**:("'Series' object has no attribute 'upper'", u'occurred at index B')

However, the following indexing method works:

print df.loc[:, 'B'].apply(lambda x: x.upper()).head()

Why? I thought the three indexing methods were equivalent. Printed out, they look almost the same. The first two give:

       B
0  Hello
1  World

and print df.loc[:, 'B'] gives

0    Hello
1    World
Name: B, dtype: object

What does the difference mean?

Cœur
WeiChing 林煒清

2 Answers


When you index with 'B' you get a Series. When you index with 1:2 or with ['B'], you get a DataFrame with one column. When you use apply on a Series, your function is called on each element. When you use apply on a DataFrame, your function is called on each column.
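To make this concrete, here is a minimal sketch that records what apply actually hands to the function in each case. The helper name describe is made up for illustration; the DataFrame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

def describe(x):
    # Hypothetical helper: report the type of whatever apply passes in.
    return type(x).__name__

series = df.loc[:, 'B']      # scalar label -> Series
frame = df.loc[:, ['B']]     # list of labels -> one-column DataFrame

per_element = series.apply(describe)  # describe is called once per value
per_column = frame.apply(describe)    # describe is called once per column

print(per_element.tolist())
print(per_column['B'])
```

On the Series the function sees plain str values, so x.upper() works. On the DataFrame it sees the whole column as a Series, which is why the lambda in the question fails.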

So no, they aren't equivalent. When you have a Series you can use your function as you want. When you have a one-column DataFrame, you can't, because it gets passed the column as its argument, and the column is a Series that doesn't have an upper method.

You can see that they aren't the same because the results are different when you print them out. Yes, they're almost the same, but not the same. The first one has a column header, indicating that it's a DataFrame; the second has no column header but has the "Name" at the bottom, indicating it's a Series.
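If the printed output is too subtle, a quick check of the types and shapes settles it. This is only a sketch using the DataFrame from the question; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

by_label = df.loc[:, 'B']        # scalar label
by_list = df.loc[:, ['B']]       # list of labels
by_slice = df.iloc[:, 1:2]       # positional slice

# Print the class name and shape of each selection.
for name, obj in [('by_label', by_label), ('by_list', by_list), ('by_slice', by_slice)]:
    print(name, type(obj).__name__, obj.shape)
```

The scalar-label selection is one-dimensional, while the list and slice selections keep two dimensions even though only one column is selected.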

BrenBarn

As @BrenBarn mentioned, the difference is that df.iloc[:, 1:2] gives you a DataFrame with one column, while df.loc[:, 'B'] gives you a Series. Just a small addition: to convert a one-column DataFrame into a Series you can use the DataFrame.squeeze() method:

>>> df.iloc[:, 1:2]
       B
0  Hello
1  World
>>> df.iloc[:, 1:2].squeeze()
0    Hello
1    World
Name: B, dtype: object

and then you can use apply (you don't have to use lambda, BTW):

>>> df.iloc[:, 1:2].squeeze().apply(str.upper)
0    HELLO
1    WORLD
Name: B, dtype: object

If you want to apply upper to every element of a DataFrame, you can use DataFrame.applymap():

>>> df.iloc[:, 1:2].applymap(str.upper)
       B
0  HELLO
1  WORLD
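Another option, not taken from the answers above, is to keep the column-wise apply from the question but call a method that a Series does have. A minimal sketch, assuming the column holds only strings, using the Series .str accessor:

```python
import pandas as pd

df = pd.DataFrame([(1, 'Hello'), (2, 'World')], columns=['A', 'B'])

# apply on a DataFrame passes each column as a Series,
# so use the Series-level .str.upper() instead of str.upper().
upper_frame = df.iloc[:, 1:2].apply(lambda col: col.str.upper())

print(upper_frame)
```

The result is still a one-column DataFrame, which can be convenient if later code expects two dimensions.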
Roman Pekar