
From a couple of other posts, a simple way to concatenate columns in a dataframe is to use the map command, as in the example below. The map function returns a Series, so why can't a regular Series be used directly, without map?

import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]},index=['m','n','o'])
df['x'] = df.a.map(str) + "_x"

    a   b   x
m   1   4   1_x
n   2   5   2_x
o   3   6   3_x

This also works even though I'm specifically creating a series.

df['y'] = pd.Series(df.a.map(str)) + "_y"

    a   b   x    y
m   1   4   1_x  1_y
n   2   5   2_x  2_y
o   3   6   3_x  3_y

This doesn't work; it raises a TypeError:

df['z'] = df['a'] + "_z"
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'

This doesn't work either:

df['z'] = pd.Series(df['a']) + "_z"
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'

I checked to see if map returns a different type of object under the hood, but it doesn't seem to:

type(pd.Series(df.a.map(str)))
pandas.core.series.Series

type(pd.Series(df['a']))
pandas.core.series.Series

I'm confused about what map is doing that makes this work and how whatever map does carries over into the subsequent string arithmetic.

Sanjuro

1 Answer


map replaces each input value with the corresponding value from the argument you pass in.

Normally the argument is a Series, a dict or a function. In your case it calls the str constructor as a function on each element, and you then concatenate the result with '_x'.
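As a minimal sketch of those three kinds of argument (the Series s, the dict contents and the lookup Series are made up for illustration, they are not from the question):

```python
import pandas as pd

# Hypothetical sample data, mirroring column 'a' from the question.
s = pd.Series([1, 2, 3], index=['m', 'n', 'o'])

# Function: called once per element.
by_func = s.map(str)

# dict: each value is looked up as a key.
by_dict = s.map({1: 'one', 2: 'two', 3: 'three'})

# Series: each value is looked up in the mapper's index.
lookup = pd.Series(['one', 'two', 'three'], index=[1, 2, 3])
by_series = s.map(lookup)

print(by_func.tolist())
print(by_dict.tolist())
print(by_series.tolist())
```

In all three cases the result keeps the index of the original Series.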

However, as you've found out, df['a'] + "_z" and pd.Series(df['a']) + "_z" won't work, because there is no + operator defined for those types (an integer ndarray with a str).
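A small check of this, as a sketch: the column still holds integers, so adding a str to it fails. The exact error message differs between pandas versions, so only the exception type is checked here.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['m', 'n', 'o'])

# The column is an integer column, not a column of strings.
is_integer = pd.api.types.is_integer_dtype(df['a'])

# Adding a str to an integer column raises TypeError.
try:
    df['a'] + '_z'
    raised = False
except TypeError:
    raised = True

print(is_integer, raised)
```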

You could do it using:

In [8]:    
df['a'].astype(str) + '_z'

Out[8]:
m    1_z
n    2_z
o    3_z
Name: a, dtype: object

The thing to consider is that when you call df['a'].map(str), every element is converted to a Python str, so the dtype of the result becomes object:

In [13]:
df['a'].map(str).dtype

Out[13]:
dtype('O')

So you can see why your first version worked: you essentially changed the dtype of the series, so the above is the same as df['a'].astype(str).
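To see that equivalence directly, here is a short sketch comparing the two calls on the question's dataframe. It compares values rather than dtypes, since the dtype reported for string data can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['m', 'n', 'o'])

mapped = df['a'].map(str)      # str() applied to each element
casted = df['a'].astype(str)   # whole column cast to strings

# Both hold Python str values, and the values are identical.
all_str = all(isinstance(v, str) for v in mapped)
same_values = mapped.tolist() == casted.tolist()

# Once the elements are strings, "+" means string concatenation.
result = (mapped + '_z').tolist()

print(all_str, same_values, result)
```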

EdChum