add a string prefix to each value in a string column using Pandas

Question

I would like to prepend a string to each value in a given column of a pandas DataFrame (elegantly). I have figured out a way to more or less do this, and I am currently using:

df.loc[df['col'] != False, 'col'] = 'str' + df.loc[df['col'] != False, 'col']

This seems like one hell of an inelegant thing to do. Do you know any other way (which maybe also adds the prefix to rows where that column is 0 or NaN)?

In case this is still unclear, I would like to turn:

  col
1   a
2   0

into:

    col
1  stra
2  str0

What exactly are you asking? Please explain what your code does and what you wish it did. — Ryan Saxe, Nov 17 '13 at 01:01
I thought what the example code does was very clear to the average pandas user. I have added use case examples for your convenience. — TheChymera, Nov 17 '13 at 01:13
Your description is somewhat at odds with your code. What is up with the `!= False` business? Do you want to add `str` to every value or only some? — BrenBarn, Nov 17 '13 at 01:21
Your example is still a bit unclear. Do you want something like `df['col'] = 'str' + df['col'].astype(str)`? — Roman Pekar, Nov 17 '13 at 01:58

score 435 · Accepted Answer · answered Nov 17 '13 at 05:00

df['col'] = 'str' + df['col'].astype(str)

Example:

>>> df = pd.DataFrame({'col':['a',0]})
>>> df
  col
0   a
1   0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
    col
0  stra
1  str0

Roman Pekar

107,110
28
195
197

Thank you. In case it is of interest, DataFrame indexes also support such string manipulations. – tagoma Jul 10 '17 at 21:30
How do I do this if conditions must be met prior to concatenation? – acecabana Apr 17 '18 at 19:04
@tagoma, four years later: yes, this also works on DataFrame indexes. You can create a new column that prefixes the index values with: df['col'] = 'str' + df.index.astype(str) – MEdwin Nov 06 '18 at 12:49
`astype(str)` might ruin the encoding if you are trying to save to a file in the end. – Raein Hashemi May 24 '19 at 16:10
When I try this, as well as any other approach, I get a SettingWithCopyWarning. Is there a way to avoid it? – Madan Ivan Mar 31 '20 at 10:21
I'm trying to use this solution but I get UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype(' None – Luca Monno Dec 30 '21 at 09:34
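
Regarding the SettingWithCopyWarning question above: one common cause, assuming the frame being modified was itself produced by slicing another DataFrame, is assigning into that slice. A minimal sketch of the usual workaround is to take an explicit copy first. The column name `keep` and the variable `subset` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'keep' marks the rows we want to work on.
df = pd.DataFrame({'col': ['a', 0, 'b'], 'keep': [True, False, True]})

# Taking an explicit copy makes it unambiguous that 'subset' is its own
# object, so assigning a column to it does not touch 'df'.
subset = df[df['keep']].copy()
subset['col'] = 'str' + subset['col'].astype(str)

print(subset)
```

The original `df` is left unchanged; if the goal is to modify `df` itself, assigning through `df.loc[...]` on the original frame is the other usual route.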

Cleb · Answer 2 · 2019-11-21T13:40:23.297

30

As an alternative, you can also use `apply` combined with `format` (or, better, with f-strings), which I find slightly more readable if, for example, you also want to add a suffix or manipulate the element itself:

df = pd.DataFrame({'col':['a', 0]})

df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))

which also yields the desired output:

    col
0  stra
1  str0

If you are using Python 3.6+, you can also use f-strings:

df['col'] = df['col'].apply(lambda x: f"str{x}")

yielding the same output.

The f-string version is almost as fast as @RomanPekar's solution (python 3.6.4):

df = pd.DataFrame({'col':['a', 0]*200000})

%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using format, however, is indeed far slower:

%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
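
To illustrate the readability point made at the start of this answer, here is a small sketch that adds a prefix and a suffix and also transforms the element in the same f-string. The suffix `_end` and the upper-casing step are arbitrary choices for the example, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0]})

# Prefix, transform (upper-case the string form of x), and suffix in one go.
df['col'] = df['col'].apply(lambda x: f"str{str(x).upper()}_end")

print(df['col'].tolist())
```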

edited Nov 21 '19 at 13:40

answered Apr 24 '18 at 07:03

Cleb

same result, but way slower ;-) – Philipp_Kats Sep 13 '18 at 02:01
@Philipp_Kats: I added some timings, thanks for the suggestion! It seems that f-strings are almost as fast; `format` indeed performs worse. How did you compare? – Cleb Sep 13 '18 at 06:34
oh nice! in my understanding `.apply` is always either as fast or slower than "direct" vectorized operations; even if they are not slower, I prefer to avoid them where possible. – Philipp_Kats Sep 13 '18 at 22:11
@Philipp_Kats: I agree, however, in this particular case I find it more readable when I also add a suffix, do something with `x` itself etc., but that's just a matter of taste... :) – Cleb Sep 13 '18 at 22:17

Boxtell · Answer 3 · 2021-12-20T14:57:32.843

23

You can use pandas.Series.map:

df['col'] = df['col'].map('str{}'.format)

In this example, it prepends the string str to every value in the column.
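
As a sketch of one variation: `Series.map` accepts `na_action='ignore'`, which leaves missing values untouched instead of formatting them. The extra NaN row below is an assumption added for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, np.nan]})

# With na_action='ignore', NaN is passed through rather than
# being turned into a prefixed string.
df['col'] = df['col'].map('str{}'.format, na_action='ignore')

print(df)
```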

edited Dec 20 '21 at 14:57

answered Dec 06 '19 at 17:54

Boxtell

score 6 · Answer 4 · answered Mar 08 '19 at 12:09

If you load your table file with dtype=str,
or convert the column to string type with df['a'] = df['a'].astype(str),
then you can use this approach:

df['a'] = 'col' + df['a'].str[:]

This approach lets you prepend to, append to, and take substrings of the string values in the column.
It works on pandas v0.23.4 and v0.24.1. I don't know about earlier versions.
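
A short sketch of the three operations mentioned above (prepend, append, substring), assuming an integer column that is first converted to strings. The suffix `_x` and the result names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 22, 333]})
df['a'] = df['a'].astype(str)  # the .str accessor needs string values

prefixed = 'col' + df['a']              # prepend
suffixed = df['a'] + '_x'               # append
first_char = 'col' + df['a'].str[:1]    # prepend to a substring (first character)

print(prefixed.tolist(), suffixed.tolist(), first_char.tolist())
```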

score 4 · Answer 5 · answered Mar 15 '21 at 20:21

This adds to the other answers on prefixing columns by also controlling how NaNs are rendered, which is useful for things like human-readable values on CSV export.

"_" + df['col1'].replace(np.nan,'').astype(str)

Example:

import platform
import pandas as pd
import numpy as np

# Report the versions used for this reproduction
print("python {}".format(platform.python_version()))
print("pandas {}".format(pd.__version__))
print("numpy {}".format(np.__version__))

df = pd.DataFrame({
    'col1': ["1a", "1b", "1c", np.nan],
    'col2': ["2a", "2b", np.nan, "2d"],
    'col3': [31, 32, 33, 34],
    'col4': [np.nan, 42, 43, np.nan]})

# Replace NaN with a readable placeholder before converting to str and prefixing
df['col1_prefixed'] = "_" + df['col1'].replace(np.nan, 'no value').astype(str)
df['col4_prefixed'] = "_" + df['col4'].replace(np.nan, 'no value').astype(str)

print(df)

python 3.7.3
pandas 1.2.3
numpy 1.18.5
  col1 col2  col3  col4 col1_prefixed col4_prefixed
0   1a   2a    31   NaN           _1a     _no value
1   1b   2b    32  42.0           _1b         _42.0
2   1c  NaN    33  43.0           _1c         _43.0
3  NaN   2d    34   NaN     _no value     _no value

(Sorry for the verbosity, I found this Q while working on an unrelated column type issue and this is my reproduction code)
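
If the goal is instead to leave missing values missing rather than replace them with a placeholder, one possible sketch is to prefix everything and then blank out the rows that were originally NaN. This is a variation on the approach above, not part of it; the name `is_missing` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['1a', '1b', np.nan]})

# Remember which rows were missing before any string conversion.
is_missing = df['col1'].isna()

# Prefix every row, then keep the result only where the original was present;
# Series.where puts NaN wherever the condition is False.
prefixed = '_' + df['col1'].astype(str)
df['col1_prefixed'] = prefixed.where(~is_missing)

print(df)
```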

I find it bothersome that `pd.Series([None]).astype('str')[0] == 'None'`. Similarly with `np.nan`. The string "None" is truthy, yet `None` is not. This solution helps account for that +1 — Wassadamo, Nov 23 '21 at 00:38

Lukas · Answer 6 · 2020-04-04T19:05:05.737

3

Another solution with .loc:

df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)

This is not as quick as the solutions above (more than 1 ms per loop slower), but it may be useful if you need a conditional change, such as:

mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)
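
A self-contained sketch of the conditional case, starting from a fresh frame so the mask is computed on the unmodified values. The sample data is made up for the example, and the right-hand side is restricted to the masked rows so only those are converted.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, 'b', 0]})

# Only rows whose value equals 0 get the prefix; the others are left alone.
mask = df['col'] == 0
df.loc[mask, 'col'] = 'string' + df.loc[mask, 'col'].astype(str)

print(df['col'].tolist())
```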

edited Apr 04 '20 at 19:05

answered Sep 10 '19 at 08:22

Lukas