
I have a pandas DataFrame. One of the columns contains lists. I want each row of that column to be a single string.

For example, my list ['one','two','three'] should simply become 'one, two, three'.

df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

gives me ['one', 'two', 'three'], ['four', 'five', 'six'] in every row, where the second list comes from the next row. Needless to say, with millions of rows this concatenation across rows is not only incorrect, it kills my memory.

wjandrea
Rusty Coder

4 Answers


You should certainly not convert to string before you transform the list. Try:

df['col'].apply(', '.join)

Also note that apply applies the function to the elements of the series, so using df['col'] in the lambda function is probably not what you want.


Or, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
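
For illustration, a minimal sketch of both options, assuming a column named col as in the question:

import pandas as pd

df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# join each row's list into a single string
df['col'] = df['col'].apply(', '.join)

# equivalent, via the native accessor (run on the original list column):
# df['col'] = df['col'].str.join(', ')

print(df)
#                col
# 0  one, two, three
# 1  four, five, six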

wjandrea
IanS
  • what if items of the list are `int` types? – Khalil Al Hooti Aug 23 '18 at 10:01
  • @KhalilAlHooti this should work: `df['col'].apply(lambda x: ', '.join(map(str, x)))` – IanS Aug 23 '18 at 10:25
  • What if there is some NaN data? `df2['well_array2'].apply(lambda x: ','.join(str(i) for i in x if pd.isnull(all(x))==False))` did not work; I am getting the error `'float' object is not iterable`. – Khalil Al Hooti Aug 23 '18 at 12:16
  • Neither is this working; I am getting the same error: `df2['well_array2'].apply(lambda x: ','.join(str(i) for i in x if type(x) == type([])))` – Khalil Al Hooti Aug 23 '18 at 12:24
  • @IanS the entire x is NaN; there is no list at this row in the well_array2 column. I do not want to filter out NaN values! Thanks in advance. – Khalil Al Hooti Aug 23 '18 at 12:32
  • @KhalilAlHooti this should work: `df['new_col'] = df['col'].dropna().apply(lambda x: ', '.join(map(str, x)))` – IanS Aug 23 '18 at 12:33
  • This filters out null values, but reassigns them when you create the new column (because pandas does index-based assignment, missing values are assigned NaN). – IanS Aug 23 '18 at 12:34
  • I'm not a Pandas expert, but it seems like the reason to prefer `.str.join()` is that it accounts for NaN and other objects automatically. For example use `pd.Series([np.nan, ['one', 'two', 'three']])` and compare the two methods. – wjandrea Jan 25 '22 at 23:06

When you cast col to str with astype, you get the string representation of a Python list, brackets and all. You do not need to do that; just apply join directly:

import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
    })

# Out[8]: 
#            A
# 0  [a, b, c]
# 1  [A, B, C]

df['Joined'] = df.A.apply(', '.join)

#            A   Joined
# 0  [a, b, c]  a, b, c
# 1  [A, B, C]  A, B, C
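
For contrast, a quick sketch of the cast this answer warns against, on the same df — each element becomes the list's repr rather than a joined string:

df.A.astype(str)

# 0    ['a', 'b', 'c']
# 1    ['A', 'B', 'C']
# Name: A, dtype: object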
hilberts_drinking_problem

You could convert the lists to str with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:

In [114]: df
Out[114]:
           A
0  [a, b, c]
1  [A, B, C]

In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)
Out[115]:
0    a, b, c
1    A, B, C
Name: A, dtype: object

Timing

import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)


In [2]: timeit df['A'].apply(', '.join)
292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: timeit df['A'].str.join(', ')
368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: timeit df['A'].str.replace(r"\[|\]|'", '', regex=True)
2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Anton Protopopov
  • Interesting approach. However, I timed your method, and it is slower than `apply` by a factor of 5. – IanS May 20 '16 at 13:43
  • @IanS how did you time it? I thought vectorized string operations as shown in this answer would be much faster than apply – alys Dec 07 '17 at 00:04
  • @Shoof I used the `%timeit` magic in IPython. I checked again and found it 3 times as slow on a column with 100 rows. Two possible explanations: 1) Regex replace operations, even vectorized, can be quite slow. 2) Apply is smart, e.g. if you apply a standard function such as `sum` it will be quite fast. I assume something similar may be going on with `join`. – IanS Dec 07 '17 at 10:03
  • @Shoof @IanS I edited the answer to add timings, and added a new method using `str.join`, which comes in 2nd place after `.apply(', '.join)`. – Anton Protopopov Dec 07 '17 at 12:13
  • @IanS thanks a lot for the timed results! That's indeed a bit surprising as some books suggested otherwise. Great to see the comparisons! – alys Dec 16 '17 at 04:41
  • It's interesting that `str.join` is slower (but then, only by a bit). Note that operations on objects cannot really be vectorized since an `object` column is just a column of pointers (here, to lists). – IanS May 28 '20 at 07:30

Pandas offers a method for this, Series.str.join.
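
A minimal sketch, with an assumed series for illustration; note that str.join propagates NaN (and, per the docs, a list containing non-str objects also produces NaN):

import pandas as pd
import numpy as np

s = pd.Series([['one', 'two', 'three'], np.nan])

s.str.join(', ')
# 0    one, two, three
# 1                NaN
# dtype: object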

AMC
  • `Series.str.join(', ')` works well for me. As stated by the documentation, "The lists containing object(s) of types other than str will produce a NaN." For my use case, this was preferable to an error, which is what I got when using `Series.apply(', '.join)`. – Kyle Jul 07 '23 at 15:24