I have a df with a column like this:

                       words
1                     ['me']
2                   ['they']
4         ['it', 'we', 'it']
5                         []
6         ['we', 'we', 'it']

I want it to look like this:

                     words
1                     'me'
2                   'they'
4               'it we it'
5                       ''          
6               'we we it'

I have tried both of these options, but each yields a result identical to the original series.

def join_words(df):
    words_string = ''.join(df.words)
    return words_string

master_df['words_string'] = master_df.apply(join_words, axis=1)

and...

master_df['words_String'] = master_df.words.str.join(' ')

Both these result in the original df. What am I doing wrong?

Edit

Using master_df['words_string'] = master_df['words'].apply(' '.join), I got:

1                                     [ ' m e ' ]
2                                 [ ' t h e y ' ]
4             [ ' i t ' ,   ' w e ' ,   ' i t ' ]
5                                             [ ]
6             [ ' w e ' ,   ' w e ' ,   ' i t ' ]
wjandrea
connor449
  • Maybe it is not an actual list? Otherwise `master_df.words.str.join(' ')` should work. Check `ast.literal_eval` if the values are just the string repr of a list. It's also better to include `df.head().to_dict()` in your question. – anky Feb 20 '20 at 19:32
  • `df['words'].apply(literal_eval).agg(' '.join)` if it's a list not a string – Umar.H Feb 20 '20 at 19:33
  • Does this answer your question? [Pandas DataFrame stored list as string: How to convert back to list?](https://stackoverflow.com/questions/23111990/pandas-dataframe-stored-list-as-string-how-to-convert-back-to-list) – AMC Feb 20 '20 at 19:48
  • Also: https://stackoverflow.com/questions/1894269/convert-string-representation-of-list-to-list. – AMC Feb 20 '20 at 19:49
  • Please provide a proper [mcve], especially since we discovered that the contents of your Series are **strings, not lists** as your post currently implies. The formatting in your post needs some editing, but I am unable to do so as we're lacking some accessible and easy to use examples of your data. – AMC Feb 20 '20 at 19:51

3 Answers

Edit:

As your edit shows, the values are not actually lists but strings that look like lists (the string representation of a list). We can use eval to parse each string into a real list and then perform the join. It seems your sample data is the following:

import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                   'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})
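
A quick way to confirm this diagnosis is to inspect the type of each cell. The sketch below assumes data shaped like the question's (the exact strings are a guess), and it also shows why `' '.join` applied to a string produces the spaced-out output from the question's edit:

```python
import pandas as pd

# Assumed sample data: every cell is a string that merely looks like a list
df = pd.DataFrame({'words': ["['me']", "['they']", "['it', 'we', 'it']", "[]"]})

# Map each cell to its Python type; real lists would show up as <class 'list'>
cell_types = df['words'].map(type)
print(cell_types.unique())

# Joining a string iterates over its characters, so a space lands between each one
spaced = ' '.join("['me']")
print(spaced)
```

If the types come back as `str`, the cells need to be parsed into lists before any join can work.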

How about this? Apply eval to turn each string into a list, then apply ' '.join to each resulting list:

df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)

Output:

   index     words
0      0        me
1      1      they
2      2  it we it
3      3          
4      4  we we it
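
Since eval executes whatever the string contains, a safer variant is `ast.literal_eval`, which the comments on the question also suggest. This is a minimal sketch on the same assumed sample data; the column name `words_string` is taken from the question:

```python
import ast

import pandas as pd

df = pd.DataFrame({'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})

# literal_eval only parses Python literals (lists, strings, numbers, ...),
# so it cannot execute arbitrary code the way eval can
parsed = df['words'].apply(ast.literal_eval)

# Join the words of each parsed list; an empty list becomes an empty string
df['words_string'] = parsed.apply(' '.join)
print(df['words_string'].tolist())
```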
Community
Celius Stingher

Generally I'd advise against eval. Here's another approach for when the elements are strings, not lists (note that rows without any match, such as '[]', are dropped from the result):

master_df['words'].str.extractall(r"'(\w*)'").groupby(level=0)[0].agg(' '.join)

Output:

1          me
2        they
4    it we it
6    we we it
Name: 0, dtype: object
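
The output above has no entry for index 5, because `extractall` finds no match in `'[]'`. If the empty rows should be kept as empty strings, one option is to reindex against the original index. This is a sketch, assuming a Series with the question's index labels:

```python
import pandas as pd

# Assumed data mirroring the question, including its non-contiguous index
words = pd.Series(
    ["['me']", "['they']", "['it', 'we', 'it']", "[]", "['we', 'we', 'it']"],
    index=[1, 2, 4, 5, 6],
)

joined = (
    words.str.extractall(r"'(\w*)'")        # one row per quoted word
         .groupby(level=0)[0]               # regroup by the original index
         .agg(' '.join)
         .reindex(words.index, fill_value='')  # restore rows that had no match
)
print(joined.tolist())
```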
Quang Hoang

Another idea is to use DataFrame.explode (available since pandas 0.25.0) together with groupby and agg. This assumes the cells hold real lists rather than strings.

import pandas as pd

# create a list of lists of strings
values = [
    ['me'],
    ['they'],
    ['it', 'we', 'it'],
    [],
    ['we', 'we', 'it']
]

# convert to a data frame
df = pd.DataFrame({'words': values})

# explode the cells (with lists) into separate rows having the same index
df2 = df.explode('words')
df2

This produces a long-format table with one word per row; the empty list becomes NaN:

  words
0    me
1  they
2    it
2    we
2    it
3   NaN
4    we
4    we
4    it

Now the long-format table needs to be aggregated back to one row per original index:

# make sure the dtype is string
df2['words'] = df2['words'].astype(str)

# group by the index aggregating all values to a single string
df2.groupby(level=0).agg(' '.join)

giving the output:

      words
0        me
1      they
2  it we it
3       nan
4  we we it
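
Note that row 3 ends up as the string 'nan' rather than the empty string the question asks for, because `astype(str)` converts the missing value to text. A possible variation, sketched here on the same data, fills the missing value with `''` instead:

```python
import pandas as pd

df = pd.DataFrame({'words': [['me'], ['they'], ['it', 'we', 'it'], [], ['we', 'we', 'it']]})

# Explode to long format; the empty list yields a single NaN row
long_df = df.explode('words')

# Replace the missing value with an empty string instead of casting it to 'nan'
long_df['words'] = long_df['words'].fillna('')

# Aggregate back to one string per original row
result = long_df.groupby(level=0)['words'].agg(' '.join)
print(result.tolist())
```

When the cells are real lists, `df['words'].str.join(' ')` should give the same result without the intermediate long table.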
Matthias