
I imported a CSV using Pandas and one column was read in with string entries. Examining the entries for this Series (column), I see that they should actually be lists. For example:

df['A'] = pd.Series(['["entry11"]', '["entry21","entry22"]', '["entry31","entry32"]'])

I would like to extract the list elements from the strings. So far, I've tried the following chain:

df['A'] = (df['A'].replace("'", '', regex=True)
                  .replace(r'\[', '', regex=True)
                  .replace(r'\]', '', regex=True)
                  .str.split(","))

This gives me back my desired list elements in one column:

  • ['"entry11"']
  • ['"entry21", "entry22"']
  • ['"entry31", "entry32"']

My question: Is there a more efficient way of doing this? This seems like a lot of strain for something that should be a little easier.

Chris

1 Answer


You can apply ast.literal_eval() to the Series:

In [8]: from ast import literal_eval

In [9]: df['A'] = df['A'].apply(literal_eval)

In [10]: df
Out[10]: 
                    A
0           [entry11]
1  [entry21, entry22]
2  [entry31, entry32]

There are also map() and applymap(); a separate topic discusses the differences between them.
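As a rough illustration of how map() relates to apply() in this case, the sketch below rebuilds the sample column from the question and parses it both ways. The variable names `via_apply` and `via_map` are made up for the example. For a single Series and a plain Python function, both calls invoke the function once per element, so the results should match; applymap() is the element-wise counterpart for a whole DataFrame and is not shown here.

```python
from ast import literal_eval

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {'A': ['["entry11"]', '["entry21","entry22"]', '["entry31","entry32"]']}
)

# Both calls run literal_eval once per element of the Series.
via_apply = df['A'].apply(literal_eval)
via_map = df['A'].map(literal_eval)

# Compare the parsed values as plain Python lists.
print(via_map.tolist())
print(via_apply.tolist() == via_map.tolist())
```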

alecxe
  • Thanks! I knew `ast.literal_eval` would do it, but I don't know pandas so I didn't know you can map that easily over the series – Adam Smith Jan 30 '17 at 21:44
  • @AdamSmith generally, you want to avoid `.apply` for performance reasons because it just wraps a for-loop in Python, although in some cases you don't have a choice. – juanpa.arrivillaga Jan 30 '17 at 21:46
  • Thanks! That answers my question!! As a side question (if that's possible), what if some of the entries are pure strings rather than lists inside of strings, like "entry" rather than "[entry]"? – Chris Jan 30 '17 at 21:55
  • @Chris sure, if the `"entry"` is enclosed in quotes, then `literal_eval` will be able to safely evaluate it (the solution would work as is). If not, you might need a custom function that handles the `ValueError` raised by `literal_eval()` and returns, for example, the same unevaluated string. – alecxe Jan 30 '17 at 21:59
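
The custom function suggested in the last comment might look something like the sketch below. The name `safe_literal_eval` is hypothetical, and catching `SyntaxError` alongside `ValueError` is an assumption added here, since unquoted text containing spaces or punctuation may fail to parse before it is ever evaluated.

```python
from ast import literal_eval


def safe_literal_eval(value):
    """Parse value with literal_eval; return it unchanged if parsing fails.

    Hypothetical helper, not part of pandas or the standard library.
    """
    try:
        return literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a valid Python literal (for example, bare unquoted text),
        # so fall back to the original, unevaluated value.
        return value


# A list inside a string is parsed into a real list.
print(safe_literal_eval('["entry21","entry22"]'))
# A quoted string is evaluated to the string it contains.
print(safe_literal_eval('"entry"'))
# Bare text is not a literal, so it comes back as is.
print(safe_literal_eval('entry'))
```

With the DataFrame from the question, this could then be used as `df['A'] = df['A'].apply(safe_literal_eval)`.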