3

In Pandas is there a way to convert data to a list without using Eval?

I have the following df, I want to use dataframe operations against the list without first having to convert the list items to list using eval

#a b
#['4','2','6']
#['2','3','7']
#['1','2','8']

dff="""a
['4','2','6']
['2','3','7']
['1','2','8']"""


csvfile = io.StringIO("a b\n0 ['1','2','5'] ['2']\n1 ['2','3','4'] ['4']\n2 ['2','3','4'] []\n3 [] []\n")
df = pd.read_csv(csvfile, sep=' ')

As you can see the types are objects:

df.a[0]                                                                                                                                                                                         
# "['4','2','6']"

df.a.to_numpy()                                                                                                                                                                                 
# array(["['4','2','6']", "['2','3','7']", "['1','2','8']"], dtype=object)

How can i convert using pandas, these objects to lists rather than strings of lists?

I can iterate over the items and create the lists manually using eval:

eval(df.a[0])
['4', '2', '6']

But i'd like to have the dataframe have objects of lists, rather than strings of lists. Is that possible? To convert a string of lists to actual lists in the dataframe object?

oppressionslayer
  • 6,942
  • 2
  • 7
  • 24
  • `ast.literal_eval`? – Dani Mesejo Nov 20 '19 at 21:04
  • are you looking for something like `df.a.apply(eval)` ? – Horace Nov 20 '19 at 21:07
  • @horace I'll try that as well, if I can chain that would be nice. Having pandas recognize them as lists is this goal so I'm also going to try a few other approaches here as well. My goal is to have a pandas that recognizes the object as a list rather than a string that contains a list, if not I'll just have to use eval. Maybe I can chain an apply that might work – oppressionslayer Nov 20 '19 at 21:40

2 Answers2

1

Use ast.literal_eval with the converters parameter:

csvfile = io.StringIO("a b\n0 ['1','2','5'] ['2']\n1 ['2','3','4'] ['4']\n2 ['2','3','4'] []\n3 [] []\n")
df = pd.read_csv(csvfile, sep=' ', converters={'a' : ast.literal_eval, 'b' : ast.literal_eval })
print(df)

Output

           a    b
0  [1, 2, 5]  [2]
1  [2, 3, 4]  [4]
2  [2, 3, 4]   []
3         []   []

If you are concern about the dangers of eval, you can consider ast.literal_eval as a safe alternative.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
1

If you don't want to use any variance of eval, you may use str.strip and str.split on each column. However, I don't know how efficient of this approach comparing to using eval

df.a.str.strip('[]').str.split()

Out[363]:
0    ['1','2','5']
1    ['2','3','4']
2    ['2','3','4']
3               []
Name: a, dtype: object

df.a.str.strip('[]').str.split().map(type)

Out[364]:
0    <class 'list'>
1    <class 'list'>
2    <class 'list'>
3    <class 'list'>
Name: a, dtype: object

Note: This is just an alternative idea. If your strings have some blanks or some different chars, it becomes significantly difficult to handle. I still personally prefer using eval or ast.literal_eval.

Andy L.
  • 24,909
  • 4
  • 17
  • 29