How to get a list from a list?

Question

I read a pandas DataFrame df from a .csv file. Each cell of the DataFrame contains a string like the following:

for i in df.index:
    for j in df.columns:
        df[i][j]
        # '[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'

I would like to get a list of the values as np.float. I tried:

>>> df[i][j].split()
['[0.109,',
 '0.1455,',
 '0.0,',
 '1.80e-48,',
 '42.070,',
 '-14.582]']

What about `l = [float(x.strip(' []')) for x in s.split(',')]`? – Karn Kumar, Oct 04 '18 at 13:19

score 4 · Accepted Answer · answered Oct 04 '18 at 13:16

You can use ast.literal_eval to parse the string as a list of floats:

>>> import ast
>>> ast.literal_eval('[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]')
[0.109, 0.1455, 0.0, 1.8e-48, 42.07, -14.582]
>>>
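Since the question asks for NumPy floats, one option (a sketch, not part of the original answer) is to hand the parsed list to np.array with an explicit dtype. np.float64 is used here because np.float no longer exists in recent NumPy releases.

```python
import ast

import numpy as np

s = '[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'

# literal_eval only accepts Python literals, so the list is parsed safely
values = ast.literal_eval(s)

# Convert the list of Python floats into a NumPy float64 array
arr = np.array(values, dtype=np.float64)
print(arr.dtype)  # float64
print(arr.tolist())
```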

answered Oct 04 '18 at 13:16

blhsing

Guimoute · Answer 2 · 2018-10-04T13:26:52.560

Without external modules, this is easy to do with a list comprehension:

A = df[i][j]                     # '[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'
B = A.strip("[]").split(",")     # ['0.109', ' 0.1455', ' 0.0', ' 1.80e-48', ' 42.070', ' -14.582']
C = [float(x) for x in B]        # [0.109, 0.1455, 0.0, 1.8e-48, 42.07, -14.582]

So the one-liner would be:

My_list_of_floats = [float(x) for x in df[i][j].strip("[]").split(",")]
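One caveat with this approach: an empty list string such as '[]' becomes [''] after the strip and split, and float('') raises ValueError. A minimal sketch of a helper that skips empty pieces (the name parse_float_list is hypothetical, not from the answer):

```python
def parse_float_list(s):
    """Parse a string like '[0.1, 2.5e-3]' into a list of floats."""
    # Drop surrounding whitespace and brackets, then split on commas
    parts = s.strip().strip("[]").split(",")
    # float() tolerates surrounding whitespace; skip pieces that are empty
    return [float(x) for x in parts if x.strip()]


print(parse_float_list('[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'))
print(parse_float_list('[]'))  # []
```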

edited Oct 04 '18 at 13:26

answered Oct 04 '18 at 13:23

Guimoute

This has the added benefit of being easy to modify if you run into slightly different formats, such as a list of numbers enclosed in curly brackets, or other small changes. It also can never be used to run malicious code. – Aaron Oct 04 '18 at 13:26
True, it's also easy to add support for other languages, for example if the values use a comma instead of a dot as decimal separator (add in a `.replace(",", ".")`) and semi-colons instead of commas to delimit values. – Guimoute Oct 04 '18 at 13:36
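A sketch of that variant, assuming values formatted like '[0,109; 0,1455; -14,582]' (decimal comma, semicolon delimiter). The order matters: split on ';' first, then replace the decimal comma inside each piece, otherwise the replacement would destroy the information about where values end.

```python
s = '[0,109; 0,1455; 1,80e-48; -14,582]'

# Split on the semicolon delimiter first, so the decimal commas are untouched
pieces = s.strip("[]").split(";")

# Then turn each decimal comma into a dot before converting
values = [float(x.replace(",", ".")) for x in pieces]
print(values)  # [0.109, 0.1455, 1.8e-48, -14.582]
```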

Ravi Patel · Answer 3 · 2018-10-04T13:17:44.427

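You can use Python's built-in eval() function to convert the string into a Python list, then cast each item to a float:

list(map(float, eval(df[i][j])))

This turns the string into a Python list first, then casts each item to float; in Python 3, map returns a lazy iterator, hence the list() call.

Since np.float was only an alias for the built-in float (deprecated in NumPy 1.20 and removed in 1.24), the parsed values are already floats, so you can skip the cast and just do

eval(df[i][j])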

edited Oct 04 '18 at 13:17

answered Oct 04 '18 at 13:16

Ravi Patel

You might want to include the [usual provisos](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice). – Peter Wood Oct 04 '18 at 13:17
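To illustrate the proviso: eval() executes whatever expression it is given, while ast.literal_eval only accepts literals and raises ValueError otherwise. The "untrusted" string below is a harmless stand-in for malicious input.

```python
import ast

untrusted = "__import__('os').getcwd()"

# eval(untrusted) would import os and call a function; not run here on purpose

# literal_eval refuses anything that is not a plain literal
rejected = False
try:
    ast.literal_eval(untrusted)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

# Plain literals still parse normally
parsed = ast.literal_eval('[0.109, -14.582]')
print(parsed)  # [0.109, -14.582]
```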

jpp · Answer 4 · 2018-10-04T23:41:13.047

You can use ast.literal_eval, and I recommend you avoid chained indexing such as df[i][j]. Instead, use pd.DataFrame.at for fast scalar access. Note also that to iterate over columns you don't need pd.DataFrame.columns, since iterating over a DataFrame yields its column labels:

from ast import literal_eval

for i in df.index:
    for j in df:
        print(literal_eval(df.at[i, j]))

If you need to apply this to an entire series, you can use pd.Series.map or a list comprehension:

# Option 1: pd.Series.map
df['col1'] = df['col1'].map(literal_eval)
# Option 2: a list comprehension (use one option or the other, not both)
df['col1'] = [literal_eval(i) for i in df['col1']]
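As an alternative not covered in this answer, the conversion can also happen while reading the file, through the converters argument of pd.read_csv. A minimal sketch using an in-memory CSV; the column name col1 follows the answer, and the data is made up for illustration.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content: each cell holds a quoted list literal
csv_text = '''col1
"[0.109, 0.1455, 0.0]"
"[1.80e-48, 42.070, -14.582]"
'''

# converters applies literal_eval to every cell of col1 as the file is parsed
df = pd.read_csv(io.StringIO(csv_text), converters={'col1': literal_eval})

print(type(df.at[0, 'col1']))  # <class 'list'>
print(df.at[1, 'col1'])        # [1.8e-48, 42.07, -14.582]
```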

If each list has the same number of items, I strongly suggest splitting them into separate columns to permit vectorised operations:

df = df.join(pd.DataFrame(df.pop('col1').map(literal_eval).values.tolist()))
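One caveat with that join: the DataFrame built from the lists gets a default 0..n-1 index, so if df has any other index the rows will not line up and the new columns fill with NaN. A sketch that passes the original index explicitly (the sample data is hypothetical):

```python
from ast import literal_eval

import pandas as pd

# Sample frame with a non-default index and list strings in col1
df = pd.DataFrame(
    {'id': ['a', 'b'],
     'col1': ['[0.109, 0.1455]', '[42.070, -14.582]']},
    index=[10, 20],
)

# Parse the strings, then expand each list into its own column
parsed = df.pop('col1').map(literal_eval)
expanded = pd.DataFrame(parsed.tolist(), index=parsed.index)

# Reusing parsed.index keeps each row's values attached to the right row
df = df.join(expanded)
print(df)
```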

Pandas is not designed to hold lists in a series, and in big-data workflows you will likely face efficiency and memory issues with such a data structure.
