2

I read a pandas dataframe df from a .csv file. Each cell of the dataframe contains a string like the following:

for i in df.index:
    for j in df.columns:
        df[i][j]
        # '[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'

I would like to have a list with the values as np.float. I tried splitting the string, but the brackets and commas stay attached to the pieces:

 df[i][j].split()
['[0.109,',
 '0.1455,',
 '0.0,',
 '1.80e-48,',
 '42.070,',
 '-14.582]']
jpp
emax

4 Answers

4

You can use ast.literal_eval to parse the string as a list of floats:

>>> import ast
>>> ast.literal_eval('[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]')
[0.109, 0.1455, 0.0, 1.8e-48, 42.07, -14.582]
>>>
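As a minimal sketch of how this could be wrapped for reuse, the helper below parses one cell and coerces every item to float. The name `parse_float_list` is hypothetical, not part of the answer. The second half illustrates that `ast.literal_eval` only accepts Python literals, so a string containing a function call is rejected rather than executed.

```python
import ast


def parse_float_list(cell):
    """Parse a string such as '[0.1, 2.0]' into a list of floats."""
    values = ast.literal_eval(cell)
    # Coerce explicitly so integer literals such as '[1, 2]' also become floats.
    return [float(v) for v in values]


cell = '[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'
floats = parse_float_list(cell)

# literal_eval refuses anything that is not a literal, e.g. a function call.
try:
    ast.literal_eval('__import__("os").getcwd()')
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
```

Here `floats` holds six built-in `float` objects and `rejected` ends up `True`.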
blhsing
2

Without external modules, it's pretty easy to do with a list comprehension:

A = df[i][j]                      # '[0.109, 0.1455, 0.0, 1.80e-48, 42.070, -14.582]'
B = A.strip("[]").split(",")      # ['0.109', ' 0.1455', ' 0.0', ' 1.80e-48', ' 42.070', ' -14.582']
C = [float(x) for x in B]         # [0.109, 0.1455, 0.0, 1.8e-48, 42.07, -14.582]

So the one-liner would be:

My_list_of_floats = [float(x) for x in df[i][j].strip("[]").split(",")]
Guimoute
  • This has the added benefit of being easily modified if you run into slightly different formats, such as a list of numbers enclosed in curly brackets or other small changes. It can also never be used to run malicious code. – Aaron Oct 04 '18 at 13:26
  • True, it's also easy to add support for other locales, for example if the values use a comma instead of a dot as the decimal separator (add in a `.replace(",", ".")`) and semicolons instead of commas to delimit values. – Guimoute Oct 04 '18 at 13:36
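The variations mentioned in these comments could be handled along the following lines. This is only a sketch: the function `parse_number_list` and its parameters `brackets`, `delimiter` and `decimal` are hypothetical names chosen for illustration.

```python
def parse_number_list(text, brackets="[]", delimiter=",", decimal="."):
    """Split a bracketed, delimited string into a list of floats."""
    # Remove surrounding whitespace, then the enclosing bracket characters.
    inner = text.strip().strip(brackets)
    # Split on the value delimiter first, so a decimal comma is not confused with it.
    parts = [p.strip() for p in inner.split(delimiter)]
    if decimal != ".":
        parts = [p.replace(decimal, ".") for p in parts]
    # Skip empty pieces so that an empty list such as '[]' yields [].
    return [float(p) for p in parts if p]


default_style = parse_number_list("[0.109, 0.1455, -14.582]")
curly_style = parse_number_list("{1.5, 2.5}", brackets="{}")
european_style = parse_number_list("[0,109; 42,070]", delimiter=";", decimal=",")
```

Splitting before replacing the decimal separator matters when the decimal mark is a comma: doing it the other way round would only be safe if the delimiter is not a comma as well.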
0

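You can use Python's eval() function to convert the string into a Python object, then convert each item to a float:

list(map(float, eval(df[i][j])))

This turns the string into a Python list first, then casts each item to float. In Python 3, map returns an iterator, hence the list() call.

np.float was only an alias for the built-in float (it was deprecated in NumPy 1.20 and removed in 1.24), so no NumPy cast is needed. If every item in the string is already written as a float, you can skip the cast entirely and just do

eval(df[i][j])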

Ravi Patel
  • You might want to include the [usual provisos](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice). – Peter Wood Oct 04 '18 at 13:17
0

You can use ast.literal_eval, and I recommend you avoid chained indexing. Instead, use pd.DataFrame.at for fast scalar access. Note also that to iterate over columns you don't need to access pd.DataFrame.columns, since iterating over a dataframe yields its column labels:

from ast import literal_eval

for i in df.index:
    for j in df:
        print(literal_eval(df.at[i, j]))

If you need to apply this to an entire series, you can use pd.Series.map or a list comprehension:

df['col1'] = df['col1'].map(literal_eval)            # option 1: pd.Series.map
df['col1'] = [literal_eval(i) for i in df['col1']]   # option 2: use instead of option 1, not after it

If each list has the same number of items, I strongly suggest you split them into separate columns to permit vectorised functionality:

df = df.join(pd.DataFrame(df.pop('col1').map(literal_eval).values.tolist()))

Pandas is not designed to hold lists in series, and for big data workflows you will likely face efficiency and memory issues with such a data structure.
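To make the column-splitting suggestion concrete, here is a small sketch on made-up data. The dataframe contents and the `v` column prefix are assumptions for illustration only; the real data would come from the CSV file.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical data: one column of stringified lists, as it might look after read_csv.
df = pd.DataFrame({
    "name": ["a", "b"],
    "col1": ["[0.109, 0.1455, 0.0]", "[1.0, 2.0, 3.0]"],
})

# Parse each string into a list, then expand the lists into one column per item.
parsed = df.pop("col1").map(literal_eval)
expanded = pd.DataFrame(parsed.tolist(), index=df.index)
df = df.join(expanded.add_prefix("v"))

# The new columns are ordinary float64 columns, so vectorised operations apply.
col_means = df[["v0", "v1", "v2"]].mean()
```

Passing `index=df.index` when building `expanded` keeps the join aligned even if the dataframe does not use the default integer index.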

jpp