1

After running an SQL query I get a data frame with one column that looks like this:

d = {'values': ['2,3,2,2', '2,3,5', '2', '3,4,2,2,2']}
df = pd.DataFrame(data=d)

What I need to do is turn each of this strings into a list of numbers:

d = {'values': [[2, 3, 2, 2], [2, 3, 5], [2], [3, 4, 2, 2, 2]]}

Is there a pythonic way to do that?

4 Answers4

3

Here's another way using pd.eval and explode

Hat tip to Jezrael (for the eval thread) and cs95 (for the pd.Eval tip) in the comments, read :

Why using eval is bad practice

df['values_agg'] = df['values'].map(pd.eval)

print(df)

      values       values_agg
0    2,3,2,2     [2, 3, 2, 2]
1      2,3,5        [2, 3, 5]
2          2              [2]
3  3,4,2,2,2  [3, 4, 2, 2, 2]

or avoiding eval all together in a more verbose way.

df['values'].str.split(',').explode().astype(int).groupby(level=0).agg(list)
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) ? – jezrael Jul 02 '20 at 08:33
  • Here is better use `astype(int)` after `explode` – jezrael Jul 02 '20 at 08:33
  • 1
    Change `eval` to `pd.eval` and you are good — you won't even need the typecast – cs95 Jul 02 '20 at 08:37
  • @jezrael wow, that is very informative but I think it's more in-tune for applications I could change it to `str.split` then `explode.astype(int)` I guess ? – Umar.H Jul 02 '20 at 08:38
  • 1
    yop, exactly need this way. – jezrael Jul 02 '20 at 08:39
  • 1
    no I mean `df['values'].map(pd.eval)`, you don't need the other stuff. The 3rd row will remain a scalar instead of a list, big deal. – cs95 Jul 02 '20 at 08:41
  • @cs95 gracias mes ami, wow that's amazing didn't know that. thanks for the tip! – Umar.H Jul 02 '20 at 08:41
2

Using pandas.Series.str.findall:

df["values"] = df["values"].str.findall("\d+")
d2 = df.to_dict("list")
print(d2)

Output:

{'values': [['2', '3', '2', '2'], 
            ['2', '3', '5'], 
            ['2'], 
            ['3', '4', '2', '2', '2']]}
Chris
  • 29,127
  • 3
  • 28
  • 51
2

Because need numbers after split comvert values to integers:

df['values'] = df['values'].apply(lambda x: [int(y) for y in x.split(',')])
print (df)
            values
0     [2, 3, 2, 2]
1        [2, 3, 5]
2              [2]
3  [3, 4, 2, 2, 2]


print (df.to_dict("l"))
{'values': [[2, 3, 2, 2], [2, 3, 5], [2], [3, 4, 2, 2, 2]]}

For strings repr of numbers solution is simplier, only use Series.str.split:

df['values'] = df['values'].str.split(',')
print (df.to_dict("l"))
{'values': [['2', '3', '2', '2'], ['2', '3', '5'], ['2'], ['3', '4', '2', '2', '2']]}
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You could do with a list comprehension:

d = {'values': [list(map(int, x.split(','))) for x in df['values']]}

[out]

{'values': [[2, 3, 2, 2], [2, 3, 5], [2], [3, 4, 2, 2, 2]]}
Chris Adams
  • 18,389
  • 4
  • 22
  • 39