Python/pandas beginner here.

I have a pandas Series (a column of a larger DataFrame) that looks like this:

0                                   ['0344010000122413']
1                                   ['0344010000132886']
2                                   ['0344010000021642']
3      ['0344010000010731', '0344010000010732', '0344...
4                                   ['0344010000025264']
Name: NUMPOINTS, Length: 271, dtype: object

Each NUMPOINT is 16 characters long. The number of NUMPOINTS per row varies from 0 to roughly 100.

As you can see, the dtype of the series is object. I want to convert each row into a real list and convert the numbers in it to integers, but I cannot get this to work because of the dtype and the [' and '] characters. Because the number of values per row varies, certain functions cannot be used.

I tried df['NUMPOINTS'] = df.NUMPOINTS.apply(lambda x: x[2:-2].split(',')), but that only works for rows with a single NUMPOINT.
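A minimal sketch of why the slice-and-split approach breaks on rows with more than one value. The string `raw` is a hypothetical sample modelled on row 3 above, not data from the real frame.

```python
# Hypothetical sample modelled on a multi-value row; not taken from the real data.
raw = "['0344010000010731', '0344010000010732']"

# The slice removes only the outermost [' and '], so the quotes and the space
# around the inner comma survive the split.
parts = raw[2:-2].split(',')
print(parts)
```

Each piece still carries a quote character, and the second one also has a leading space, so passing these pieces to int() raises a ValueError.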

I also tried df['NUMPOINTS'].replace(regex=True,inplace=True,to_replace=r'\D',value=r''), but this 'sticks' the values together. For example, index 3 gives:

3      0344010000010731034401000001073203440100000107...

Then converting to integers gives an error.
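Replacing every non-digit also deletes the separators between values, which is why they fuse. One possible alternative, sketched here with the standard-library `re` module, is to extract each run of digits instead of deleting everything else. The string `raw` is again a hypothetical sample, and this assumes the cells really are strings.

```python
import re

# Hypothetical sample modelled on a multi-value row.
raw = "['0344010000010731', '0344010000010732']"

# Deleting every non-digit removes the separators too, so the values fuse.
fused = re.sub(r"\D", "", raw)

# Extracting runs of digits keeps the values apart.
numbers = [int(digits) for digits in re.findall(r"\d+", raw)]

print(fused)
print(numbers)
```

Note that int() drops the leading zero, so each 16-character value becomes a 15-digit integer.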

I also tried the solutions from the question "pandas - convert string into list of strings", but they did not do the job either. Am I missing something here?

EDIT: Trying the updated answer from Andrej Kesely (https://stackoverflow.com/users/10035985/andrej-kesely) gives me:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-374-5f4f43cc7fc1> in <module>()
      1 from ast import literal_eval
      2 df["NUMPOINTS"] = df["NUMPOINTS"].apply(
----> 3     lambda x: [
      4         int(value) for value in (literal_eval(x) if isinstance(x, str) else x)
      5     ]

2 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-374-5f4f43cc7fc1> in <listcomp>(.0)
      2 df["NUMPOINTS"] = df["NUMPOINTS"].apply(
      3     lambda x: [
----> 4         int(value) for value in (literal_eval(x) if isinstance(x, str) else x)
      5     ]
      6 )

ValueError: invalid literal for int() with base 10: "0344010000010731'"
QB-science

2 Answers

You can apply ast.literal_eval and then call int() inside a list comprehension:

from ast import literal_eval

df["NUMPOINTS"] = df["NUMPOINTS"].apply(
    lambda x: [int(value) for value in literal_eval(x)]
)
print(df)

Prints:

                            NUMPOINTS
0                   [344010000122413]
1                   [344010000132886]
2                   [344010000021642]
3  [344010000010731, 344010000010732]
4                   [344010000025264]

EDIT:

If your column contains a mix of strings and lists:

df["NUMPOINTS"] = df["NUMPOINTS"].apply(
    lambda x: [
        int(value.strip("'")) for value in (literal_eval(x) if isinstance(x, str) else x)
    ]
)
print(df)
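The same idea can be written as a named helper, which makes it easier to test on a few cells before touching the whole column. This is only a sketch: the name `to_int_list` and the sample cells are hypothetical, and it assumes every cell is either a list or a string that looks like a Python list.

```python
from ast import literal_eval

def to_int_list(cell):
    """Return a list of ints from a list, or from a string that looks like a list."""
    values = literal_eval(cell) if isinstance(cell, str) else cell
    # Strip whitespace and stray quotes left over from earlier string slicing.
    return [int(str(value).strip(" '\"")) for value in values]

# Hypothetical cells: a list-like string, a list with damaged items, an empty row.
cells = [
    "['0344010000122413']",
    ["0344010000010731'", " '0344010000010732"],
    "[]",
]
converted = [to_int_list(cell) for cell in cells]
print(converted)
```

On the real frame this would be applied as df["NUMPOINTS"] = df["NUMPOINTS"].apply(to_int_list). Missing values such as NaN are not handled here and would need their own check.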
Andrej Kesely
You can also do this with map() and pd.eval():

import pandas as pd

df['NUMPOINTS'] = df['NUMPOINTS'].map(
    # int() already ignores leading zeros, so only the stray quote is stripped
    lambda x: [int(y.rstrip("'")) for y in (pd.eval(x) if isinstance(x, str) else x)]
)

If you now print df, you will get:

                            NUMPOINTS
0                   [344010000122413]
1                   [344010000132886]
2                   [344010000021642]
3  [344010000010731, 344010000010732]
4                   [344010000025264]
Anurag Dabas
  • This solution gives me this Syntax Error: ```File "", line 1 [0 344010000122413 ] ^ SyntaxError: invalid syntax ``` – QB-science May 27 '21 at 13:32
  • Thanks for your help, but the updated answer gives me ``` File "", line 1 [0 344010000122413 ] ^ SyntaxError: invalid syntax ``` – QB-science May 27 '21 at 13:38