Python/pandas beginner here.
I have a pandas Series (a column of a larger DataFrame) that looks like this:
0 ['0344010000122413']
1 ['0344010000132886']
2 ['0344010000021642']
3 ['0344010000010731', '0344010000010732', '0344...
4 ['0344010000025264']
Name: NUMPOINTS, Length: 271, dtype: object
Each NUMPOINT is 16 characters long. The number of NUMPOINTS per row varies from 0 to roughly 100.
As you can see, the dtype of the series is object. I want to convert each row of this series into a real list and the numbers into integers, but the object dtype and the literal [' and '] characters get in the way. The variable number of items per row also rules out certain functions.
I tried df['NUMPOINTS'] = df.NUMPOINTS.apply(lambda x: x[2:-2].split(',')), but that only works for rows with a single NUMPOINT.
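To show why the slice-and-split attempt breaks on longer rows, here is a minimal reproduction. It assumes each cell is a string such as "['0344010000010731', '0344010000010732']"; the sample values and the name slice_split are made up for illustration.

```python
# Hypothetical sample cells, assuming the column holds stringified lists.
single = "['0344010000122413']"
multi = "['0344010000010731', '0344010000010732']"


def slice_split(x):
    # Same expression as in the apply() call: drop "['" and "']", then split.
    return x[2:-2].split(',')


single_result = slice_split(single)
multi_result = slice_split(multi)

print(single_result)  # ['0344010000122413']
print(multi_result)   # ["0344010000010731'", " '0344010000010732"]
```

The slice only removes the outermost [' and '], so the quotes and the space around each inner comma stay attached to the items. The stray quote on the first item resembles the one in the ValueError further down, so the column may already have been altered by this attempt before other fixes were tried; that is a guess, not something I have confirmed.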
I also tried df['NUMPOINTS'].replace(regex=True, inplace=True, to_replace=r'\D', value=r''), but this 'sticks' the values together. For example, index 3 gives:
3 0344010000010731034401000001073203440100000107...
Then converting to integers gives an error.
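The sticking can be reproduced outside pandas with the same \D pattern. This is only a sketch on a made-up two-item row (the variable names are hypothetical): removing every non-digit also removes the separators, whereas matching runs of digits keeps the items apart.

```python
import re

# Hypothetical cell, assuming the row is a stringified list of two items.
row = "['0344010000010731', '0344010000010732']"

# Deleting all non-digits also deletes the commas between the items.
stuck = re.sub(r"\D", "", row)

# Matching each run of digits keeps the items separate.
separate = re.findall(r"\d+", row)

print(stuck)     # 03440100000107310344010000010732
print(separate)  # ['0344010000010731', '0344010000010732']
```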
I also tried the solutions from the question "pandas - convert string into list of strings", but they did not do the job either. Am I missing something here?
EDIT: Trying the updated answer from https://stackoverflow.com/users/10035985/andrej-kesely gives me:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-374-5f4f43cc7fc1> in <module>()
1 from ast import literal_eval
2 df["NUMPOINTS"] = df["NUMPOINTS"].apply(
----> 3 lambda x: [
4 int(value) for value in (literal_eval(x) if isinstance(x, str) else x)
5 ]
2 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-374-5f4f43cc7fc1> in <listcomp>(.0)
2 df["NUMPOINTS"] = df["NUMPOINTS"].apply(
3 lambda x: [
----> 4 int(value) for value in (literal_eval(x) if isinstance(x, str) else x)
5 ]
6 )
ValueError: invalid literal for int() with base 10: "0344010000010731'"
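The failing literal ends in a stray quote, which suggests the cells no longer hold clean strings by the time int() is called. One possible workaround, offered as a sketch under that assumption, is to ignore the quoting entirely and pull out the digit runs. The helper name to_int_list is hypothetical, and it assumes a cell is either a string or a list of strings.

```python
import re


def to_int_list(value):
    """Hypothetical helper: return every run of digits in a cell as an int.

    Assumes `value` is either a string or a list of strings, possibly
    carrying leftover quotes or spaces from earlier cleaning attempts.
    """
    if isinstance(value, str):
        text = value
    else:
        # Join list items with a space so neighbouring numbers stay separate.
        text = " ".join(str(item) for item in value)
    return [int(digits) for digits in re.findall(r"\d+", text)]


print(to_int_list("['0344010000122413']"))
# [344010000122413]
print(to_int_list(["0344010000010731'", " '0344010000010732"]))
# [344010000010731, 344010000010732]
print(to_int_list([]))
# []
```

It could then be applied with df["NUMPOINTS"].apply(to_int_list). Note that int() drops the leading zero, so if the 16-character form matters, keeping the matched strings instead of converting them may be the better choice.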