5

After retrieving some data, some rows of my dataframe contain string representations of lists, like this:

       bar                foo
   0     'A'                1
   1     'B'                2
   2     'B'     "['2' ,'3']"
   3     'B'              '4'
   4     'C'     "['5' ,'3']"

How can I convert the foo column to int values, taking the largest value where a row holds a list, using pandas?

Rodrigo Vargas

3 Answers

3

literal_eval

from ast import literal_eval

df.foo.map(literal_eval).explode().max(level=0)

0    1
1    2
2    3
3    4
4    5
Name: foo, dtype: object

However, if some elements are already non-string objects, literal_eval raises a ValueError for them, so fall back to the original value:

from ast import literal_eval

def l_eval(x):
    try:
        return literal_eval(x)
    except ValueError as e:
        return x

df.foo.map(l_eval).explode().max(level=0)
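
On pandas 2.0 and later, `max(level=0)` is no longer accepted (the `level` argument was deprecated in 1.3 and then removed); `groupby(level=0).max()` does the same job. The following is a minimal sketch, not a definitive version: the sample frame is an assumed reconstruction of the question's data, the extra `SyntaxError` catch is my addition (it makes unparseable strings pass through unchanged), and the `map(int)` step is added so the result is numeric rather than object.

```python
from ast import literal_eval

import pandas as pd

# Assumed reconstruction of the question's data: plain ints, a quoted
# number, and list-like strings.
df = pd.DataFrame({
    "bar": ["A", "B", "B", "B", "C"],
    "foo": [1, 2, "['2' ,'3']", "'4'", "['5' ,'3']"],
})


def l_eval(x):
    """Parse strings as Python literals; pass anything else through."""
    try:
        return literal_eval(x)
    except (ValueError, SyntaxError):
        return x


parsed = df["foo"].map(l_eval)            # 1, 2, ['2', '3'], '4', ['5', '3']
exploded = parsed.explode().map(int)      # one row per list element, as int
result = exploded.groupby(level=0).max()  # largest value per original row
print(result.tolist())
```

Because `explode` repeats the original index for every list element, grouping on index level 0 collapses each list back to a single row.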
piRSquared
  • 2
    Not quite - that gets them halfway, but they want the `max()` too! – ti7 Mar 22 '21 at 21:28
  • 1
    @ti7 thx, fixed – piRSquared Mar 22 '21 at 21:33
  • This method throws the following error in my terminal: Traceback (most recent call last): File "../lib/python3.9/ast.py", line 50, in parse return compile(source, filename, mode, flags, File "<unknown>", line 1 2; 1 ^ SyntaxError: invalid syntax. Could it be a problem with the ast.py file? – Rodrigo Vargas Mar 22 '21 at 22:18
  • 2
    No, I don't think so. Almost certainly one of the values you are passing isn't a valid Python literal. – piRSquared Mar 22 '21 at 22:22
3

You could use literal_eval to parse the foo column and then, where an element is a list, take its max:

import pandas as pd
from ast import literal_eval

...

df.foo = df.foo.map(lambda x: literal_eval(str(x)))
"""
  bar     foo
0   A       1
1   B       2
2   B  [2, 3]
3   B       4
4   C  [5, 3]
"""

df.foo = df.foo.map(lambda x: max(x) if isinstance(x, list) else x)
"""
  bar foo
0   A   1
1   B   2
2   B   3
3   B   4
4   C   5
"""

Or combine both steps in a single function:

def get_max(item):
    item = literal_eval(str(item))
    return max(item) if isinstance(item, list) else item

df.foo = df.foo.map(get_max)
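
One thing to watch: when the list elements are themselves strings, as in "['5' ,'3']", `max` compares them as text, so the result is still a string and '9' would rank above '10'. Here is a hedged sketch of a variant that converts to `int` before comparing. `max_as_int` is a hypothetical helper name, and the sample values (including the extra `['9', '10']` case) are assumptions for illustration.

```python
from ast import literal_eval


def max_as_int(item):
    """Return the largest value in `item` as an int.

    `item` may be a number, a numeric string, or a string holding a
    Python list literal such as "['5' ,'3']".
    """
    if isinstance(item, str):
        item = literal_eval(item)
    if isinstance(item, (list, tuple)):
        # Compare as integers, not as strings.
        return max(int(v) for v in item)
    return int(item)


values = [1, 2, "['2' ,'3']", "'4'", "['5' ,'3']", "['9', '10']"]
print([max_as_int(v) for v in values])  # [1, 2, 3, 4, 5, 10]
```

It can be applied the same way as `get_max`, for example `df.foo = df.foo.map(max_as_int)`.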
Renan Lopes
2

Warning: this is kind of manual.

import numpy as np

df['foo'] = df['foo'].str.strip("''")  # strip surrounding single quotes
# Extract digits sandwiched between special characters
df['foo'] = np.where(df['foo'].str.contains(r'\['), df['foo'].str.findall(r'(?<=\"\[\')\d(?=\'\,)|(?<=\')\d(?=\'\]\")'), df['foo'])
df['foo'] = df['foo'].explode().max(level=0)  # on pandas >= 2.0 use .groupby(level=0).max()



  bar foo
0  'A'   1
1  'B'   2
2  'B'   3
3  'B'   4
4  'C'   5
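
If the column only ever holds non-negative integers, a looser pattern may be enough. This is a sketch under that assumption, not the code above: it casts every value to `str`, pulls out each run of digits with `\d+`, and takes the integer maximum per row. The sample frame is again an assumed reconstruction of the question's data, and every row is assumed to contain at least one digit.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    "bar": ["A", "B", "B", "B", "C"],
    "foo": [1, 2, "['2' ,'3']", "'4'", "['5' ,'3']"],
})

# Every value becomes text, then each row yields a list of digit runs,
# e.g. "['5' ,'3']" -> ['5', '3'] and 1 -> ['1'].
digits = df["foo"].astype(str).str.findall(r"\d+")

# Compare as integers so multi-digit numbers are ordered correctly.
df["foo"] = digits.map(lambda found: max(int(d) for d in found))
print(df["foo"].tolist())
```

This would misread negative numbers and decimals, so it only suits data shaped like the example.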
wwnde
  • 1
    I had to dig a bit into the errors, but it finally worked. The final snippet worked this way: df['foo']=df['foo'].explode().max(level = 0)#Explode dataframe – Rodrigo Vargas Mar 22 '21 at 22:58
  • Thanks, silly mistake on my part. It can't be a tuple. Edited – wwnde Mar 22 '21 at 22:59