
I'd like to extract the 'overall' value from the dicts in the rating column and add it as a separate column. This is what I've tried so far:

import ast
import pandas as pd

def try_literal_eval(e):
    try:
        return ast.literal_eval(e)
    except ValueError:
        return {'overall': 0}

res = pd.DataFrame(df['rating'].apply(try_literal_eval).tolist())
output = pd.concat((df.drop(columns='rating'), res), axis=1)
output

df:

customer_id    rating 
44224         {'overall': 5, 'description': 3}
55243         {'overall': 3, 'description': 2}

Desired output_df:

customer_id    overall_rating
44224          5
55243          3
user12625679

2 Answers


df['overall_rating'] = df['rating'].apply(lambda x: x.get('overall')) should give you the result. A complete example:

import pandas as pd

c = ['customer_id', 'rating']
d = [[44224, {'overall': 5, 'description': 3}],
     [55243, {'overall': 3, 'description': 2}]]
df = pd.DataFrame(d, columns=c)
print(df)
# dict.get returns None (shown as NaN) instead of raising KeyError if 'overall' is missing
df['overall_rating'] = df['rating'].apply(lambda x: x.get('overall'))
print(df)

The output is:

Original DataFrame:

   customer_id                            rating
0        44224  {'overall': 5, 'description': 3}
1        55243  {'overall': 3, 'description': 2}

Updated DataFrame:

   customer_id                            rating  overall_rating
0        44224  {'overall': 5, 'description': 3}               5
1        55243  {'overall': 3, 'description': 2}               3

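Alternatively, you can build a DataFrame from the list of dicts (one column per key) and take its 'overall' column:

df['overall_rating'] = pd.DataFrame(df['rating'].tolist(), index=df.index)['overall']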

A complete example of this alternative, which produces the same output:

import pandas as pd

c = ['customer_id', 'rating']
d = [[44224, {'overall': 5, 'description': 3}],
     [55243, {'overall': 3, 'description': 2}]]
df = pd.DataFrame(d, columns=c)
print(df)
# Expand the dicts into their own DataFrame (one column per key) and keep 'overall';
# index=df.index keeps the new values aligned with df's rows
df['overall_rating'] = pd.DataFrame(df['rating'].tolist(), index=df.index)['overall']
print(df)

Original DataFrame:

   customer_id                            rating
0        44224  {'overall': 5, 'description': 3}
1        55243  {'overall': 3, 'description': 2}

Updated DataFrame:

   customer_id                            rating  overall_rating
0        44224  {'overall': 5, 'description': 3}               5
1        55243  {'overall': 3, 'description': 2}               3
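A caveat for the list-of-dicts approach: pd.DataFrame(list_of_dicts) gets a fresh 0..n-1 index unless you pass index=..., and assigning a Series as a column aligns on index labels, not on position. With the default index used in this example that is harmless, but if df has been filtered or re-indexed, the values end up on the wrong rows or become NaN. A small sketch to show the difference, assuming a made-up index of 10 and 20 and a hypothetical extra column name overall_misaligned:

```python
import pandas as pd

# Same sample data, but with a non-default index (as after filtering or
# set_index); the labels 10 and 20 are made up for illustration.
df = pd.DataFrame(
    {"customer_id": [44224, 55243],
     "rating": [{"overall": 5, "description": 3},
                {"overall": 3, "description": 2}]},
    index=[10, 20],
)

# Without index=: the new frame is labelled 0, 1. Those labels do not exist
# in df, so alignment leaves every row of the new column as NaN.
misaligned = pd.DataFrame(df["rating"].tolist())["overall"]
df["overall_misaligned"] = misaligned

# With index=df.index: labels match, so each value lands on its own row.
df["overall_rating"] = pd.DataFrame(df["rating"].tolist(), index=df.index)["overall"]

print(df[["customer_id", "overall_misaligned", "overall_rating"]])
```

The apply/get version does not have this problem, because Series.apply keeps the original index.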

An example with one dictionary holding a float value and another with no 'overall' entry. The missing entry shows up as NaN, and the new column is stored as floats:

import pandas as pd

c = ['customer_id', 'rating']
d = [[44224, {'overall': 5, 'description': 3}],
     [55243, {'overall': 3, 'description': 2}],
     [11223, {'overall': 1.5, 'description': 2}],
     [12345, {'description': 3}]]
df = pd.DataFrame(d, columns=c)
print(df)
# get() returns None for the row without 'overall'; pandas shows it as NaN
df['overall_rating'] = df['rating'].apply(lambda x: x.get('overall'))
print(df)

The output of this is:

Input DataFrame:

   customer_id                              rating
0        44224    {'overall': 5, 'description': 3}
1        55243    {'overall': 3, 'description': 2}
2        11223  {'overall': 1.5, 'description': 2}
3        12345                  {'description': 3}

Updated DataFrame:

   customer_id                              rating  overall_rating
0        44224    {'overall': 5, 'description': 3}             5.0
1        55243    {'overall': 3, 'description': 2}             3.0
2        11223  {'overall': 1.5, 'description': 2}             1.5
3        12345                  {'description': 3}             NaN
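The lambda above assumes every cell is a dict. If a rating is missing entirely and stored as None or NaN (for example after a merge), x.get raises AttributeError, because those values have no .get method. A minimal sketch that guards against that, using made-up sample rows (customer 99999 and its empty rating are hypothetical) and a hypothetical helper name overall_or_nan:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the last row has no rating at all (NaN).
df = pd.DataFrame({
    "customer_id": [44224, 12345, 99999],
    "rating": [{"overall": 5, "description": 3}, {"description": 3}, np.nan],
})

def overall_or_nan(value):
    """Return value['overall'] for dicts; NaN for a missing key or a non-dict cell."""
    if isinstance(value, dict):
        return value.get("overall", np.nan)
    return np.nan  # None, NaN, or anything else that is not a dict

df["overall_rating"] = df["rating"].apply(overall_or_nan)
print(df)
```

If you would rather get a fixed default, like the {'overall': 0} fallback in the question, return 0 instead of np.nan in both places.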
Joe Ferndz

Try pulling 'overall' out inside the helper itself, so each row maps straight to a number:

import ast

def try_literal_eval(e):
    try:
        return ast.literal_eval(e).get('overall', 0)
    except (ValueError, SyntaxError):
        # e could not be parsed as a Python literal
        return 0

output_df = df[['customer_id']].assign(overall_rating=df['rating'].apply(try_literal_eval))
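One caveat: ast.literal_eval expects a string (or an AST node). If the column holds actual dict objects rather than their string form (a DataFrame printout cannot tell the two apart), it raises ValueError for every row, so a literal_eval-based helper like the one above, or the attempt in the question, falls back to 0 everywhere. A sketch that accepts either form, assuming the column may mix real dicts and string representations; the helper name extract_overall and the malformed third sample row are hypothetical:

```python
import ast
import pandas as pd

def extract_overall(value, default=0):
    """Return the 'overall' entry from a dict or from a string repr of a dict."""
    if isinstance(value, str):
        try:
            value = ast.literal_eval(value)  # safely parse "{'overall': 3, ...}"
        except (ValueError, SyntaxError):
            return default  # string is not a valid Python literal
    if isinstance(value, dict):
        return value.get("overall", default)
    return default  # None, NaN, or a string that did not parse to a dict

# Hypothetical mixed column: a real dict, a string repr, and a malformed string.
df = pd.DataFrame({
    "customer_id": [44224, 55243, 11111],
    "rating": [{"overall": 5, "description": 3},
               "{'overall': 3, 'description': 2}",
               "not a dict"],
})

output_df = df[["customer_id"]].assign(overall_rating=df["rating"].apply(extract_overall))
print(output_df)
```

Checking isinstance(value, str) first means dicts skip literal_eval entirely, so the helper works whether the data came from a CSV (strings) or was built in Python (dicts).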
Quang Hoang