I have a column whose values are stored as dictionaries, and I used the code below to split them into two separate columns. However, I am struggling with the rows that have null values (see the error message below):

df

product_id    product_ratings
2323          {"average_rating": 4.2, "number_of_ratings": 10}      
4433          {"average_rating": 4.3, "number_of_ratings": 31}
3454          {"average_rating": 4.5, "number_of_ratings": 23}
4552          {"average_rating": 4.1, "number_of_ratings": 13}
3422          None

desired_output_df

product_id   average_rating  number_of_ratings
2323         4.2             10
4433         4.3             31
3454         4.5             23
4552         4.1             13
3422         0               0 

My code:

import ast
import pandas as pd 
df = pd.read_csv('path')
df = df.fillna(0)

dict_df = pd.DataFrame([ast.literal_eval(i) for i in df.product_ratings.values])
df2 = df.drop('product_ratings',axis=1)
final_df = pd.concat([df2,dict_df],axis=1)
final_df

However, I am getting the following error: ValueError: malformed node or string: 0

user12625679

2 Answers

Try to convert each value to a dictionary; if that fails, return a dictionary with default values:

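import ast
import pandas as pd

def try_literal_eval(e):
    # ast.literal_eval raises ValueError for non-string input such as NaN or 0
    try:
        return ast.literal_eval(e)
    except ValueError:
        return {'average_rating': 0, 'number_of_ratings': 0}


# df is the DataFrame loaded in the question
res = pd.DataFrame(df['product_ratings'].apply(try_literal_eval).tolist())
# pass axis by keyword; the positional form drop(label, 1) was removed in pandas 2.0
output = pd.concat((df.drop('product_ratings', axis=1), res), axis=1)
print(output)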

Output

   product_id  average_rating  number_of_ratings
0        2323             4.2                 10
1        4433             4.3                 31
2        3454             4.5                 23
3        4552             4.1                 13
4        3422             0.0                  0
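
For anyone who wants to run this without the CSV, here is a minimal self-contained sketch of the same idea. The inline DataFrame, the `DEFAULT` constant and the `parse_rating` helper are illustrative assumptions and not part of the original data or answer. Instead of catching the exception, the helper checks the type up front, so it also covers the case where the cell holds the literal string 'None'.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: dictionaries stored as strings, one missing value.
df = pd.DataFrame({
    "product_id": [2323, 4433, 3422],
    "product_ratings": [
        '{"average_rating": 4.2, "number_of_ratings": 10}',
        '{"average_rating": 4.3, "number_of_ratings": 31}',
        np.nan,
    ],
})

# Assumed defaults for rows without ratings.
DEFAULT = {"average_rating": 0, "number_of_ratings": 0}


def parse_rating(value):
    # Only strings are parsed; NaN (a float) falls through to the default.
    if isinstance(value, str):
        parsed = ast.literal_eval(value)
        # The string 'None' parses to None, which is not a dict.
        if isinstance(parsed, dict):
            return parsed
    return dict(DEFAULT)


res = pd.DataFrame(df["product_ratings"].apply(parse_rating).tolist())
output = pd.concat([df.drop(columns="product_ratings"), res], axis=1)
print(output)
```

Because every row yields a complete dictionary, no NaN is introduced and number_of_ratings keeps an integer dtype.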
Dani Mesejo

Let's try:

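import ast
import pandas as pd

df = pd.read_csv('path')

# Parse only real strings; a missing value (NaN) or the literal 'None' becomes an
# empty dict, which produces a NaN row that fillna(0) then replaces with zeros.
dict_df = pd.DataFrame([ast.literal_eval(i) if isinstance(i, str) and i != 'None' else
                        dict()
                        for i in df.product_ratings.values]
                      ).fillna(0)

final_df = df.drop('product_ratings', axis=1).join(dict_df)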

Output:

   product_id  average_rating  number_of_ratings
0        2323             4.2               10.0
1        4433             4.3               31.0
2        3454             4.5               23.0
3        4552             4.1               13.0
4        3422             0.0                0.0
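
The number_of_ratings column shows up as float (10.0, 31.0, ...) because the row without ratings is NaN before `fillna(0)` runs, and NaN forces a float dtype. If integers are wanted, one option is to cast the column back after filling. The sketch below uses a small hypothetical `dict_df` built inline rather than the real file.

```python
import pandas as pd

# Hypothetical frame mimicking the parsed dictionaries; the empty dict
# stands for the row without ratings and becomes a row of NaN.
dict_df = pd.DataFrame([
    {"average_rating": 4.2, "number_of_ratings": 10},
    {"average_rating": 4.3, "number_of_ratings": 31},
    {},
])

# NaN makes number_of_ratings float64; fill first, then cast back to int.
dict_df = dict_df.fillna(0).astype({"number_of_ratings": "int64"})
print(dict_df.dtypes)
```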
Quang Hoang