-1

In my dataframe I have a column named 'items' whose values are list-like strings, so the column is of type string. I want to convert these values into lists. I have tried various methods such as ast.literal_eval and eval.

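import ast
import json
import numpy as np
import pandas as pd  # needed for pd.notnull below

null = None  # lets eval() resolve JSON's null; ast.literal_eval ignores this name
# Parse non-null cells; leave NaN/None cells unchanged
d['items'] = d['items'].apply(lambda x: ast.literal_eval(str(x)) if np.all(pd.notnull(x)) else x)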

Using ast.literal_eval, I get this error: ValueError: malformed node or string: <_ast.Name object at 0x0000019AE0EA4B80>

Using eval(), I get this error: File "", line unknown SyntaxError: unexpected EOF while parsing

I have data with 139,358 rows. The data may contain null, None, or other elements that could have caused this EOF error. I assumed I had handled the NaNs properly with the condition 'if(np.all(pd.notnull(x)))'.

Please suggest any exception handling that could help, or any solution that converts these list-like strings to lists.

In the end, the records in 'items' should be lists rather than strings (string-to-list conversion). Thanks in advance!

Answer: My data had invalid JSON objects, so I wrote an if statement to filter out the invalid JSON.
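The exact if statement is not shown, so here is only a minimal sketch of one way such a filter could look. The helper name parse_or_none and the toy dataframe are hypothetical. It treats non-string cells (NaN, None) as invalid too, which may or may not match what was done on the real data.

```python
import json
import pandas as pd

def parse_or_none(value):
    """Return the parsed JSON value, or None if value is not a valid JSON string."""
    if not isinstance(value, str):
        return None  # NaN / None / other non-string cells count as invalid here
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None

# Toy data standing in for the real 'items' column (hypothetical values)
d = pd.DataFrame({"items": ['[{"a": 1}]', '[{"a": ', None, '[1, 2, null]']})

parsed = d["items"].apply(parse_or_none)
valid = parsed.notna()            # True where parsing succeeded
d_valid = d.loc[valid].copy()     # keep only rows with valid JSON
d_valid["items"] = parsed[valid]  # cells are now Python lists, not strings
```

One caveat of this sketch: a cell containing just the JSON text null also parses to None, so it is dropped along with the invalid rows.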

pixel

3 Answers

1

Use the json.loads() function from the standard library:

import json
import numpy as np
import pandas as pd  # needed for pd.notnull below
d['items'] = d['items'].apply(lambda x: json.loads(x) if np.all(pd.notnull(x)) else x)
  • Yeah, that gave me this error: JSONDecodeError: Expecting value: line 1 column 1 (char 0) – pixel Jul 17 '21 at 17:11
  • Sorry, I didn't read the full question. Let me take a look... –  Jul 17 '21 at 17:14
  • I can't reproduce the error. I'm creating the dataframe like this: `d=pd.DataFrame({'items':['...the string you posted exactly...']}, index=[0])` –  Jul 17 '21 at 17:18
  • Yeah, true. I think something underlying in the data is interfering with json.loads. Any idea how to trace it? – pixel Jul 17 '21 at 17:22
  • If you could send your whole Python file, that'd be great. –  Jul 17 '21 at 17:24
  • JSONDecodeError: Unterminated string starting at: line 651 column 27 (char 22037) I get this error for another dataset. This dataset has around 239,789 records. – pixel Jul 18 '21 at 12:57
  • Holy cow, that's huge. It's probably some control character or binary character or something like that, maybe `\u200b` or `\u200c`? Those seem to be common sometimes. Probably viewing that file with `less` could show you what the problem is. At any rate, it will always fail, no matter what method you use to parse the json, if the json is invalid. –  Jul 18 '21 at 13:22
  • How can I trace which row (index number) caused the error? Also, can you please explain what viewing the file with less means? – pixel Jul 18 '21 at 13:24
  • @coder See [this fantastic answer](https://stackoverflow.com/a/159537). –  Jul 18 '21 at 13:32
  • You are right @hcaertnit, my data had invalid JSON. I corrected the code by filtering out the invalid JSON objects and keeping only the valid ones. – pixel Jul 19 '21 at 06:03
  • Great! Glad to hear it's working. Upvoted your answer as it's helpful too ;) –  Jul 19 '21 at 12:14
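On the question of which rows cause the error: a rough sketch, assuming the column holds strings plus some NaN/None cells, is to loop over the column and record the index label and error message of every cell that json.loads rejects. The function name find_invalid_json and the sample values are made up for illustration.

```python
import json
import pandas as pd

def find_invalid_json(series):
    """Return (index_label, error_message) pairs for string cells that fail json.loads."""
    problems = []
    for label, value in series.items():
        if not isinstance(value, str):
            continue  # skip NaN / None cells
        try:
            json.loads(value)
        except json.JSONDecodeError as exc:
            problems.append((label, str(exc)))
    return problems

# Hypothetical sample: one valid cell, one truncated string, one null, one empty string
d = pd.DataFrame({"items": ['[1, 2]', '["abc', None, '']})

bad = find_invalid_json(d["items"])
for label, message in bad:
    print(label, message)
```

The labels in bad can then be passed to d.loc[...] to inspect the offending cells directly. An empty string is one input that produces the "Expecting value: line 1 column 1 (char 0)" message, so it may be worth checking for those first.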
1

Please try the code below. Not sure if this will be helpful. It is simple, and I tried it out with the input. Check and let me know.

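`# Assign through apply; df['items'][i] = x used the string itself as the index label
df['items'] = df['items'].apply(
   lambda v: eval(v) if isinstance(v, str) else v  # eval runs arbitrary code; use only on trusted data
)`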
dinesh kumar
  • I get the same EOF error when I run your script. This is the error I got: :8: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy d['items'][i]=x Traceback (most recent call last): exec(code_obj, self.user_global_ns, self.user_ns) File "", line unknown SyntaxError: unexpected EOF while parsing – pixel Jul 18 '21 at 10:46
  • Since you are running this in a Jupyter notebook, try running everything again from a fresh start. – dinesh kumar Jul 18 '21 at 11:26
1

Check whether the data is valid JSON.
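As a small illustration of why the format matters (the sample string is invented, not taken from the real data): JSON tokens such as null and true are not Python literals, so ast.literal_eval rejects a string that json.loads parses without trouble. That is one likely source of the "malformed node or string" ValueError.

```python
import ast
import json

# Hypothetical cell value containing JSON-only tokens
text = '[{"id": 1, "tag": null, "ok": true}]'

# json.loads maps null/true/false to None/True/False
items = json.loads(text)

# ast.literal_eval accepts only Python literals, so the bare name null is rejected
try:
    ast.literal_eval(text)
    literal_eval_error = None
except ValueError as exc:
    literal_eval_error = exc

print(items)
print(type(literal_eval_error).__name__)
```

So if the strings are JSON, json.loads is the better fit. If they are Python reprs (single quotes, None, True), ast.literal_eval is the one to use.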

pixel