0

I would like to convert a list to a pandas DataFrame. The list appears to be a list of dictionaries, with other lists nested inside it.

Here is a sample of my data:

['b"{',
 'n  boxers: [',
 'n    {',
 'n      age: 30,',
 'n      hasBoutScheduled: true,',
 'n      id: 489762,',
 'n      last6: [Array],',
 "n      name: 'Andy Ruiz Jr',",
 'n      points: 754,',
 'n      rating: 100,',
 'n      record: [Object],',
 'n      residence: [Object],',
 "n      stance: 'orthodox'",
 'n    },',
 'n    {',
 'n      age: 34,',
 'n      hasBoutScheduled: true,',
 'n      id: 468841,',
 'n      last6: [Array],',
 "n      name: 'Deontay Wilder',",
 'n      points: 622,',
 'n      rating: 100,',
 'n      record: [Object],',
 'n      residence: [Object],',
 "n      stance: 'orthodox'",
 'n    },',
 'n    {',
 'n      age: 30,',
 'n      hasBoutScheduled: true,',
 'n      id: 659461,',
 'n      last6: [Array],',
 "n      name: 'Anthony Joshua',",
 'n      points: 603,',
 'n      rating: 100,',
 'n      record: [Object],',
 'n      residence: [Object],',
 "n      stance: 'orthodox'",
 'n    },'

This is what I have tried thus far:

pd.DataFrame.from_records(unclean_file)

This produces about 27 columns - presumably a column for every space break, comma etc.

I have also tried using ChainMap:

from collections import ChainMap

pd.DataFrame.from_dict(ChainMap(*unclean_file),orient='index',columns=['age','hasBoutScheduled','id','last6','name','points','rating','record','residence','stance'])

This produces the error message: ValueError: dictionary update sequence element #0 has length 1; 2 is required

Note: to clarify how I extracted the data, I am using the Naked package to run a Node.js file that returns JSON output. I save the result to the variable success; its stdout is a bytes string, which I then convert to a list:

from Naked.toolshed.shell import muterun_js

success = muterun_js('index.js')
unclean_file = [str(success.stdout).split('\\')]
Zephyrus
Emm
  • Your sample doesn't look like valid `json` format. Also, you probably don't want to split the content on backslashes - I'd suggest not splitting at all and feeding the string straight into `json.loads` – FObersteiner Nov 04 '19 at 12:19
  • @MrFuppes when I try that I get the message: JSONDecodeError: Expecting value: line 1 column 1 (char 0) – Emm Nov 04 '19 at 12:21
  • you could also try to use `literal_eval` from the [AST module](https://docs.python.org/3/library/ast.html) – FObersteiner Nov 04 '19 at 12:49
  • @MrFuppes please elaborate – Emm Nov 04 '19 at 13:16
  • Sorry, I'm sort of offline at the moment ;-) AST literal eval basically helps you to convert information stored in a string to python syntax and run it. That could e.g. allow you to create a list/dict. More info e.g. [here](https://stackoverflow.com/questions/15197673/using-pythons-eval-vs-ast-literal-eval) – FObersteiner Nov 04 '19 at 16:33

2 Answers

0

You're reading in data in json format, so it would make more sense to use unclean_file = json.loads(success) instead of unclean_file = [str(success.stdout).split('\\')].

This should return a dict object, which you can pass directly to a DataFrame.

Furthermore, you might need to decode your data.

import json
import pandas as pd

success = success.decode('utf-8')  # decode your content; might not be necessary
unclean_file = json.loads(success)
data = pd.DataFrame(unclean_file, index=[0])
Zephyrus
  • this returns the error message: JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 3 (char 4) – Emm Nov 04 '19 at 11:25
  • Could you try `success= success.replace("\'", "\"")` before running `unclean_file = json.loads(success)`? – Zephyrus Nov 04 '19 at 12:09
  • 1
    I would still need to convert success to a string. success is a Naked object; when I call stdout, I get a byte string, which is why I use split to convert it to a list – Emm Nov 04 '19 at 12:11
  • `success= success.decode('utf-8')` should convert it to a string. – Zephyrus Nov 04 '19 at 12:14
  • success is a Naked object, running success.decode('utf-8') returns the error message 'NakedObject' object has no attribute 'decode' – Emm Nov 04 '19 at 12:18
  • Sorry, I didn't realize you are working with naked objects. I can't help you with those. – Zephyrus Nov 04 '19 at 12:38
0

Splitting the data string doesn't help - it makes it even harder to parse.
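To illustrate, here is a minimal sketch of what happens to the bytes output. The variable raw is a hypothetical stand-in for success.stdout, shortened to a few lines of the sample in the question. It compares str() followed by split with a plain decode(), and then shows that json.loads still rejects the decoded text:

```python
import json

# Hypothetical stand-in for success.stdout (an assumption, shortened from the question's sample).
raw = b"{\n  boxers: [\n    {\n      name: 'Andy Ruiz Jr',"

# str() on bytes returns its repr, so each newline becomes the two characters '\' and 'n'.
# Splitting on the backslash therefore leaves a stray 'n' at the start of every fragment.
fragments = str(raw).split('\\')
print(fragments[:2])

# decode() returns the actual text, with real newlines.
text = raw.decode()
print(text.splitlines()[:2])

# The decoded text is still not JSON, because the key is not double-quoted.
try:
    json.loads(text)
    error = None
except json.JSONDecodeError as exc:
    error = exc
print(error)
```

So decoding is the right first step, but the text then needs further cleaning before it can be parsed.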

error message: JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 3 (char 4)

This clearly says that one problem is the unquoted keys; further problems are the unquoted values true, Array and Object. But it's not so hard to rectify all this:

import re

unclean_string = success.stdout.decode()
# Wrap every identifier that is followed by ']', ',' or ':' in double quotes
clean_string = re.sub(r'\w+(?=[],:])', r'"\g<0>"', unclean_string)

The above quotes every identifier that is followed by a colon, a comma or a closing bracket (so numbers and true become strings as well), and we get a well-formed dict representation, which we can evaluate and make a DataFrame of:

pd.DataFrame(eval(clean_string)['boxers'])
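As a variation, here is a self-contained sketch of the same steps that swaps eval for ast.literal_eval, as suggested in the comments on the question, since literal_eval only accepts Python literals and does not run arbitrary code. The names sample_stdout, parse_node_output and to_dataframe are hypothetical; the sample bytes are an assumption based on the data shown in the question:

```python
import ast
import re

# Hypothetical sample standing in for success.stdout (two records from the question).
sample_stdout = b"""{
  boxers: [
    {
      age: 30,
      hasBoutScheduled: true,
      last6: [Array],
      name: 'Andy Ruiz Jr',
      stance: 'orthodox'
    },
    {
      age: 34,
      hasBoutScheduled: true,
      last6: [Array],
      name: 'Deontay Wilder',
      stance: 'orthodox'
    }
  ]
}
"""


def parse_node_output(raw: bytes) -> list:
    """Decode the bytes, quote bare identifiers, and return the 'boxers' records."""
    text = raw.decode()
    # Same regex as above: quote every word that is followed by ']', ',' or ':'.
    quoted = re.sub(r'\w+(?=[],:])', r'"\g<0>"', text)
    # literal_eval parses the dict/list/string literals without executing code.
    return ast.literal_eval(quoted)['boxers']


def to_dataframe(records: list):
    """Build a DataFrame from the parsed records (requires pandas)."""
    import pandas as pd
    return pd.DataFrame(records)


boxers = parse_node_output(sample_stdout)
print(boxers[0])
```

Calling to_dataframe(boxers) then gives one row per boxer. Note that every quoted value is a string, so numeric columns such as age would still need converting, for example with pd.to_numeric. The regex would also misfire on a name that contains a word directly followed by a comma or colon.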
Armali