1

What is the best possible way to convert my data to DataFrame?

    data = b'{"word": "Gondwana", "date": "2019-03-27 13:07:12.404732"}'
           b'{"word": "alalus", "date": "2019-03-27 13:07:12.909517"}'
           b'{"word": "Balto-Slavonic", "date": "2019-03-27 13:07:14.911308"}'
           b'{"word": "peculatation", "date": "2019-03-27 13:07:15.421915"}'

I tried this. Didn't seem to work.

d = pd.DataFrame(dict(data))
Jazz
  • 445
  • 2
  • 7
  • 22

2 Answers2

2

First decode values to utf-8 and convert to dictionaries in list comprehension by ast.literal_eval or json.loads:

data = [b'{"word": "Gondwana", "date": "2019-03-27 13:07:12.404732"}',
        b'{"word": "alalus", "date": "2019-03-27 13:07:12.909517"}',
        b'{"word": "Balto-Slavonic", "date": "2019-03-27 13:07:14.911308"}',
        b'{"word": "peculatation", "date": "2019-03-27 13:07:15.421915"}']

import ast   

df = pd.DataFrame([ast.literal_eval(x.decode("utf-8")) for x in data])
print (df)
                         date            word
0  2019-03-27 13:07:12.404732        Gondwana
1  2019-03-27 13:07:12.909517          alalus
2  2019-03-27 13:07:14.911308  Balto-Slavonic
3  2019-03-27 13:07:15.421915    peculatation

Alternative solution, should be faster in large data:

import json

df = pd.DataFrame([json.loads(x.decode("utf-8")) for x in data])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

You can't just construct a dictionary with a string of bytes formatted like a python dict. You'll need to parse it somehow.

If you know that your byte string is always going to be a valid dict. You can try

dict(eval(b'{"word": "soning", "date": "2019-03-27 13:07:13.409948"}'))

and you should be ok. If you don't know what will be in the byte string I would advise against the use of eval.

The other answer here advises use of ast.literal_eval this is safer than eval because literal_eval cannot be used to evaluate complex expression. see: https://docs.python.org/3.5/library/ast.html#ast.literal_eval

you can get literal_eval from the ast module


from ast import literal_eval
literal_eval(b'{"word": "soning", "date": "2019-03-27 13:07:13.409948"}')

owen
  • 1
  • 2
  • [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) – jezrael Mar 28 '19 at 12:31