1

I need to decode and load to a DB, a string which arrives in the following format:

"[{u'date': 1508760000000, u'value': 0}, {u'date': 1509364800000, u'value': Decimal('5.989999771118164')}, {u'date': 1509969600000, u'value': Decimal('5.989999771118164')}, {u'date': 1510574400000, u'value': Decimal('9.579999923706055')}]"

Currently, this is the code I use in order to turn it into a dataframe:

import pandas as pd
import json

#json for example:
my_json="""[{u'date': 1508760000000, u'value': 0}, {u'date': 1509364800000, u'value': Decimal('5.989999771118164')}, {u'date': 1509969600000, u'value': Decimal('5.989999771118164')}, {u'date': 1510574400000, u'value': Decimal('9.579999923706055')}]"""
my_json=my_json[1:-1]
my_json=my_json.replace("u'","'")
my_json=my_json.replace("'",'"')
my_json=my_json.replace('Decimal("','')
my_json=my_json.replace('")','')
my_json=my_json.replace(', {','~ {')
my_json_list=my_json.split('~')
my_dict_list=[json.loads(row) for row in my_json_list]
df=pd.DataFrame(my_dict_list)

Is there a shorter/more elegant way to perform this?

Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
Roee Anuar
  • 3,071
  • 1
  • 19
  • 33
  • 5
    The string you provided is not a valid `json` as it starts with a list. – Kaushik NP Dec 20 '17 at 08:34
  • 5
    Where did that string come from? It looks like it was copied from a `print` of a python `list` of `dict` (the `u` before each string meaning it is a unicode string) and is not actually valid JSON. – Galen Dec 20 '17 at 08:35
  • 2
    You can transform an string to a literal structure with `ast`, like this: `import ast; ast.literal_eval(string)` but in your case you have Decimal objects and it doesn't work but you can check this [answer](https://stackoverflow.com/a/18178379/3540693) where they solve your problem. – Pentux Dec 20 '17 at 08:40
  • 1
    @KaushikNP: Why would that make it invalid JSON? Would you claim that the string `"[1, 2, 3]"` is invalid JSON? (I agree that the string is not valid JSON, but there's no problem with it being JSON array at top level.) – Mark Dickinson Dec 20 '17 at 08:51
  • @MarkDickinson : As you mentioned, I meant it should be classified as a JSONArray. – Kaushik NP Dec 20 '17 at 09:01
  • Python v2 uses u as a prefix to indicate the string is Unicode, the literal without the u should make you nervous, as then the content from the db would have lost its encoding. Try print the strings and the u will disappear ;-) In Python v3 all strings are Unicode or explicit bytes, and only the latter are without an encoding. – Abhishek Parikh Dec 20 '17 at 09:01
  • Like @Galen already mentioned this is not a json but it's a `print` of a `list` of `dicts` which also includes `Decimal` object. and this is your actual problem: the answer of @Dark enables you to deconstruct the string and similar strings (including a `Decimal` object) but what you are actually doing with this is trying to parse a string as python code which leads to a whole universe of problems in production. I would recommend changing the format of the data exchange e.g. to json. – Johannes Reichard Dec 20 '17 at 09:10
  • If it is possible to get the input data in another format (actual JSON, for example), it would probably be best to go from there. – Galen Dec 20 '17 at 09:14
  • The string arrives from a client's DB (it is stored as text) – Roee Anuar Dec 20 '17 at 10:16

2 Answers2

6

Use the eval i.e

from decimal import *
df = pd.DataFrame(eval(my_json))

           date              value
0  1508760000000                  0
1  1509364800000  5.989999771118164
2  1509969600000  5.989999771118164
3  1510574400000  9.579999923706055
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
  • 2
    `from decimal import *` is clever ;) – jezrael Dec 20 '17 at 09:00
  • 1
    `from decimal import Decimal` would be enought, but this string needs to be imported as data the problem is with exporting the data by printing a list of dicts... – Johannes Reichard Dec 20 '17 at 09:04
  • @JohannesReichard I didn't get you. Since its a pandas question I think all OP is expected is to get the dataframe as an output. And since I saw only a partial string I imported everything to avoid future issues. – Bharath M Shetty Dec 20 '17 at 09:08
  • I was about to suggest `ast.literal_eval`, but I suppose the string does contain `Decimal`... – Galen Dec 20 '17 at 09:13
  • hehe, I just added another comment to the answer to add some more context. Your answer solves the problem of the question perfectly, but I think there is another problem: you should never use a print of python code which needs to be read with `eval` to as an data exchange format. If it's a one time thing, just create real variables for it. If it's code used in production never ever use eval without additional percaussions, change your data exchange format ^^. – Johannes Reichard Dec 20 '17 at 09:13
  • Thank you! This is indeed elegant. – Roee Anuar Dec 20 '17 at 10:19
0

u means unicode - but if all your characters are ascii based - like the english language you can just use the casting str(), here is the output of that :

str(u'date')
'date'

i think it will solve your problem .

ddor254
  • 1,570
  • 1
  • 12
  • 28