[[{"date":"January 2004"},true,false,100,null,null,true],[{"date":"February 2004"},false,false,99,null,null,true]]

I have a long string of data that came from a JavaScript file, like the one above. Is there a shortcut or library that would parse it into the appropriate data types?

As you can see, it's a list of lists that contain dictionaries, Boolean values, integers and null values.

I mean, I could do this by hand but I don't think I could do it very quickly or efficiently. There must be a better method.

User
  • Are there really no closing `}` brackets for those `{` brackets? – roippi May 23 '14 at 02:26
  • No, my mistake. I was trying to simplify the full version. I corrected it above. – User May 23 '14 at 02:34
  • Isn't this just [json](https://docs.python.org/2/library/json.html)? Not sure why it has `True`/`False` instead of `true`/`false` though. – DaoWen May 23 '14 at 02:41

2 Answers


That's pretty close to valid JSON. The only invalid thing is that False should be false and True should be true. That could be a transcription error (...yep)


Use json:

import json

x = '[[{"date":"January 2004"},true,false,100,null,null,true],[{"date":"February 2004"},false,false,99,null,null,true]]'

print(json.loads(x))

prints:

[[{'date': 'January 2004'}, True, False, 100, None, None, True], [{'date': 'February 2004'}, False, False, 99, None, None, True]]
roippi
  • macdonjo says he's getting the output from JavaScript, so I wonder if he capitalized the Boolean values when he was posting the data... – DaoWen May 23 '14 at 02:43
  • Yes, you guys got me again, I capitalized them just because I was working in Python and it didn't cross my mind that it would make a difference. Yep, they're lowercase! :) – User May 23 '14 at 02:45

I suggest you take a look at PyParsing.

http://pyparsing.wikispaces.com/

You could also take a look at the Python "scanf" library (see "sscanf in Python").

If you needed to solve this problem just using Python built-ins, I would recommend using a regular expression with capture groups.
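A minimal sketch of the built-ins-only regex route, assuming every row has exactly the shape shown in the question: one {"date":"..."} dictionary followed by flat scalar values (true, false, null, or an integer). The names ROW_RE, LITERALS, convert and parse_rows are hypothetical, chosen for illustration; a regex cannot handle arbitrary nesting, so this breaks as soon as the row shape changes.

```python
import re

# Assumption: every row looks like [{"date":"..."},v1,v2,...]
# with one date dictionary followed by flat scalar values (no nesting).
ROW_RE = re.compile(r'\[\{"date":"([^"]*)"\},([^\[\]]*)\]')

# JavaScript literal -> Python value; any other token is assumed to be an integer.
LITERALS = {"true": True, "false": False, "null": None}


def convert(token):
    token = token.strip()
    if token in LITERALS:
        return LITERALS[token]
    return int(token)


def parse_rows(text):
    rows = []
    # Group 1 captures the date string, group 2 the comma-separated scalars.
    for date, rest in ROW_RE.findall(text):
        rows.append([{"date": date}] + [convert(t) for t in rest.split(",")])
    return rows


s = '[[{"date":"January 2004"},true,false,100,null,null,true],[{"date":"February 2004"},false,false,99,null,null,true]]'
print(parse_rows(s))
```

This works for the sample, but it hard-codes the structure; a real parser (or json, if the data is valid JSON) is far less brittle.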

EDIT: Hmm, I took another look at this. You did say it was from JavaScript... this looks to me like a legal JSON array. I tried the json module (specifically, json.loads()), but I couldn't get it to parse.

But! Python literal syntax is close to JavaScript's. Replace a few tokens and eval(), or better, ast.literal_eval() can parse this. We need to replace true with True, false with False, and null with None before ast.literal_eval() will accept it.

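import ast
s = '[[{"date":"January 2004"},True,False,100,null,null,true],[{"date":"February 2004"},False,False,99,null,null,true]]'
# Map the JavaScript literals to their Python equivalents.
# Caveat: a plain str.replace() also rewrites these words inside string values.
s1 = s.replace("true", "True").replace("false", "False").replace("null", "None")
# literal_eval only accepts Python literals, so it cannot run arbitrary code.
x = ast.literal_eval(s1)
print(x)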

The above will print:

[[{'date': 'January 2004'}, True, False, 100, None, None, True], [{'date': 'February 2004'}, False, False, 99, None, None, True]]

Originally I showed defining variables (like true = True) and using eval() to parse this, but eval() is a potential security hole. If you need to parse text that might come from a web page or any other untrusted source, it's worth the small effort to import ast and use ast.literal_eval() instead.
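To illustrate why literal_eval() is the safer choice, here is a small sketch; safe_parse is a hypothetical helper name. It returns the parsed value for plain literals and None for anything else, including function calls (which eval() would execute) and the unreplaced JavaScript words true/null. Note that returning None is ambiguous when the input is literally "None"; a real version might re-raise instead.

```python
import ast


def safe_parse(text):
    """Return the Python literal in text, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval raises ValueError for names, calls and other
        # non-literal nodes, and SyntaxError for text that isn't valid Python.
        return None


print(safe_parse("[True, None, 100]"))          # [True, None, 100]
print(safe_parse("[true, null]"))               # None: JavaScript names aren't literals
print(safe_parse("__import__('os').getcwd()"))  # None: the call is rejected, not run
```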

EDIT: Okay, the json module can parse this; the problem was the use of True instead of true and False instead of false. Just use the str.replace() method to fix those, and then json.loads() can parse it.

I was just about to post a code fragment with the .replace() calls when the question was updated again, and the capitalized True and False became ordinary, legal JSON true and false.
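For anyone who still has the capitalized form, a minimal sketch of that replace-then-parse step might look like this. It assumes the only non-JSON tokens are True and False, as in the question's earlier version; like any plain str.replace(), it would also change those words inside string values.

```python
import json

# The capitalized form the question originally showed; not valid JSON as-is.
s = '[[{"date":"January 2004"},True,False,100,null,null,true],[{"date":"February 2004"},False,False,99,null,null,true]]'

# Lower-case the Python-style booleans so json.loads() accepts them.
fixed = s.replace("True", "true").replace("False", "false")
x = json.loads(fixed)  # JSON true/false/null become True/False/None
print(x)
```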

So my final answer:

s = '[[{"date":"January 2004"},true,false,100,null,null,true],[{"date":"February 2004"},false,false,99,null,null,true]]'

import json

x = json.loads(s)
print(x)

prints (this is Python 2 output; Python 3 shows the strings without the u prefix):

[[{u'date': u'January 2004'}, True, False, 100, None, None, True], [{u'date': u'February 2004'}, False, False, 99, None, None, True]]
steveha
  • You didn't enter the eval argument as a string. Remember, it's a string. – User May 23 '14 at 02:48
  • @macdonjo Thanks for pointing that out. When I tested it, it worked, but when I typed it in here I failed to put the string quotes. Usually I copy/paste from my Python session so I'm putting the correct tested code, but I must not have done that this time; I wonder why not. – steveha May 23 '14 at 02:51
  • As for the source of the data, it's from a very popular website and it's not user entered. It's just a big database. So I guess this should be safe? – User May 23 '14 at 02:58
  • It's probably safe, but if you were going to do the "eval" trick I would suggest using `ast.literal_eval()` anyway. I'll modify my example to use that. But you might as well use `json.loads()` since it really is legal JSON. – steveha May 23 '14 at 03:14
  • Pyparsing is no longer hosted on wikispaces.com. Go to https://github.com/pyparsing/pyparsing – PaulMcG Aug 27 '18 at 13:16