I want to parse the following JavaScript, which I scraped from an HTML page:
var ibmdebug = false; //indicates whether or not to display flash debug window
if (qsParse.get("debug") == "true") {
    ibmdebug = true;
}
var matchStatsConfig = {
    courtId : "B",
    matchId : "5126",
    matchStatus : "C"
    ,
    eventId : "MX",
    roundId : "1",
    dayMessage : "Day 5 Friday 7 July",
    relatedContentTags : ['atpi200','wta316629','atpba79','wta316713'],
    team1 : {
        a : "atpi200",
        a_name : "D. Inglot",
        a_seed : "",
        b : "wta316629",
        b_name : "L. Robson",
        b_seed : ""
    },
    team2 : {
        a : "atpba79",
        a_name : "A. Begemann",
        a_seed : "",
        b : "wta316713",
        b_name : "N. Melichar",
        b_seed : ""
    }
}
Based on this thread, I use the slimit package as follows, where js.text contains the JavaScript code as a string:
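from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = js.text  # JavaScript source as a string
parser = Parser()
tree = parser.parse(data)
# Map each assignment's left-hand side to its right-hand side value
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}
print(fields)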
The content of fields looks as follows:
{
'ibmdebug': 'true',
'courtId': '"B"',
'matchId': '"5126"',
'matchStatus': '"C"',
'eventId': '"MX"',
'roundId': '"1"',
'dayMessage': '"Day 5 Friday 7 July"',
'relatedContentTags': '',
'team1': '',
'a': '"atpba79"',
'a_name': '"A. Begemann"',
'a_seed': '""',
'b': '"wta316713"',
'b_name': '"N. Melichar"',
'b_seed': '""',
'team2': ''
}
As you can see, only part of the input is parsed correctly. The array relatedContentTags remains empty, as do the team1 and team2 objects. Interestingly, the contents of team2 are present, but as top-level keys rather than nested under team2. I assume the contents of team1 are parsed as well, but are then overwritten by those of team2, because both objects use the same key names (a, a_name, a_seed, b, b_name, b_seed).
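The overwriting I suspect can be reproduced without slimit. This is only a sketch of my assumption: the pairs list is a made-up stand-in for the (left, right) values the visitor seems to yield, with nested objects flattened into one sequence and an empty string wherever the right-hand side has no value attribute.

```python
# Hypothetical stand-ins for the (left.value, right.value) pairs collected
# by the dict comprehension; nested objects end up in the same flat sequence.
pairs = [
    ("team1", ""),                 # object on the right-hand side, so the default '' is used
    ("a", '"atpi200"'),
    ("a_name", '"D. Inglot"'),
    ("team2", ""),
    ("a", '"atpba79"'),
    ("a_name", '"A. Begemann"'),
]

# A dict keeps only the last value stored under each key,
# so the entries from team2 replace those from team1.
fields = {key: value for key, value in pairs}
print(fields)
```

If this assumption is right, it would explain why only the team2 players show up and why team1 and team2 themselves map to empty strings.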
My question is: how can I properly parse the initial JavaScript into a Python data structure (e.g. a dictionary)?
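To make the goal concrete, this is roughly the structure I would like to end up with for the matchStatsConfig object. It is written by hand from the JavaScript above, not produced by any parser, and the variable name expected is just for illustration.

```python
# Desired result, transcribed manually from the matchStatsConfig object.
expected = {
    "courtId": "B",
    "matchId": "5126",
    "matchStatus": "C",
    "eventId": "MX",
    "roundId": "1",
    "dayMessage": "Day 5 Friday 7 July",
    "relatedContentTags": ["atpi200", "wta316629", "atpba79", "wta316713"],
    "team1": {
        "a": "atpi200",
        "a_name": "D. Inglot",
        "a_seed": "",
        "b": "wta316629",
        "b_name": "L. Robson",
        "b_seed": "",
    },
    "team2": {
        "a": "atpba79",
        "a_name": "A. Begemann",
        "a_seed": "",
        "b": "wta316713",
        "b_name": "N. Melichar",
        "b_seed": "",
    },
}

# Nested values should stay reachable under their parent key.
print(expected["team1"]["a_name"], "/", expected["team2"]["a_name"])
```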