
I have a question: how can I properly get the value of the field 'data' from the following JS file using Python? I tried parsing it as JSON, but json.load says it's in an incorrect format. I'd be thankful for any help.

return [
{
    'id'         : 1, 
    'category'   : 'html5', 
    'name'       : {
        'en' : 'XSS via formaction - requiring user interaction (1)',
        'ja' : 'formaction\u7d4c\u7531\u3067\u306eXSS - \u30e6\u30fc\u30b6\u306e\u4ecb\u5728\u304c\u5fc5\u8981',
        'ru' : 'Пассивный скриптинг через formaction (1)',
        'cs' : 'XSS pomocí formaction - vyžaduje uživatelskou interakci (1)',
        'de' : '',
        'tr' : 'formaction ile XSS - kullanıcı etkileşimi gerektiren (1)',
        'zh' : '通过formaction属性进行XSS - 需要用户进行交互 (1)'
    },
    'data'       : '<form id="test"></form><button form="test" formaction="%js_uri_alert%">X</button>',
    'trigger'    : 'document.getElementsByTagName("button")
    'urls'    : ['http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#attr-fs-formaction'],
    'howtofix'   : {
        'en' : 'Don\'t allow users to submit markup containing "form" and "formaction" attributes or transform them to bogus attributes. Avoid "id" attributes for forms as well as submit buttons.',
        'ja' : '\u0022form\u0022\u3068\u0022formaction\u0022\u5c5e\u6027\u3092\u542b\u3080\u8981\u7d20\u3092\u30e6\u30fc\u30b6\u306b\u5165\u529b\u3055\u305b\u306a\u3044\u3001\u3042\u308b\u3044\u306f\u305d\u308c\u3089\u306e\u5c5e\u6027\u3092\u7121\u52b9\u306a\u5024\u306b\u5909\u63db\u3059\u308b\u3053\u3068\u3002\u0022id\u0022\u5c5e\u6027\u306fform\u3060\u3051\u3067\u306a\u304fsubmit\u30dc\u30bf\u30f3\u306b\u3064\u3044\u3066\u3082\u5bfe\u5fdc\u3059\u308b\u3053\u3068\u3002',
        'ru' : 'Не разрешайте пользовательскую разметку, содержащую атрибуты form и formaction или неправильные значения этих атрибутов. Избегайте атрибут id в формах, как и элементы ввода submit типа.',
        'cs' : 'Nedovolte uživatelům vložit kód obsahující atributy "form" a "formaction" či jejich "zkomolené" tvary. Vyhněte se atributu "id" u formulářů i u odesílacích tlačítek.',
        'de' : '',
        'tr' : 'Kullanıcıların "form" ve "formaction" markup\'larını yollamaları engellenmeli veya bu özellikler kullanışsız hale getirilmelidir. Submit butonlarında olduğu gibi formlarda "id" özelliklerinden kaçının.',
        'zh' : '不要让用户提交包含 "form" 和 "formaction"属性的标签.避免在form中出现id属性及提交按钮.'
    },
    'browsers'   : {
        'firefox' : ['4.0', 'latest'],
        'opera': ['10.5', 'latest'], 
        'chrome': ['10.0', 'latest'],
        'safari' : ['4.0.4', 'latest'],
        'internet explorer' : ['10', 'latest (inside form element)']
    },
    'tags'       : ['xss', 'html5', 'opera', 'chrome', 'firefox', 'formaction', 'javascript', 'button'],
    'reporter'   : '.mario'
}
]

Thanks a lot.

Zuhan
  • `but for json.load it's in incorrect format`: what error does it show? Have you tried running `json.replace('"', '\\"').replace("'", "\"")` on it (escaping double quotes as `\"` and replacing single quotes with double quotes to match JSON requirements)? – ankhzet Jan 02 '16 at 03:36
  • Read it line by line and look for the string 'data' – hpaulj Jan 02 '16 at 03:49
  • The character encoding looks wrong – Tamas Hegedus Jan 02 '16 at 05:24
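hpaulj's line-by-line suggestion could look roughly like the sketch below. It assumes, as in the sample file, that each key/value pair sits on its own line and that the value is a single-quoted string; find_data_value is a hypothetical helper name, not part of any library.

```python
from typing import Iterable, Optional


def find_data_value(lines: Iterable[str]) -> Optional[str]:
    """Return the raw single-quoted value of the 'data' key, or None.

    Assumes each key/value pair is on its own line, as in the sample file.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith("'data'"):
            continue
        # Split on the first colon only; the value itself may contain colons.
        _, _, value = stripped.partition(":")
        # Drop surrounding whitespace and the trailing comma.
        value = value.strip().rstrip(",").rstrip()
        # Remove the enclosing single quotes (escape sequences are left as-is).
        if len(value) >= 2 and value[0] == value[-1] == "'":
            return value[1:-1]
    return None
```

Since a file object iterates line by line, you could pass an open file straight in, e.g. find_data_value(open(path, encoding="utf-8")), where path is whatever your JS file is called.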

2 Answers


You could look into a JavaScript parser such as slimit (working example here). Alternatively, you can extract the value of the data key with a regular expression:

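import re

# script: the contents of the JS file as a string
match = re.search(r"'data'\s+:\s+'(.*?)',", script, re.MULTILINE | re.DOTALL)
if match:
    # group(1) is the text between the quotes after 'data' :
    print(match.group(1))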
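One caveat with the non-greedy (.*?)', pattern is that it stops at the first ', it finds, so a value such as 'a\', b' would be cut short after the backslash. A variant that treats backslash escapes as part of the string is sketched below; DATA_RE and extract_data are hypothetical names, and the captured text keeps its escape sequences undecoded.

```python
import re
from typing import Optional

# A single-quoted JS string after the key 'data': any run of characters that
# are neither a quote nor a backslash, or a backslash followed by any character.
DATA_RE = re.compile(r"'data'\s*:\s*'((?:[^'\\]|\\.)*)'", re.DOTALL)


def extract_data(script: str) -> Optional[str]:
    """Return the raw (still escaped) value of the 'data' key, or None."""
    match = DATA_RE.search(script)
    return match.group(1) if match else None
```

Because the pattern only looks at the data entry, it does not care about the unterminated 'trigger' line elsewhere in the file.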
alecxe

It has a syntax error: the line 'trigger': 'document.getElementsByTagName("button") should end with ', (a closing quote followed by a comma).

Once that is fixed and the leading return is removed, you can get the data like this:

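import ast

# your_string: the file contents, with the trigger line fixed and "return" removed
struct = ast.literal_eval(your_string)
data = struct[0]['data']  # first (and only) entry in the list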
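A self-contained version of this approach is sketched below. It assumes the trigger line has already been fixed as described and that the text is nothing but the return [...] statement, optionally followed by a semicolon; load_entries is a hypothetical helper name. ast.literal_eval works here only because this particular data also happens to be a valid Python literal (single-quoted strings, \u escapes, trailing commas); JavaScript-only values such as true, null or unquoted keys would make it raise an exception.

```python
import ast


def load_entries(js_text: str) -> list:
    """Parse the body of the JS file as a Python literal.

    Assumes the 'trigger' syntax error is already fixed and the text is just
    `return [ ... ]`, optionally followed by a semicolon.
    """
    body = js_text.strip()
    # Drop the leading JavaScript `return` keyword, which is not part of the literal.
    if body.startswith("return"):
        body = body[len("return"):]
    # A trailing semicolon is valid JS but not valid in a Python literal.
    body = body.strip().rstrip(";").strip()
    return ast.literal_eval(body)
```

Typical use would be load_entries(open(path, encoding="utf-8").read())[0]['data']. Reading with an explicit encoding="utf-8" (assuming the file is UTF-8) should also avoid the garbled characters mentioned in the comments.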
Hugh Bothwell