Banging my head here..

I am trying to extract the entire contents of the JavaScript variable 'ListData' from an HTML source with a regex. The value starts with the declaration var ListData = and ends with };.

I found a solution which is similar:

Fetch data of variables inside script tag in Python or Content added from js

But I am unable to get my regex to match the entire object.

Code:

import re

# Need the ListData object
pat = re.compile('var ListData = (.*?);')

string = """QuickLaunchMenu == null) QuickLaunchMenu = $create(UI.AspMenu, 
null, null, null, $get('QuickLaunchMenu')); } ExecuteOrDelayUntilScriptLoaded(QuickLaunchMenu, 'Core.js');
var ListData = { "Row" : 
[{
"ID": "159",
"PermMask": "0x1b03cc312ef",
"FSObjType": "0",
"ContentType": "Item"
};
moretext;
moretext"""

#Returns NoneType instead of match object
print(type(pat.search(string)))

Not sure what is going wrong here. Any help would be appreciated.

wonderstruck80

1 Answer

In your regex, the (.*?); part matches any 0+ chars other than line break chars up to the first ;. Since there is no ; on the same line as var ListData =, there is no match.
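A minimal sketch of that behaviour, using a made-up three-line string (the names `text`, `without_dotall` and `with_dotall` are only for illustration): without `re.S` the dot cannot cross the line break, so the search fails.

```python
import re

# Hypothetical input: the ";" sits two lines below the declaration.
text = "var x = {\n  1\n};"

# Default mode: "." does not match "\n", so ";" is never reached.
without_dotall = re.search(r"var x = (.*?);", text)

# With re.S (re.DOTALL), "." also matches line breaks.
with_dotall = re.search(r"var x = (.*?);", text, flags=re.S)

print(without_dotall)        # None
print(with_dotall.group(1))  # the text between "var x = " and the first ";"
```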

Based on the fact that your expected match ends with the first }; at the end of a line, you may use

'(?sm)var ListData = (.*?)};$'

Here,

  • (?sm) - enables re.S (makes . match any char, including line breaks) and re.M (makes $ match at the end of each line, not just at the end of the whole string, and ^ match at the start of each line) modes
  • var ListData = - a literal substring
  • (.*?) - Group 1: any 0+ chars, as few as possible, up to the first...
  • };$ - }; at the end of a line
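
As a quick check, here is a sketch that runs both patterns against a shortened version of the string from the question (the variable names `html`, `original` and `fixed` are mine, not from the question):

```python
import re

# Shortened sample from the question; the object spans several lines.
html = """ExecuteOrDelayUntilScriptLoaded(QuickLaunchMenu, 'Core.js');
var ListData = { "Row" :
[{
"ID": "159",
"PermMask": "0x1b03cc312ef",
"FSObjType": "0",
"ContentType": "Item"
};
moretext;
moretext"""

# Pattern from the question: "." cannot cross line breaks, so nothing matches.
original = re.search('var ListData = (.*?);', html)

# Pattern from this answer: (?s) lets "." span lines, (?m) lets "$" match at line ends.
fixed = re.search(r'(?sm)var ListData = (.*?)};$', html)

print(original)        # None
print(fixed.group(1))  # everything after "var ListData = " up to the closing "};"
```

Note that Group 1 stops just before the closing }. If you need the brace in the captured text, move it inside the group.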
Wiktor Stribiżew
  • Python has a better way to represent multiline matches: `pat = re.compile(r"var ListData = (.*?})?", re.MULTILINE)`. In this case you also need `re.DOTALL` since the dot can match a newline, so `pat = re.compile(r"var ListData = (.*?})?", re.MULTILINE | re.DOTALL)` – Adam Smith Nov 16 '18 at 17:07
  • @AdamSmith Your `r"var ListData = (.*?})?"` does not require `re.M` flag since it has neither `$` or `^`. `re.S` is equal to `re.DOTALL`, and I prefer using inline variants, they are shorter and more universal. – Wiktor Stribiżew Nov 16 '18 at 17:11
  • They are shorter, and more universal (inasmuch as the syntax I'm proposing couldn't be less universal since it applies only to Python), but I feel like inline flags muddy the waters a bit in a language that already quickly devolves to gobbledygook. YMMV :) – Adam Smith Nov 16 '18 at 17:13
  • @AdamSmith I also prefer that way in Python because there are often cases like https://stackoverflow.com/questions/11958728 or https://stackoverflow.com/questions/42581 – Wiktor Stribiżew Nov 16 '18 at 17:16
  • Notably: defining it as an explicit keyword argument is unambiguous. `re.compile(somepattern, flags=someflags)` – Adam Smith Nov 16 '18 at 17:24
  • @AdamSmith Yeah, that is another advantage of [using Python's `re.compile`](https://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile). – Wiktor Stribiżew Nov 16 '18 at 17:25
  • I mean, or any of the other `re` functions. They all accept `flags` as a kwarg :) – Adam Smith Nov 16 '18 at 17:32
  • @AdamSmith As I said, there is often confusion with them, as [here](https://stackoverflow.com/questions/11958728) and [here](https://stackoverflow.com/questions/42581). I close such duplicate questions regularly. – Wiktor Stribiżew Nov 16 '18 at 17:33
  • I understand your reasons for preferring the inline syntax. I'm simply pointing out that there is a One True Way to specify regex flags in Python that has unambiguous and expected results every time. You don't have to agree, but I'm certainly not convinced it's wrong because some people on the internet get it wrong by not including the explicit `flags=` syntax. – Adam Smith Nov 16 '18 at 17:46
  • Similarly, I don't think Python was wrong to switch from `print "this"` to `print("this")` even though examples abounded of duplicate questions at the time (and even still!), and I don't think list multiplication is worse than explicitly writing each list just because people use it in inappropriate places, even though that mistake [even has a canonical dupe target](https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly) – Adam Smith Nov 16 '18 at 17:48