I have a large JavaScript object (a dictionary-like structure) embedded in an HTML page, and I want to extract data from it without parsing the JavaScript. Currently I am trying to do this with regular expressions.
The problem is that the object is complex and not fixed: new keys are occasionally inserted, but I expect the keys I am after to keep the same path in the hierarchy.
This is heavily trimmed data with some values omitted, but it preserves the complexity of the original:
{"compactVideoRenderer":{"videoId":"abcDE123-_","thumbnail":{"thumbnails":[{"url":"OMMITED_URL","width":168,"height":94},{"url":"OMMITED_URL_TWO","width":336,"height":188}]},"title":{"accessibility":{"accessibilityData":{"label":"OMMITED_TITLE"}},"simpleText":"OMMITED_TITLE_SIMPLE"}}}
From the above, I need to extract the values of the following:
- compactVideoRenderer -> videoId ("abcDE123-_")
- compactVideoRenderer -> title -> simpleText ("OMMITED_TITLE_SIMPLE")
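A minimal sketch of the regex approach in Python, assuming every target value is a JSON-style double-quoted string (possibly containing backslash escapes). The names `DATA`, `STRING`, `VIDEO_ID_RE`, `TITLE_TEXT_RE` and `extract` are hypothetical. The `videoId` pattern anchors only on the key name, so keys inserted before or after it do not matter; the `simpleText` pattern anchors on `"title"` and then skips lazily to the next `"simpleText"` key, which also tolerates insertions but has no notion of brace nesting.

```python
import re

# The trimmed sample from the question, as a single string.
DATA = '{"compactVideoRenderer":{"videoId":"abcDE123-_","thumbnail":{"thumbnails":[{"url":"OMMITED_URL","width":168,"height":94},{"url":"OMMITED_URL_TWO","width":336,"height":188}]},"title":{"accessibility":{"accessibilityData":{"label":"OMMITED_TITLE"}},"simpleText":"OMMITED_TITLE_SIMPLE"}}}'

# A double-quoted string: any run of non-quote/non-backslash characters or
# backslash-escape pairs. Group 1 captures the raw (still escaped) contents.
STRING = r'"((?:[^"\\]|\\.)*)"'

# "videoId": "<value>", tolerating whitespace around the colon.
VIDEO_ID_RE = re.compile(r'"videoId"\s*:\s*' + STRING)

# "title": { ... "simpleText": "<value>"; the lazy .*? stops at the FIRST
# "simpleText" after "title", regardless of how deeply it is nested.
TITLE_TEXT_RE = re.compile(r'"title"\s*:\s*\{.*?"simpleText"\s*:\s*' + STRING, re.DOTALL)


def extract(data: str) -> dict:
    """Return the first videoId and title simpleText found (None if absent)."""
    vid = VIDEO_ID_RE.search(data)
    title = TITLE_TEXT_RE.search(data)
    return {
        "videoId": vid.group(1) if vid else None,
        "simpleText": title.group(1) if title else None,
    }


print(extract(DATA))
# → {'videoId': 'abcDE123-_', 'simpleText': 'OMMITED_TITLE_SIMPLE'}
```

The captured values are returned exactly as they appear in the source, so escapes such as `\"` or `\u0026` are not decoded. The `simpleText` pattern stays correct only as long as no new object carrying its own `simpleText` key is inserted inside `title` ahead of the real one; if the page holds several renderers, `finditer` instead of `search` would return all of them.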
The solution must be flexible enough that, if another key-value pair is inserted anywhere (as long as it does not change the 'address' of the target keys), the regex still finds the target values.
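If the 'address' must be respected strictly, one option is a hybrid rather than pure regex: use a regex only to locate `"compactVideoRenderer":`, then let the standard library's `json.JSONDecoder.raw_decode` decode just that one value and ignore the rest of the page. This bends the "no parsing" constraint slightly, since it does decode that fragment, and it assumes the fragment is strict JSON as in the sample; unquoted keys or other JavaScript-only syntax make decoding fail, and the sketch simply skips such matches. `PAGE` is a made-up fragment in which extra keys (`newKey`, `extra`) were inserted to show that lookup is by address, not by position; `KEY_RE` and `extract_all` are hypothetical names.

```python
import json
import re

# Hypothetical page fragment: the question's sample with keys inserted at
# several levels, wrapped in surrounding JavaScript.
PAGE = (
    'var data = [{"compactVideoRenderer":{"newKey":true,"videoId":"abcDE123-_",'
    '"thumbnail":{"thumbnails":[{"url":"OMMITED_URL","width":168,"height":94}]},'
    '"title":{"extra":{"simpleText":"NOT_THIS_ONE"},"simpleText":"OMMITED_TITLE_SIMPLE"}}}];'
)

# Matches the key and the colon; the value itself is left to the JSON decoder.
KEY_RE = re.compile(r'"compactVideoRenderer"\s*:\s*')
DECODER = json.JSONDecoder()


def extract_all(page: str) -> list:
    """Return videoId and title.simpleText for every decodable renderer."""
    results = []
    for m in KEY_RE.finditer(page):
        try:
            # raw_decode parses one JSON value at the start of the string and
            # returns (value, end_index); text after that value is ignored.
            renderer, _end = DECODER.raw_decode(page[m.end():])
        except json.JSONDecodeError:
            continue  # not strict JSON from here on; skip this occurrence
        if not isinstance(renderer, dict):
            continue
        title = renderer.get("title")
        results.append({
            "videoId": renderer.get("videoId"),
            "simpleText": title.get("simpleText") if isinstance(title, dict) else None,
        })
    return results


print(extract_all(PAGE))
# → [{'videoId': 'abcDE123-_', 'simpleText': 'OMMITED_TITLE_SIMPLE'}]
```

On this input a lazy pattern such as `"title"\s*:\s*\{.*?"simpleText"` would return `NOT_THIS_ONE`, because the inserted `extra` object comes first; the key-based lookup is unaffected by where new keys are inserted, only by changes to the path itself.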
Since regular expressions are supported in most programming languages, code in any language will help; however, code or suggestions in Python would be especially helpful!