
I am trying to record the DataCamp courses I have completed by using a web scraper. First, kudos to this guy, who has built something close to what I need.

However, recently DataCamp has made changes to their website and now the comprehensive course data is not in JSON anymore, but seems to be stored as a string representation of a nested list.

If you take a look at the source of one of the chapter pages, the first element in the body is:

<body><script>window.PRELOADED_STATE = "[&quot;~#iM&quot;,[&quot;preFetchedData&quot;,[&quot;^0&quot;,[&quot;course&quot;,[&quot;^0&quot;,[&quot;status&quot;,&quot;SUCCESS&quot;,&quot;data&quot;,[&quot;^ &quot;,&quot;id&quot;,58,&quot;title&quot;,&quot;Introduction to R ...

The original scraper could rely on JSON and extract the information via the dict keys. There is an id field, so I should probably be able to extract the data once I have the underlying data as a list of lists.

I tried parsing the string representation with ast.literal_eval, but that did not work. Any idea how I could make this list usable?
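For illustration, here is a minimal sketch of one way the string might be made usable. It assumes that ast.literal_eval fails because the text still contains `&quot;` HTML entities, and that the markers `~#iM` and `^ ` tag lists of alternating keys and values (they resemble Transit-style JSON, but that is a guess). `SAMPLE_PAGE` is a shortened, hypothetical stand-in for the real page source, and `extract_state` and `to_plain` are made-up helper names. Treating `^0` as a back-reference to `~#iM` is also an assumption, and other `^n` codes are left unresolved.

```python
import html
import json
import re

# Hypothetical, shortened stand-in for the page source shown above.
SAMPLE_PAGE = (
    '<body><script>window.PRELOADED_STATE = "'
    '[&quot;~#iM&quot;,[&quot;preFetchedData&quot;,[&quot;^0&quot;,[&quot;course&quot;,'
    '[&quot;^0&quot;,[&quot;status&quot;,&quot;SUCCESS&quot;,&quot;data&quot;,'
    '[&quot;^ &quot;,&quot;id&quot;,58,&quot;title&quot;,&quot;Introduction to R&quot;]]]]]]]'
    '";</script></body>'
)

# Assumption: "^0" refers back to the first tag seen, i.e. "~#iM".
MAP_TAGS = {"~#iM", "^0"}


def extract_state(page):
    """Pull the quoted PRELOADED_STATE string out of the page and parse it."""
    match = re.search(r'window\.PRELOADED_STATE\s*=\s*"([^"]*)"', page)
    if match is None:
        raise ValueError("PRELOADED_STATE not found")
    # Turn &quot; back into real quotes, then parse the result as JSON.
    return json.loads(html.unescape(match.group(1)))


def pairs_to_dict(flat):
    """Fold [k1, v1, k2, v2, ...] into a dict, converting values recursively."""
    return {flat[i]: to_plain(flat[i + 1]) for i in range(0, len(flat) - 1, 2)}


def to_plain(node):
    """Convert the tagged nested lists into plain dicts and lists."""
    if not isinstance(node, list):
        return node
    # ["~#iM", [k, v, ...]] or ["^0", [k, v, ...]]: a tagged map
    if (len(node) == 2 and isinstance(node[0], str)
            and node[0] in MAP_TAGS and isinstance(node[1], list)):
        return pairs_to_dict(node[1])
    # ["^ ", k, v, ...]: a map written inline as a list
    if node and node[0] == "^ ":
        return pairs_to_dict(node[1:])
    return [to_plain(item) for item in node]


state = to_plain(extract_state(SAMPLE_PAGE))
course = state["preFetchedData"]["course"]["data"]
print(course["id"], course["title"])
```

With the sample string this gives nested dicts, so the old key-based lookups should work again. On the full page, keys may appear as short `^n` codes that this sketch does not resolve, so a proper Transit reader may be needed there.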

Ajax1234
mor3dr3ad
