
From an HTML page's source code, I want to extract the JSON data using a regular expression.

Here is my source code:

```html
<div class="record-img"></div><span>沪公网安备 31010502000392号</span></a></div></div></div><script id="__NEXT_DATA__" type="application/json">{"dataManager":"[]","props":{"pageProps":{"error":"no_error","query":{"catId":"-11"},"list":[{"mallName":"双星八特澜蔓专卖店","merchantType":4,"goodsId":246071838673,"goodsSign":"JoD7R1E7y","goodsName":"双星男鞋夏季透气2021新款男士网面韩版百搭春季运动休闲跑步鞋子","goodsDesc":"双星男鞋夏季透气2021新款男士网面韩版百搭春季运动休闲跑步鞋子","goodsImageUrl":"https://t00img.yangkeduo.com/goods/images/2021-05-21/81cc46223c6e76e075a292acd1da3514.jpeg","goodsThumbnailUrl":"https://t00img.yangkeduo.com/goods/i............{"catId":"-11"},"buildId":"8L-Nn12bTfxvzblo_QyVL","dynamicBuildId":false,"runtimeConfig":{"youhuiHost":"//youhui.pinduoduo.com","loginHost":"//api.yangkeduo.com","goodsHost":"//api.yangkeduo.com","isDev":false}}</script><script async="" id="__NEXT_PAGE__/search/landing" src="/_next/static/8L-Nn12bTfxvzblo_QyVL/pages/search/landing.js"></script>
```

I want to get the JSON that starts with the word `dataManager`. How do I write a regular expression for this? I am using Python.

1 Answer


From your earlier question, I assume you already have `r.text` and want to retrieve the JSON part from it. There are several ways to do this, but a regular expression is not a good one: regex is a poor fit for parsing HTML, so use an HTML parser instead.
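One parser-based option that needs nothing beyond the standard library is `html.parser`. This is a sketch, not part of the original answer: the class `NextDataParser`, the helper `extract_next_data`, and the `sample` string are hypothetical names, and `sample` is a cut-down version of the page in the question rather than the real response.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Hypothetical helper: collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script text may arrive in several pieces, so keep every chunk
        if self._inside:
            self.chunks.append(data)


def extract_next_data(page_source):
    parser = NextDataParser()
    parser.feed(page_source)
    parser.close()
    return json.loads("".join(parser.chunks))


# Trimmed-down sample shaped like the page in the question (assumed, not real data)
sample = (
    '<div class="record-img"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"dataManager":"[]","props":{"pageProps":{"error":"no_error"}}}'
    '</script>'
    '<script async="" id="__NEXT_PAGE__/search/landing"></script>'
)
data = extract_next_data(sample)
print(data["dataManager"])  # → []
```

The sketch assumes the page contains a single `__NEXT_DATA__` script; other `<script>` tags are ignored because only the one with that `id` switches collection on.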

I would do it in the following way:

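```python
from lxml import html
import json

# r is the response from the earlier question (e.g. r = requests.get(url))
tree = html.fromstring(r.text)
# The JSON lives as plain text inside <script id="__NEXT_DATA__">
json_as_str = tree.xpath('//script[@id="__NEXT_DATA__"]/text()')[0]
json_as_dict = json.loads(json_as_str)
```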
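Once `json_as_dict` exists, the data the question asks about is ordinary nested dictionaries and lists. A hedged sketch follows, using a trimmed-down stand-in for the real page data (the literal below is assumed, cut down from the snippet in the question). Note that the value of `dataManager` is itself a JSON-encoded string (`"[]"`), so it needs a second `json.loads`.

```python
import json

# Trimmed-down stand-in for the real __NEXT_DATA__ text (assumed sample)
json_as_str = (
    '{"dataManager":"[]","props":{"pageProps":{"error":"no_error",'
    '"query":{"catId":"-11"},"list":[{"goodsId":246071838673,'
    '"goodsSign":"JoD7R1E7y"}]}}}'
)
json_as_dict = json.loads(json_as_str)

# "dataManager" holds a JSON-encoded string, so decode it a second time
data_manager = json.loads(json_as_dict["dataManager"])
print(data_manager)  # → []

# The goods listing sits under props -> pageProps -> list
for item in json_as_dict["props"]["pageProps"]["list"]:
    print(item["goodsId"], item["goodsSign"])  # → 246071838673 JoD7R1E7y
```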
Ahsanul Haque