I am trying to parse YouTube data with requests. The JSON data is stored in a script tag. Almost every time, my code removes the variable part up to the "{" character and cuts off the ending part so that I get a JSON string, which I convert into a Python object with json.loads().
But about 1 time in 30, the JSON in the script has its "{", "}", "[", "]" and double-quote characters written as hex escapes (\x7b, \x7d, \x5b, \x5d, \x22), which json.loads() is unable to parse, and I get the following error:
Expecting value: line 1 column 1 (char 0)
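The failure can be reproduced with a tiny stand-in string. This is only a sketch: the `hexed` and `plain` values and the `try_parse` helper are made up for illustration and are not the real payload or my real code. It shows that json.loads() rejects the escaped form at the very first character, because it sees a backslash where it expects "{".

```python
import json

# Hypothetical stand-ins for the two forms of the payload (not the real data).
hexed = r"\x7b\x22logged_in\x22:\x220\x22\x7d"
plain = '{"logged_in":"0"}'


def try_parse(text):
    """Return the parsed object, or the error message if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)


print(try_parse(plain))  # parses normally
print(try_parse(hexed))  # fails on the leading backslash
```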
As the hex-escaped JSON string is large I can't share all of it, but here is a short chunk:
var ytInitialData = '\x7b\x22responseContext\x22:\x7b\x22serviceTrackingParams\x22:\x5b\x7b\x22service\x22:\x22GFEEDBACK\x22,\x22params\x22:\x5b\x7b\x22key\x22:\x22has_unlimited_entitlement\x22,\x22value\x22:\x22False\x22\x7d,\x7b\x22key\x22:\x22has_premium_lite_entitlement\x22,\x22value\x22:\x22False\x22\x7d,\x7b\x22key\x22:\x22logged_in\x22,\x22value\x22:\x220\x22\x7d,\x7b\x22key\x22:\x22country-type\x22,\x22value\x22:\x22B\x22\x7d,\x7b\x22key\x22:\x22e\x22,\x22value\x22:\x221714240,23804281,23882502,23918597,23934970,23946420,23966208,23983296,23998056,24001373,24002022,24002025,24004644,24007246,24034168,24036947,24077241,24080738,24120820,24135310,24140247,24161116,24162919,24164186,24166867,24169501,24181174,24187043,24187377,24197450,24211178,24217535,24219713,24241378,24248091,24255163,24255543,24255545,24262346,24263796,24267564,24268142,24276555,24279196,24283015,24283093,24288043,24288346,24288664,24290971,24292955,24298326,24299875,24390013,24390675,24391539,24392364,24392405,24392524,24394395,24396647,24402891,24404640,24406316,24406621,24410275,24415864,24415866,24416290,24416354,24418790,24419276,24421154,24422003,24424028,24427072,24428900,24431193,24431335,24431909,24433348,24434665,24436008,24436527,24438684,24439229,24441244,24590921,24591046,24612269,24613467,24613789,24614043,24615089,24615363,24615479,24615619,24615664,39322504,39322574\x22\x7d\x5d\x7d,\x7b\x22service\x22:\x22CSI\x22,\x22params\x22:\x5b\x7b\x22key\x22:\x22yt_ad\x22,\x22value\x22:\x221\x22\x7d,\x7b\x22key\x22:\x22c\x22,\x22value\x22:\x22MWEB\x22\x7d,\x7b\x22key\x22:\x22cver\x22,\x22value\x22:\x222.20221216.00.00\x22\x7d,\x7b\x22key\x22:\x22yt_li\x22,\x22value\x22:\x220\x22\x7d,\x7b\x22key\x22:\x22GetSearch_rid\x22,"
And usually I get this string, which json can parse:
var ytInitialData = '{"responseContext":{"serviceTrackingParams":[{"service":"GUIDED_HELP","params":[{"key":"context","value":"yt_web_search"},{"key":"logged_in","value":"0"}]},{"service":"GFEEDBACK","params":[{"key":"has_unlimited_entitlement","value":"False"},{"key":"has_premium_lite_entitlement","value":"False"},{"key":"logged_in","value":"0"},{"key":"country-type","value":"B"},{"key":"e","value":"1714247,23804281,23882685,23918597,23934970,23946420,23966208,23983296,23986025,23998056,24001373,24002022,24002025,24004644,24007246,24034168,24036948,24077241,24080738,24120820,24135310,24140247,24161116,24162920,24164186,24166867,24169501,24181174,24186125,24187043,24187377,24197450,24211178,24217535,24219713,24241378,24248091,24248956,24255165,24255543,24255545,24262346,24263796,24267564,24268142,24270990,24279196,24283015,24283093,24283493,24288045,24288486,24288664,24290971,24291857,24292955,24296352,24298324,24390675,24391537,24392364,24392399,24392450,24396645,24397913,24398983,24402891,24404640,24406313,24406621,24406859,24406860,24414074,24415864,24415866,24416354,24416707,24419347,24421894,24424574,24425332,24425898,24428902,24432282,24433679,24434659,24434665,24436009,24441244,24441300,24590921,24591046,24612269,24613467,24613789,24614043,24615089,24615363,24615479,24615557,24615619,24615664,39322504,39322574"}]},{"service":"CSI","params":[{"key":"yt_ad","value":"1"},{"key":"c","value":"WEB"},{"key":"cver","value":"2.20221215.04.01"},{"key":"yt_li","value":"0"},{"key":"GetSearch_rid",
I have tried this:
data = open('err.txt', 'r', encoding='utf-8').read() # json string with hex values
print(type(data))
byte_s = bytes.fromhex(data)
data = byte_s.decode("utf-8")
print(data)
But it results in an error:
ValueError: non-hexadecimal number found in fromhex() arg at position 4440
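bytes.fromhex() expects a string made only of hex digit pairs (such as "7b22"), while this text mixes \xNN escapes with ordinary characters, which would explain the ValueError. One possible way around it, as a minimal sketch and not a confirmed fix: treat the text between the single quotes as the body of a JavaScript string literal and undo its escapes before parsing. The names `unescape_js_literal`, `parse_initial_data` and `sample` are hypothetical, and the sketch assumes the only escapes that matter are \xNN and backslash-escaped \, ', " and /; anything else is left untouched.

```python
import json
import re

# Matches \xNN hex escapes, plus \\ \' \" \/ so that an escaped backslash
# is consumed as a unit and never mistaken for the start of a \xNN escape.
_JS_ESCAPE = re.compile(r"""\\(?:x([0-9a-fA-F]{2})|([\\'"/]))""")


def unescape_js_literal(body):
    """Undo the escapes in the body of a JavaScript string literal (sketch)."""
    def replace(match):
        hex_digits, plain_char = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))  # e.g. \x7b -> {
        return plain_char                    # e.g. \\ -> \
    return _JS_ESCAPE.sub(replace, body)


def parse_initial_data(body):
    """Try the usual plain form first, then fall back to the escaped form."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return json.loads(unescape_js_literal(body))


# Hypothetical short sample in the same style as the chunk above.
sample = r"\x7b\x22key\x22:\x22logged_in\x22,\x22value\x22:\x220\x22\x7d"
print(parse_initial_data(sample))
```

The plain form is tried first because running the unescape step on already-valid JSON could strip backslashes that the JSON itself needs. A regex is used here rather than the "unicode_escape" codec because that codec can mangle non-ASCII text.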