How to search for matched string then extract the string after it and a colon

Question

I'm new to Python and web scraping, so I apologize if the question is too basic!

I want to extract the "score" and "rate" (rating) from the following example BeautifulSoup object:

import bs4
import re
text = '<html><body>{"count":1,"results":[{"score":"2-1","MatchId":{"number":"889349"},"name":"Match","rating":{"rate":9.0}}],"performance":{"comment":{}}}</body></html>'
page = bs4.BeautifulSoup(text, "lxml")
print(type(page))

I have tried these, but both returned an empty list ([]):

tmp = page.find_all(text=re.compile("score:(.*)"));
print(tmp)

tmp = page.findAll("score");
print(tmp)

I found this similar question, but its approach gave me an error:

tmp = page.findAll(text = lambda(x): x.lower.index('score') != -1)
print(tmp)

AttributeError: 'builtin_function_or_method' object has no attribute 'index'

What did I do wrong? Thanks in advance!

Try using `x.lower()` instead of `x.lower`. – Taranjeet, Apr 29 '17 at 17:57
Thanks for your time guys! – IloveCatRPython, Apr 29 '17 at 18:53
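
To illustrate the comment above, here is a minimal sketch on a plain string rather than a real BeautifulSoup text node (the names `body_text` and `has_score` are made up for this example). One more thing to watch for: `str.index` raises `ValueError` when the substring is missing and never returns -1, so `str.find` or the `in` operator is the safer test.

```python
# Hypothetical stand-in for the text that BeautifulSoup would pass to the filter.
body_text = '{"count":1,"results":[{"score":"2-1","rating":{"rate":9.0}}]}'

# x.lower is the method object itself; x.lower() calls it and returns a str.
# str.find returns -1 when the substring is absent (str.index would raise).
has_score = lambda x: x.lower().find("score") != -1

print(has_score(body_text))           # True
print(has_score("no match here"))     # False
print("score" in body_text.lower())   # True, same check with the in operator
```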

score 2 · Accepted Answer · answered Apr 29 '17 at 18:15

This is two thirds of the way to a turducken of protocols. You can use BeautifulSoup to find the body text and decode that with json. Then you have ordinary Python dicts and lists to walk through.

>>> import json
>>> import bs4
>>> import re
>>> text = '<html><body>{"count":1,"results":[{"score":"2-1","MatchId":{"number":"889349"},"name":"Match","rating":{"rate":9.0}}],"performance":{"comment":{}}}</body></html>'
>>> page = bs4.BeautifulSoup(text, "lxml")
>>> 
>>> data = json.loads(page.find('body').text)
>>> for result in data["results"]:
...     print(result["score"], result["rating"]["rate"])
... 
2-1 9.0
>>>
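
If you would still like the regex route from the question, here is a rough sketch using only the standard library. It assumes the JSON text has already been pulled out of the body (`body_text`, `score_match` and `rate_match` are made-up names). The pattern `score:(.*)` found nothing because the key is quoted in the actual text, as in `"score":"2-1"`, so the quotes have to be part of the pattern. Decoding with json as above is likely the more robust choice for nested data.

```python
import re

# Assumed: the JSON text from the question, already extracted from <body>.
body_text = '{"count":1,"results":[{"score":"2-1","MatchId":{"number":"889349"},"name":"Match","rating":{"rate":9.0}}],"performance":{"comment":{}}}'

# Match the quoted key, the colon, then capture the quoted string value.
score_match = re.search(r'"score"\s*:\s*"([^"]*)"', body_text)
# "rate" holds a bare number, so capture digits and dots instead.
rate_match = re.search(r'"rate"\s*:\s*([0-9.]+)', body_text)

score = score_match.group(1) if score_match else None
rate = float(rate_match.group(1)) if rate_match else None
print(score, rate)  # 2-1 9.0
```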

Works like charm! Learnt something new today. Thank you @tdelaney! — IloveCatRPython, Apr 29 '17 at 18:53
