There are two ways to go about this. One is Selenium, which lets you control a browser programmatically (most common browsers, such as Firefox and Chrome, are supported). I am not familiar with it, and it may be overkill in many situations (I imagine running a browser incurs some overhead), but it's good to know about.
Another way is to do some more inspection to see what's going on when you click the "Read More" button. The "Network" tab in the developer tools (I am using Chrome, but I think Firefox also has the same thing) can help with that by showing you all the HTTP requests the browser is sending.
I find that when you click the "Read More" button, a POST request is sent to https://www.mouthshut.com/review/CorporateResponse.ashx with the following data:
type: review
reviewid: 2836986
corp: false
isvideo: false
fbmessage: I found this review of ICICI Lombard Auto Insurance pretty useful
catid: 925641018
prodimg: .jpg
twittermsg: I found this review of ICICI Lombard Auto Insurance pretty useful %23WriteShareWin
twitterlnk: https://www.mouthshut.com/review/ICICI-Lombard-Auto-Insurance-review-rmlrrturotn
catname: ICICI Lombard Auto Insurance
rating_str: 1/5
usession: 0
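If you copy that form data out of the Network tab as text, a small helper can turn it into a dict ready to pass as the data argument of a POST. This is only a sketch: `RAW_FORM_DATA` and `parse_form_data` are names I made up for illustration, and the sample is abridged to a few of the fields above.

```python
# Form data as copied from the Network tab (abridged to a few of the fields above).
RAW_FORM_DATA = """\
type: review
reviewid: 2836986
corp: false
catid: 925641018
catname: ICICI Lombard Auto Insurance
twitterlnk: https://www.mouthshut.com/review/ICICI-Lombard-Auto-Insurance-review-rmlrrturotn
"""


def parse_form_data(raw):
    """Turn 'key: value' lines into a dict (hypothetical helper, not part of any library)."""
    payload = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, so values that contain ':' (such as URLs) survive.
        key, _, value = line.partition(":")
        payload[key.strip()] = value.strip()
    return payload


payload = parse_form_data(RAW_FORM_DATA)
print(payload["reviewid"])
print(payload["twitterlnk"])
```

Splitting on the first colon only is the one detail that matters here; a plain `split(":")` would cut the URL in `twitterlnk` into pieces.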
However, when I simply sent a POST request with that data, it didn't work. That usually means something in the HTTP headers matters, and it is usually the cookie; I have confirmed that this is indeed the case here. The solution is easy with the requests package (which you should totally use anyway): use requests.Session, which keeps the cookies it receives and sends them along with later requests.
Here is a proof of concept:
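import requests

with requests.Session() as s:
    # Visit the product page first so the session picks up the site's cookies.
    s.get('https://www.mouthshut.com/product-reviews/ICICI-Lombard-Auto-Insurance-reviews-925641018')
    # Replay the "Read More" request; the session sends the cookies along.
    response = s.post(
        'https://www.mouthshut.com/review/CorporateResponse.ashx',
        data={'type': 'review', 'reviewid': '2836986', 'catid': '925641018', 'corp': 'false', 'catname': ''},
    )
    print(response.text)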
The result is some HTML containing what you are looking for. Enjoy souping!
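If you only need the plain text and would rather not install anything extra, the standard library's html.parser is enough for a rough pass. The sketch below runs on a made-up fragment (`SAMPLE_HTML` is a hypothetical stand-in; the real response markup will differ), and `TextExtractor` and `html_to_text` are illustrative names. With BeautifulSoup, `BeautifulSoup(html, 'html.parser').get_text()` does a similar job.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML returned by the POST request;
# the real markup will look different.
SAMPLE_HTML = (
    '<div class="review"><p>Worst claim experience.</p>'
    '<p>Avoid <b>this</b> insurer.</p></div>'
)


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text content of an HTML fragment as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(html_to_text(SAMPLE_HTML))
```

In practice you would pass `response.text` from the proof of concept instead of `SAMPLE_HTML`.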