I'm trying to write scraped information to a .csv file correctly, but my code scrapes the same information more than five times. I should get 31 reviews, but the file contains 301. I tried to follow the answer to the question "Data to .csv is repeating three times. I need three different scrapes exported to a csv file", but I didn't understand it. I also tried to change my code based on the answer to the question "Python repeating CSV file", but that doesn't work. Renaming the variables didn't help either. Could you tell me what is wrong and what I have to change to get the information correctly? I'm a real novice at coding, so I would appreciate it if you could explain your modifications line by line!
with requests.Session() as s:
    for offset in range(10,40):
        url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-Reviews-or{offset}-Le_Bouclard-Paris_Ile_de_France.html'
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        reviews = soup.select('.reviewSelector')
        ids = [review.get('data-reviewid') for review in reviews]
        r = s.post(
            'https://www.tripadvisor.fr/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=',
            data={'reviews': ','.join(ids), 'contextChoice': 'DETAIL'},
            headers={'referer': r.url}
        )
        soup = bs(r.content, 'lxml')
        if not offset:
            inf_rest_name = soup.select_one('.heading').text.replace("\n","").strip()
            rest_eclf = soup.select_one('.header_links a').text.strip()
        for review in reviews:
            name_client = review.select_one('.info_text > div:first-child').text.strip()
            date_rev_cl = review.select_one('.ratingDate')['title'].strip()
            titre_rev_cl = review.select_one('.noQuotes').text.strip()
            opinion_cl = review.select_one('.partial_entry').text.replace("\n","").strip()
            row = [f"{inf_rest_name}", f"{rest_eclf}", f"{name_client}", f"{date_rev_cl}" , f"{titre_rev_cl}", f"{opinion_cl}"]
            w.writerow(row)
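
One thing that may be worth checking is how many requests the loop makes. The sketch below only does the arithmetic and does not contact the site. It assumes each page holds about 10 reviews and that pages advance in steps of 10; `PAGE_SIZE` and both list names are made up for illustration and are not part of the code above.

```python
# Assumption: each results page holds about 10 reviews.
PAGE_SIZE = 10

# Offsets visited by the loop as written: every integer from 10 to 39.
offsets_as_written = list(range(10, 40))

# Offsets visited if the loop starts at the first page and steps one page at a time.
offsets_stepped = list(range(0, 40, PAGE_SIZE))

print(len(offsets_as_written))  # 30 requests
print(offsets_stepped)          # [0, 10, 20, 30], i.e. 4 requests
```

Under that assumption, 30 requests of roughly 10 reviews each come close to the 301 rows in the file, whereas four pages could hold the expected 31 reviews. Also note that `if not offset:` is only true when `offset` is 0, so with a range starting at 10 that branch never runs; starting the range at 0 would let it run on the first page.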