
Can anyone please help? Please point out where I am going wrong: the extracted reviews are written into 3 separate columns in hotelreview.csv. How can I fix this so that they are written into 1 column, and how do I add the heading "review" for it, based on the code below? I also want to add the newly extracted data (the "review" column) to the existing CSV 'hotel_FortWorth.csv'. So far I have only written the extracted information to a new CSV, and I don't know how to combine the 2 files (or whether there is a better way); the URL can be repeated to match the reviews. Thank you!

File 'hotel_FortWorth.csv' has 3 columns (an unnamed index, Name and link), for example:

           Name                         link
1    Omni Fort Worth Hotel     https://www.tripadvisor.com.au/Hotel_Review-g55857-d777199-Reviews-Omni_Fort_Worth_Hotel-Fort_Worth_Texas.html
2    Hilton Garden Hotel       https://www.tripadvisor.com.au/Hotel_Review-g55857-d2533205-Reviews-Hilton_Garden_Inn_Fort_Worth_Medical_Center-Fort_Worth_Texas.html
3......
...

I used the URLs from the existing CSV to extract the reviews; the code is shown below:

import requests
from unidecode import unidecode
from bs4 import BeautifulSoup
import pandas as pd    

file = []
data = pd.read_csv('hotel_FortWorth.csv', header = None)  # no header row, so columns are numbered 0, 1, 2
df = data[2]  # column 2 holds the links

for url in df[1:]:
    print(url)
    thepage = requests.get(url).text
    soup = BeautifulSoup(thepage, "html.parser")
    resultsoup = soup.find_all("p", {"class": "partial_entry"})
    file.extend(resultsoup)

with open('hotelreview.csv', 'w', newline='') as fid:
    for review in file:
        review_list = review.get_text()
        fid.write(unidecode(review_list + '\n'))

Expected result:

    name          link         review
1   ...           ...         ...
2
....
1 Answer


You can use pandas to create the new CSV.

Ex:

import requests
from unidecode import unidecode
from bs4 import BeautifulSoup
import pandas as pd

data = pd.read_csv('hotel_FortWorth.csv')
review = []
for url in data["link"]:
    print(url)
    thepage = requests.get(url).text
    soup = BeautifulSoup(thepage, "html.parser")
    resultsoup = soup.find_all("p", {"class": "partial_entry"})
    review.append(unidecode(resultsoup))
data["review"] = review
data.to_csv('hotelreview.csv')
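
Note that by default `to_csv` also writes the DataFrame index as an extra, unnamed first column; passing `index=False` (i.e. `data.to_csv('hotelreview.csv', index=False)`) keeps only the named columns.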
  • Hi. Thank you for your time. I got an error running this: review.append(unidecode(resultsoup)) File "E:\Python\venv\lib\site-packages\unidecode\__init__.py", line 48, in unidecode_expect_ascii bytestring = string.encode('ASCII') AttributeError: 'NoneType' object has no attribute 'encode' – Julie Jul 06 '18 at 12:32
  • Looks like `resultsoup` is empty. You might need to tune your find_all params, or you can use an `if` condition to ignore None values – Rakesh Jul 06 '18 at 12:35
  • I got this error: pandas.errors.EmptyDataError: No columns to parse from file – Julie Jul 06 '18 at 13:21
  • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.errors.EmptyDataError.html – Rakesh Jul 06 '18 at 14:08
  • Ah yes, the header is included. Thanks – Julie Jul 09 '18 at 04:34
  • But I still get an error: ValueError: Length of values does not match length of index. I understand that one link provides plenty of reviews, which is why. But how can I still add a new review column, with the URL repeated to match the reviews? Thank you. – Julie Jul 09 '18 at 04:59
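
Regarding the ValueError in the last comment: the list of reviews is longer than the DataFrame, so it cannot be assigned as a single column. A minimal sketch of the "repeat the URL for each review" idea instead builds one output row per review, with the hotel's name and link copied onto each row. It assumes 'hotel_FortWorth.csv' has a header row with 'Name' and 'link' columns, and that the reviews still sit in <p class="partial_entry"> tags:

import requests
from unidecode import unidecode
from bs4 import BeautifulSoup
import pandas as pd

# Assumption: the CSV has header columns named 'Name' and 'link'
hotels = pd.read_csv('hotel_FortWorth.csv')

rows = []
for _, hotel in hotels.iterrows():
    page = requests.get(hotel['link']).text
    soup = BeautifulSoup(page, 'html.parser')
    for tag in soup.find_all('p', {'class': 'partial_entry'}):
        text = tag.get_text(strip=True)
        if text:  # skip empty snippets instead of passing None/empty values on
            # one row per review: the hotel name and link are repeated for each review
            rows.append({'Name': hotel['Name'],
                         'link': hotel['link'],
                         'review': unidecode(text)})

reviews = pd.DataFrame(rows, columns=['Name', 'link', 'review'])
reviews.to_csv('hotelreview.csv', index=False)

Because every review row carries its own copy of Name and link, the column lengths always match, and the hotel data and the reviews end up in a single CSV without a separate merge step.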