I got an output like:

Review: This hotel has been here awhile. However, they’ve kept it up nicely. The staff is very professional and friendly. The rooms have everything you need. Only con, the ice machines are on the 2nd and 8th floors only. Nice place, great location.
      0
0  None
Review: Thank you for taking a moment to share your experience. I am pleased to hear you found the hotel and staff to your liking. We look forward to welcoming you back to our hotel in the not too distant future.Sincerely,Andrea McLeodAssistant Front Office ManagerHilton New York Grand CentralAndrea.McLeod@Hilton.com 212-986-8800
      0
0  None
Review: I have wasted lot of time looking for a laundry service as there is none in the hotel, the ones they have leaves at saturdays 9:30 am, and then you don't have any other option (neither assistance from the desk). The shower doors open alone, so everything gets wet and there is no place to put your soap or shampoo... This is like a 2 stars hotel :S I hope my company can book me another one for my following stays.
      0
0  None
Review: Dear Manuel AThank you for having chosen our hotel for your trip to New York. We apologize that the door did not close properly. If you do return to the hotel please let us know if any issues that you may have and we will be more than happy to fix them. Thank you again for choosing the Hilton Grand Central. 
      0

I want to delete the second and fourth paragraphs, the ones starting with "Review: Thank you for taking a moment to share your experience. I am pleased t...." and "Review: Dear Manuel AThank you for having chosen our hotel for your trip to New York. ..".

How can I remove these two paragraphs from my output using Python 3?

This is the updated version of the code, which works fine. But how do I save the output as a DataFrame with pandas and write it to a CSV file?

for dtags in html.find_all('div', attrs={'class':'wrap'}):
    for index, ptags in enumerate(dtags.find_all('p', attrs={'class':'partial_entry'})):
        if index == 0:  # match the first element
            x = ptags.text
            print('Review:', x)  # print() returns None, so its result is not stored
Shaon Paul
  • Based on what do you want to remove the paragraphs? Remove every even paragraph? Always remove the second and fourth only? Remove based on the content? – interjay Feb 12 '18 at 09:28

3 Answers

So it seems you're parsing this page from TripAdvisor.

Rather than post-processing the printed output, it is better to select the right <p> elements from the page in the first place. Each review, together with any reply from the owner, sits inside a div with the class wrap. We can therefore find all of those divs and take only the first partial_entry match inside each one, instead of selecting every partial_entry element and then trying to work out whether it is a review or a reply.

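# html is assumed to be the already parsed page
for dtags in html.find_all('div', attrs={'class':'wrap'}):
    for index, ptags in enumerate(dtags.find_all('p', attrs={'class':'partial_entry'})):
        if index == 0:  # the first match is the review; later matches are replies
            x = ptags.text
            print('Review:', x)  # print() returns None, so its result is not stored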
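For the follow-up about saving the result with pandas, here is a minimal sketch. It assumes the loop above appends each x to a list instead of printing it. The sample strings, the column name Review, and the file name reviews.csv are placeholders, not taken from the original code. Note that print() returns None, which is presumably where the "0  None" frames in the question's output come from, so the text itself has to be stored.

```python
import pandas as pd

# Hypothetical sample data standing in for the texts collected by the loop,
# e.g. via reviews.append(ptags.text) in place of the print call.
reviews = [
    "This hotel has been here awhile. Nice place, great location.",
    "I have wasted lot of time looking for a laundry service.",
]

# One row per review in a single column; the column name is an assumption.
df = pd.DataFrame({"Review": reviews})

# index=False keeps the 0, 1, 2, ... row labels out of the file.
df.to_csv("reviews.csv", index=False)
```

Reading the file back with pd.read_csv("reviews.csv") should give the same single-column frame.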
Sean Breckenridge

Just introduce a gap: on the second and fourth iterations the loop skips the print statement. For example:

count = 0  # Python needs no type declaration here
for dtags in html.find_all('div', attrs={'class':'prw_rup prw_reviews_text_summary_hsx'}):
    if count % 2 != 0:  # skip the 2nd, 4th, ... matching div
        count += 1
        continue
    count += 1
    for ptags in dtags.find_all('p', attrs={'class':'partial_entry'}):
        x = ptags.text
        print('Review:', x)
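The same skip-every-other-item idea can be tried on a plain list without any HTML. This is only an illustrative sketch: the strings below are placeholders, and it assumes that reviews and replies strictly alternate, which the real page may not guarantee.

```python
# Placeholder strings standing in for alternating review/reply blocks.
blocks = ["review A", "reply to A", "review B", "reply to B"]

# Keep the items at even positions (0, 2, ...), i.e. every other block
# starting with the first one.
reviews = [text for index, text in enumerate(blocks) if index % 2 == 0]

for text in reviews:
    print('Review:', text)
```

If a review has no reply, the alternation breaks, so selecting by page structure is the more robust choice.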
anurag0510
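from itertools import islice

def get_paragraph(para):
    # para is assumed to hold the printed output from the question as one string
    for x in para.split("\n"):
        # drop the "      0" and "0  None" lines
        if not x.startswith(" ") and (not x.endswith("None")):
            yield x

# take every second remaining line, starting with the first
data = islice(get_paragraph(para), 0, None, 2)
print(list(data))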
>>> ['Review: This hotel has been here awhile. However, they’ve kept it up nicely. The staff is very professional and friendly. The rooms have everything you need. Only con, the ice machines are on the 2nd and 8th floors only. Nice place, great location.', "Review: I have wasted lot of time looking for a laundry service as there is none in the hotel, the ones they have leaves at saturdays 9:30 am, and then you don't have any other option (neither assistance from the desk). The shower doors open alone, so everything gets wet and there is no place to put your soap or shampoo... This is like a 2 stars hotel :S I hope my company can book me another one for my following stays."]
Veera Balla Deva