Remove some output content using Python 3

Question

I got an output like:

Review: This hotel has been here awhile. However, they’ve kept it up nicely. The staff is very professional and friendly. The rooms have everything you need. Only con, the ice machines are on the 2nd and 8th floors only. Nice place, great location.
      0
0  None
Review: Thank you for taking a moment to share your experience. I am pleased to hear you found the hotel and staff to your liking. We look forward to welcoming you back to our hotel in the not too distant future.Sincerely,Andrea McLeodAssistant Front Office ManagerHilton New York Grand CentralAndrea.McLeod@Hilton.com 212-986-8800
      0
0  None
Review: I have wasted lot of time looking for a laundry service as there is none in the hotel, the ones they have leaves at saturdays 9:30 am, and then you don't have any other option (neither assistance from the desk). The shower doors open alone, so everything gets wet and there is no place to put your soap or shampoo... This is like a 2 stars hotel :S I hope my company can book me another one for my following stays.
      0
0  None
Review: Dear Manuel AThank you for having chosen our hotel for your trip to New York. We apologize that the door did not close properly. If you do return to the hotel please let us know if any issues that you may have and we will be more than happy to fix them. Thank you again for choosing the Hilton Grand Central. 
      0

I want to delete the second and fourth paragraphs, the ones starting with "Review: Thank you for taking a moment to share your experience. I am pleased t...." and "Review: Dear Manuel AThank you for having chosen our hotel for your trip to New York. ..".

How can I remove these two paragraphs from my output using Python 3?

This is the updated version of the code, which works fine. But how do I put the output in a DataFrame and save it as a CSV file using pandas?

for dtags in html.find_all('div', attrs={'class':'wrap'}):
    for index, ptags in enumerate(dtags.find_all('p', attrs={'class':'partial_entry'})):
        if index == 0:  # match the first element
            x = ptags.text
            z = print('Review:', x)

Based on what do you want to remove the paragraphs? Remove every even paragraph? Always remove the second and fourth only? Remove based on the content? — interjay, Feb 12 '18 at 09:28

Sean Breckenridge · Accepted Answer · 2018-02-12T10:39:21.323

So it seems you're parsing this page from TripAdvisor.

Instead of filtering the printed output, it's better to select the <p> elements from the page more precisely. Each review, and any reply from the owner, sits inside a div with the class wrap. So we can find all of those divs and take only the first partial_entry match in each one, instead of selecting every paragraph and then trying to work out whether we're looking at a review or a reply.

for dtags in html.find_all('div', attrs={'class':'wrap'}):
    for index, ptags in enumerate(dtags.find_all('p', attrs={'class':'partial_entry'})):
        if index == 0:  # the first partial_entry in each wrap div is the review itself
            x = ptags.text
            print('Review:', x)  # print() returns None, so its result is not assigned

edited Feb 12 '18 at 10:39

answered Feb 12 '18 at 09:35

Sean Breckenridge


How do I save this as a CSV using a dataframe? – Shaon Paul Feb 12 '18 at 10:03
Sorry, not quite sure I understand. Is this using pandas? – Sean Breckenridge Feb 12 '18 at 10:09
I'm not familiar with pandas myself, but as long as your dataframe is all set up, it doesn't seem to be too hard: https://stackoverflow.com/questions/16923281/pandas-writing-dataframe-to-csv-file – Sean Breckenridge Feb 12 '18 at 10:16
Is there any other way to create a dataframe and save it as a csv file? – Shaon Paul Feb 12 '18 at 10:19
how to create the dataframe? – Shaon Paul Feb 12 '18 at 10:24
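On the question of getting the reviews into a CSV file: a minimal sketch, assuming the review texts have already been collected into a plain list of strings. It uses the standard-library csv module rather than pandas, and the sample strings, the function name write_reviews_csv and the "review" column name are all illustrative, not taken from the original code.

```python
import csv
import io


def write_reviews_csv(reviews, out_file):
    """Write one review per row under a single 'review' header."""
    writer = csv.writer(out_file)
    writer.writerow(["review"])
    for text in reviews:
        writer.writerow([text])


# Hypothetical sample data standing in for the scraped review texts.
reviews = [
    "Nice place, great location.",
    "This is like a 2 stars hotel :S",
]

# An in-memory buffer keeps the example self-contained; for a real file,
# use open("output_trip.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_reviews_csv(reviews, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is installed, pd.DataFrame({'review': reviews}).to_csv('output_trip.csv', index=False) should produce an equivalent file in one line.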

score 0 · Answer 2 · answered Feb 12 '18 at 08:55

Just skip every other iteration, so that the print statement does not run on the second and fourth passes. E.g.:

count = 0
for dtags in html.find_all('div', attrs={'class':'prw_rup prw_reviews_text_summary_hsx'}):
    if count % 2 != 0:  # odd count: every second match, assumed to be a reply
        count += 1
        continue
    count += 1
    for ptags in dtags.find_all('p', attrs={'class':'partial_entry'}):
        x = ptags.text
        print('Review:', x)
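The same skip-every-other idea can be written without a manual counter by slicing with a step of 2. This is only a sketch on made-up placeholder strings; every_other is a hypothetical helper name, and it assumes reviews and replies strictly alternate.

```python
def every_other(items, start=0):
    """Return every second element of items, beginning at index start."""
    return items[start::2]


# Placeholder strings standing in for the scraped paragraphs.
paragraphs = ["review 1", "reply 1", "review 2", "reply 2"]

reviews = every_other(paragraphs)       # elements at indexes 0 and 2
replies = every_other(paragraphs, 1)    # elements at indexes 1 and 3
print(reviews)
print(replies)
```

If a review has no reply, the alternation breaks and replies start landing in the wrong list, so selecting by page structure is the more robust option.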

How do I save this as a CSV using a dataframe? – Shaon Paul Feb 12 '18 at 10:06

score 0 · Answer 3 · answered Feb 12 '18 at 09:28

from itertools import islice

def get_paragraph(para):
    # Keep only the text lines: skip the indented "      0" header lines
    # and the "0  None" rows.
    for x in para.split("\n"):
        if not x.startswith(" ") and (not x.endswith("None")):
            yield x

# para is the printed output from the question, held as a single string
data = islice(get_paragraph(para), 0, None, 2)  # every other paragraph, starting with the first
print(list(data))
>>>['Review: This hotel has been here awhile. However, they’ve kept it up nicely. The staff is very professional and friendly. The rooms have everything you need. Only con, the ice machines are on the 2nd and 8th floors only. Nice place, great location.', "Review: I have wasted lot of time looking for a laundry service as there is none in the hotel, the ones they have leaves at saturdays 9:30 am, and then you don't have any other option (neither assistance from the desk). The shower doors open alone, so everything gets wet and there is no place to put your soap or shampoo... This is like a 2 stars hotel :S I hope my company can book me another one for my following stays."]
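To try this approach without the full output, here is a self-contained sketch. The function is repeated so the block runs on its own, and para is a shortened, made-up sample shaped like the output in the question, not the real data.

```python
from itertools import islice


def get_paragraph(para):
    """Yield text lines, skipping the '      0' and '0  None' rows."""
    for x in para.split("\n"):
        if not x.startswith(" ") and not x.endswith("None"):
            yield x


# Shortened, hypothetical sample shaped like the output in the question.
para = "\n".join([
    "Review: Nice place, great location.",
    "      0",
    "0  None",
    "Review: Thank you for taking a moment to share your experience.",
    "      0",
    "0  None",
    "Review: This is like a 2 stars hotel :S",
    "      0",
    "0  None",
    "Review: Dear Manuel A, thank you for having chosen our hotel.",
    "      0",
])

# Step 2 keeps the 1st, 3rd, ... paragraphs and drops the replies in between.
reviews = list(islice(get_paragraph(para), 0, None, 2))
print(reviews)
```

Like any position-based filter, this relies on every review being followed by exactly one reply.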

answered Feb 12 '18 at 09:28

Veera Balla Deva


How do I save this as a CSV using a dataframe? – Shaon Paul Feb 12 '18 at 10:06
https://stackoverflow.com/questions/16923281/pandas-writing-dataframe-to-csv-file – Veera Balla Deva Feb 12 '18 at 10:07
how to create the dataframe? – Shaon Paul Feb 12 '18 at 10:24
Please refer to this: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html – Veera Balla Deva Feb 12 '18 at 10:26
results = []; results.append(z); df = pd.DataFrame(results); print(df); df.to_csv('output_trip.csv') This code is not working correctly – Shaon Paul Feb 12 '18 at 10:30
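The snippet in the last comment most likely fails because z comes from z = print(...). print() writes to the screen and returns None, so results ends up holding None instead of the review text, which would also explain the "0  None" rows in the question's output. A minimal sketch of the fix, using made-up sample strings in place of the scraped ptags.text values:

```python
# Hypothetical stand-ins for the ptags.text values scraped from the page.
sample_texts = [
    "Nice place, great location.",
    "This is like a 2 stars hotel :S",
]

# print() returns None, so this assignment stores None rather than the text.
z = print("Review:", sample_texts[0])

results = []
for text in sample_texts:
    print("Review:", text)   # display only
    results.append(text)     # keep the string itself for later use

print(z)
print(results)
```

With pandas installed, passing results to pd.DataFrame(results, columns=['review']) and then calling df.to_csv('output_trip.csv', index=False) should write the actual text. The column name here is an illustrative choice.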
