
I did some web scraping using the code below. When I try to convert the list to a string, it includes escape sequences and ASCII codes. When I print it, it displays properly, but I am trying to build a CSV from the text, so I want the escape sequences removed.

import pandas as pd
from bs4 import BeautifulSoup
import requests

page_link = "https://www.undercurrentnews.com/2018/08/07/ecuador-tax-reform-to-spur-more-shrimp-investment/"
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "lxml")

# Collect the text of every <p> tag; find_all is called once
# instead of once per loop iteration.
textContent = []
for paragraph in page_content.find_all("p"):
    textContent.append(paragraph.text)
raw_text = ' '.join(textContent)

The joined string looks like this:

'Don\'t have an account? Sign up Enter the email address associated with your account.\nWe\'ll send you instructions to reset your password.'

When I print it, it shows:

Don't have an account? Sign up Enter the email address associated with your account. We'll send you instructions to reset your password.
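For reference, this is how I am viewing the string in both forms (slicing to the first 120 characters is just to keep the output short):

print(repr(raw_text[:120]))  # escaped form: shows \n and \xa0 as escape sequences
print(raw_text[:120])        # printed form: the same characters as rendered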

I want to remove the \n, \xa0, etc. I tried a suggested decoding approach:

decoded_string = bytes(raw_text, "utf-8").decode("unicode_escape")

It still gives the same result. Since this is done on a huge amount of text, I am unable to use the solutions I found. The text is fine while it is in the list, but when I convert it to a string using join, these escape sequences appear. Is there a way to add the list itself to pandas?
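To make the question concrete, this is roughly what I am hoping for; a minimal sketch (the "text" column name and the output filename are placeholders I made up):

# Strip non-breaking spaces and collapse all runs of whitespace
# (including newlines) into single spaces, one paragraph at a time.
cleaned = [" ".join(p.replace("\xa0", " ").split()) for p in textContent]

# Put the list itself into pandas, one paragraph per row,
# and write it out without the index column.
df = pd.DataFrame({"text": cleaned})
df.to_csv("paragraphs.csv", index=False)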

  • Could you provide the content of the list you're trying to format? – Vasilis G. Aug 13 '18 at 12:00
  • Possible duplicate of [How to remove all the escape sequences from a list of strings?](https://stackoverflow.com/questions/8115261/how-to-remove-all-the-escape-sequences-from-a-list-of-strings) Closely related. – Kaushik NP Aug 13 '18 at 12:16
  • I tried the solution and it does not work, as my text is much larger and its escape sequences are not covered by the proposed list. I am also looking for a solution that avoids the translation from list to string using join. – Udhai kumar Aug 13 '18 at 12:43

0 Answers