I did some web scraping using the code below. When I try to convert the list to a string, the result includes escape sequences and character codes such as `\xa0`. When I print it, it displays properly, but I am trying to make a CSV from the text, so I want the escape sequences removed.
```python
import pandas as pd
from bs4 import BeautifulSoup
import requests

page_link = "https://www.undercurrentnews.com/2018/08/07/ecuador-tax-reform-to-spur-more-shrimp-investment/"
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "lxml")

# Collect the text of every <p> element on the page
textContent = []
for paragraph in page_content.find_all("p"):
    textContent.append(paragraph.text)

# Join all paragraphs into a single string
raw_text = ' '.join(textContent)
```
The resulting string looks like this:

```
'Don\'t have an account? Sign up Enter the email address associated with your account.\nWe\'ll send you instructions to reset your password
```
When I print it, it shows:

```
Don't have an account? Sign up Enter the email address associated with your account. We'll send you instructions to reset your password.
```
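A minimal sketch of why the two outputs differ, using a made-up sample string rather than the scraped page: the first output is the string's `repr()`, which displays a real newline as `\n` and a no-break space as `\xa0`, while `print()` renders the characters themselves. Each escape sequence stands for a single character in the string, not a backslash followed by letters.

```python
# Hypothetical sample text containing a real newline and a real no-break space
text = "associated with your account.\nSend me\xa0instructions"

print(repr(text))  # repr() displays the characters as the escapes \n and \xa0
print(text)        # print() renders the newline and the no-break space themselves

# The string holds the actual newline character, not a backslash plus "n"
print("\n" in text, "\\n" in text)

# Each escape sequence is exactly one character long
print(len("\n"), len("\xa0"))
```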
I want to remove the `\n`, `\xa0`, etc. I tried some suggested decoding:
```python
decoded_string = bytes(raw_text, "utf-8").decode("unicode_escape")
```
It still gives the same result. Since this is done on a very large amount of text, I am unable to use those solutions. The text is fine when it is in the list, but when I convert it to a string using `join`, these escape sequences get added. Is there a way to add the list itself to pandas?
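A minimal sketch, assuming one CSV row per paragraph is acceptable: pandas can take the list directly as a column, and `str.split()` with no argument splits on any run of Unicode whitespace (including `\n` and `\xa0`), so `" ".join(text.split())` collapses those characters into single spaces. The sample list, the `normalize_whitespace` helper, the `text` column name and the `paragraphs.csv` filename are hypothetical.

```python
import pandas as pd


def normalize_whitespace(text):
    # Hypothetical helper: split() with no argument treats "\n", "\t" and the
    # no-break space "\xa0" as whitespace, so rejoining with " " collapses them.
    return " ".join(text.split())


# Hypothetical stand-in for the textContent list built by the scraping loop
textContent = [
    "Don't have an account? Sign up",
    "Enter the email address associated with your account.\nWe'll send you instructions",
    "Shrimp\xa0investment",
]

# One row per paragraph: the cleaned list becomes the "text" column directly
df = pd.DataFrame({"text": [normalize_whitespace(p) for p in textContent]})

# Write the CSV; pandas quotes fields that contain commas or quotes as needed
df.to_csv("paragraphs.csv", index=False)
```

To keep everything in a single cell instead, the same cleanup can be applied to the joined string with `" ".join(raw_text.split())`.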