I am trying to take the output of a web scrape and write it to a single .txt file, but I get this error:

'charmap' codec can't encode character '\u200a' in position 23130: character maps to <undefined>
  File "C:\Users\Web scrapper.py", line 12, in <module>
    f.write(y)
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pyperclip
x = input("Link you want to scrap from:")
url = x
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
y = str(soup.get_text())
print(y)
with open('Dogs.txt', 'w') as f:
    f.write(y)
Real Swat

2 Answers

On Windows, open() without an encoding argument typically uses the system's default code page, which Python reports as the 'charmap' codec. You are trying to write a character (U+200A, a hair space) that this codec doesn't support, hence the error. To make sure this doesn't happen, open the file for writing with the same codec you used to decode the HTML content. Like this:

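from urllib.request import urlopen

# Store the input in the same name that urlopen() receives
url = input("Link you want to scrap from:")
page = urlopen(url)
html = page.read().decode("utf-8")
print(html)
# Write with the same encoding that was used to decode the response
with open('Dogs.txt', 'w', encoding="utf-8") as f:
    f.write(html)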

Also, as @Code-Apprentice wrote, there's no need to use BeautifulSoup here.
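
The codec mismatch described in this answer can be reproduced without any network access. The sketch below is only an illustration: it assumes cp1252 as the Windows default code page (yours may differ), and the sample string is made up. It encodes a string containing U+200A once with cp1252 and once with UTF-8.

```python
# U+200A (HAIR SPACE) is the character named in the traceback.
text = "dogs\u200aand\u200acats"  # made-up sample text

# Assumption: cp1252 stands in for the default Windows code page here.
failure_reason = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    # cp1252 has no slot for U+200A, so encoding fails.
    failure_reason = exc.reason
    print("cp1252 failed:", failure_reason)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
print("utf-8 wrote", len(utf8_bytes), "bytes")
```

Passing encoding="utf-8" to open() makes the file use the second path instead of the first.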

TheEagle

In this particular case, you can just do f.write(html). There is no need for BeautifulSoup or the variable y, since you are just saving the entire webpage.
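
As a rough sketch of that suggestion, the helper below decodes the downloaded bytes and writes them straight to a file with the same encoding. The name save_page, the sample bytes, and the temporary output path are all hypothetical; in the real script the bytes would come from page.read().

```python
import tempfile
from pathlib import Path


def save_page(raw: bytes, path: Path, encoding: str = "utf-8") -> str:
    """Decode downloaded bytes and write the text out with the same codec."""
    html = raw.decode(encoding)
    with open(path, "w", encoding=encoding) as f:
        f.write(html)
    return html


# Stand-in for page.read(); contains U+200A, the character from the traceback.
raw = "<p>Dogs\u200aare great</p>".encode("utf-8")

# Hypothetical output location; the question writes to 'Dogs.txt' instead.
out_path = Path(tempfile.mkdtemp()) / "Dogs.txt"
html = save_page(raw, out_path)
print(out_path.read_text(encoding="utf-8"))
```

Because decoding and writing share one encoding parameter, the two cannot drift apart.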

Code-Apprentice