I am trying to open and convert my HTML file into a CSV so I can use it as a dataframe.

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'file:///C:/Users/jessi/OneDrive/Documents/posts.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

print(soup)

I got this error: InvalidSchema: No connection adapters were found for 'file://C://Users//jessi//OneDrive//Documents//posts.html'

mmm18

1 Answer

There's no server to request here. You have a simple file. Just read it.

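from bs4 import BeautifulSoup
import pandas as pd

# A local file needs no HTTP request; open it directly.
filename = 'C:/Users/jessi/OneDrive/Documents/posts.html'
with open(filename) as f:  # the with-block closes the file afterwards
    soup = BeautifulSoup(f.read(), 'html.parser')
print(soup)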
Tim Roberts
  • okay, I understand, but now I have this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 17024: character maps to <undefined> – mmm18 Apr 27 '23 at 00:36
  • That means your file is not a UTF-8 file. You need to know which character set it uses in order to read the file. We can't tell you that without seeing the file. You can specify the character set in the `open` call. – Tim Roberts Apr 27 '23 at 00:52
  • Try `open(filename, "rb")` to see if the html parser can read the encoding. – tdelaney Apr 27 '23 at 00:54
  • @TimRoberts can I show you on GitHub what I am trying to open? Because I'm doing a project and I am a beginner – mmm18 Apr 27 '23 at 00:59
  • It would have been quicker just to post the link rather than ask permission. It can't be a typical Windows CP1252 file, because 0x8D is not defined there. – Tim Roberts Apr 27 '23 at 01:36
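
As a follow-up to the comments above: the 'charmap' codec named in the traceback suggests Python decoded the file with the Windows default code page rather than UTF-8, so UTF-8 is worth trying first. Below is a minimal sketch, not a definitive fix, that reads the raw bytes and tries a few candidate encodings in turn. The helper name `read_html_text` and the list of candidates are assumptions for illustration; the actual encoding of posts.html is not known from this thread.

```python
from pathlib import Path


def read_html_text(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate encodings are guesses, not facts
    about any particular file.
    """
    # Read bytes so that no implicit (locale-dependent) decoding happens.
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit; try the next one
    # Last resort: latin-1 maps every byte to some character, so it never
    # raises, but the resulting text may contain wrong characters.
    return raw.decode("latin-1"), "latin-1"
```

The returned text can then be passed to `BeautifulSoup(text, 'html.parser')` as in the answer. Alternatively, as tdelaney suggests, handing BeautifulSoup the bytes from `open(filename, "rb")` lets the parser attempt its own encoding detection.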