0

I'm trying to store some HTML in postgres using urllib and SQLAlchemy but it seems like something is getting garbled in the process when the html is being inserted\retrieved

using: SQLAlchemy 1.2, Python 3.6, postgres 10, urllib

class ParksTxState(Base):
    __tablename__ = 'parks_tx_state'

    id = Column(Integer, primary_key=True)
    park_name = Column(Text)
    url = Column(Text)
    html = Column(Text)


engine = create_engine("postgresql://<user>:<pass>@localhost/<db>", echo=False)

Session = sessionmaker(bind=engine)
session = Session()

url = 'https://tpwd.texas.gov/state-parks/abilene'
html = request.urlopen(url).read()

print(html)
# b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n...
# so far so good...

newpark = ParksTxState()
newpark.html = html

print(newpark.html)
# b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n...
# so we're still good here before committing....

session.add(newpark)
session.commit()

print(newpark.html)
# \x3c21444f43545950452068746d6c3e0a3...
# and here is where the garbage comes in.

For some reason the HTML is being stored as a long string of characters.. \x3c21444f43545950452068746d6c3e0a3c68746d6c20786d6c6e733d22687474703a2f2f7777772e7...

I've even set echo=True and saw that the insert statement was correct.

What am I doing wrong?

ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
dangel
  • 7,238
  • 7
  • 48
  • 74

1 Answers1

0

Ok, it looks like request.urlopen(url).read() is returning a bytes object (see Methods of File Objects.) This needs to be converted to a string with .decode('utf-8')

html = request.urlopen(url).read()
html_string = html.decode('utf-8')

also see Convert bytes to a string?

dangel
  • 7,238
  • 7
  • 48
  • 74