
I tried the following lines of code, but I still get an error:

if print(listing_description) != UnicodeEncodeError:
    print(listing_description)

Error message:

if print(listing_description) != UnicodeEncodeError:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-3: surrogates not allowed

Here's the webpage I'm scraping, which contains the characters that fail to encode:

https://www.autotrader.co.uk/classified/advert/202001146145497?postcode=po207nx&sort=distance&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&radius=1500&make=AUDI&model=A6%20SALOON&page=53

Those flag emojis in the listing description are the problem.

pvmlad
  • How do you scrape the site? Show me more of your code please, so I can reproduce it. – Michael K Feb 03 '20 at 12:44
  • There is an ignore flag for decoding. Does this question help? https://stackoverflow.com/questions/24616678/unicodedecodeerror-in-python-when-reading-a-file-how-to-ignore-the-error-and-ju/24617071#24617071 – Neil Feb 03 '20 at 12:48
  • Maybe this helps: https://stackoverflow.com/questions/51217909/removing-all-emojis-from-text – Michael K Feb 03 '20 at 12:49
  • FYI, that is not how you catch an exception! Use 'try' and 'except UnicodeEncodeError' – Neil Feb 03 '20 at 12:50
  • Excellent, try print and except UnicodeEncodeError worked like a charm – pvmlad Feb 03 '20 at 12:53
  • Check my post history @MichaelK; most of my work on this script is on here already. It's shoddy code, but it works. – pvmlad Feb 03 '20 at 12:55
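
For anyone landing here later, the try/except approach from the comments can be sketched roughly as below. This is only a sketch under assumptions: the "surrogates not allowed" message suggests the scraped string holds UTF-16 surrogate code units (for example from JavaScript-style \ud83c\uddec escapes) rather than real emoji code points, but the scraping code was never posted, so that is a guess. The names repair_surrogates and safe_print are made up for illustration and are not part of any library.

```python
import sys


def repair_surrogates(text: str) -> str:
    """Join UTF-16 surrogate pairs into real code points.

    'surrogatepass' lets the surrogates through on encode; decoding the
    result as UTF-16 merges valid pairs, and 'replace' turns any lone
    surrogate into U+FFFD instead of raising.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


def safe_print(text: str) -> str:
    """Print text, falling back to a repaired copy if encoding fails.

    Returns the string that was actually printed.
    """
    try:
        print(text)
        return text
    except UnicodeEncodeError:
        # Assumption: stdout may not report an encoding; default to UTF-8.
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
        repaired = repair_surrogates(text)
        # Anything the console still cannot show becomes '?'.
        printable = repaired.encode(encoding, "replace").decode(encoding)
        print(printable)
        return printable


# Hypothetical sample standing in for listing_description:
# the UK flag emoji written as two surrogate pairs.
listing_description = "\ud83c\uddec\ud83c\udde7 Audi A6"
safe_print(listing_description)
```

The original `if print(...) != UnicodeEncodeError` fails because print raises the exception before any comparison happens, so the error has to be caught with try/except. If the emojis are not wanted at all, stripping them (as in the linked emoji-removal question) is an alternative to repairing them.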
