I'm writing a web scraper and one of the listings I scraped contains non-Unicode characters. How can I ignore these?

Question

I tried the following lines of code, but I still get an error:

if print(listing_description) != UnicodeEncodeError:
    print(listing_description)

Error message: if print(listing_description) != UnicodeEncodeError: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-3: surrogates not allowed

Here's the webpage I'm scraping, which contains the non-Unicode characters:

https://www.autotrader.co.uk/classified/advert/202001146145497?postcode=po207nx&sort=distance&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&radius=1500&make=AUDI&model=A6%20SALOON&page=53

Those flag emojis in the listing description are the problem.

How do you scrape the site? Show me more of your code please, so I can reproduce it. — Michael K, Feb 03 '20 at 12:44
There is an ignore flag for decoding. Does this question help? https://stackoverflow.com/questions/24616678/unicodedecodeerror-in-python-when-reading-a-file-how-to-ignore-the-error-and-ju/24617071#24617071 — Neil, Feb 03 '20 at 12:48
Maybe this helps: https://stackoverflow.com/questions/51217909/removing-all-emojis-from-text — Michael K, Feb 03 '20 at 12:49
FYI, that is not how you catch an exception! Use 'try' and 'except UnicodeEncodeError' — Neil, Feb 03 '20 at 12:50
Excellent, try print and except UnicodeEncodeError worked like a charm — pvmlad, Feb 03 '20 at 12:53
Check my post history, @MichaelK. Most of my work on this script is on here already; it's shoddy code, but it works. — pvmlad, Feb 03 '20 at 12:55
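
For reference, the try/except approach suggested in the comments can be sketched roughly as below, together with two alternatives. This is a minimal sketch and has not been run against the page above. It assumes `listing_description` is a `str` that contains surrogate code points (which is what "surrogates not allowed" points to). The helper names `drop_unencodable`, `repair_surrogates`, and `safe_print` are hypothetical, made up for illustration.

```python
def drop_unencodable(text: str) -> str:
    """Return text with anything UTF-8 cannot encode (e.g. surrogates) removed."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def repair_surrogates(text: str) -> str:
    """Try to merge surrogate pairs back into real characters.

    Assumes the surrogates are valid UTF-16 pairs; anything that cannot be
    decoded is replaced with U+FFFD instead of raising.
    """
    return text.encode("utf-16", errors="surrogatepass").decode(
        "utf-16", errors="replace"
    )


def safe_print(text: str) -> None:
    """Print text, falling back to a stripped copy if encoding fails."""
    try:
        print(text)
    except UnicodeEncodeError:
        # The exception is raised inside print(), so it has to be caught
        # with try/except; comparing print()'s return value cannot work.
        print(drop_unencodable(text))
```

With these, `safe_print(listing_description)` would replace the `if print(...) != UnicodeEncodeError` check. `repair_surrogates` is only worth trying if the emojis should be kept rather than dropped.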

