for p in articles2:
    url = p.find('a')['href']
    title = p.find('h3').get_text().strip().encode("utf-8")
    print(title)

OUTPUT:

c3\xa9gie de d\xc3\xa9fense active et pr\xc3\xa9ventive\xc2\xbb'

b'Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin'

b'On vous le dit'

b'\xc3\x89dition du jour (PDF)'

b'Son port est d\xc3\xa9sormais obligatoire : Le prix du masque plafonn\xc3\xa9'

b'Baisse de 20% des prix des produits agricoles' .....
fixatd
Helina
  • Kindly share sample input data and expected output in copy pastable format – Anshul May 23 '20 at 17:21
  • What do you want to accomplish? The output is UTF-8-encoded, and a `bytes` object. If you want to output strings, don't encode. – wastl May 23 '20 at 17:28
  • Those are UTF-8 encoded byte strings, which is the normal output of `.encode('utf-8')`. If I do `b'Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin'.decode('utf-8')` I get `'Zoom sur la course effrénée pour trouver un vaccin'`. Encoding to a byte string is good for saving to a file or sending over the network, but it's not good for human viewing. – tdelaney May 23 '20 at 17:46
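
A minimal sketch of what the comments above describe, using one of the titles from the question's output as sample data (the names `raw` and `text` are illustrative and do not appear in the question). It only shows the `bytes`/`str` round trip; in the original loop the same effect would presumably come from not calling `.encode("utf-8")` at all.

```python
# Sample taken from the output shown in the question.
raw = b'Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin'

# bytes -> str: decode with the same codec that produced the bytes.
text = raw.decode("utf-8")
print(text)  # Zoom sur la course effrénée pour trouver un vaccin

# str -> bytes: encoding again gives back the original byte string.
round_trip = text.encode("utf-8")
print(round_trip == raw)  # True
```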

2 Answers


Try a different encoding; it seems these characters are Latin-1.

You can find more encodings here
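
One way to check an encoding guess like this is to decode the same bytes with each candidate codec and compare the results. The sketch below is only an illustration on one title from the question's output; the variable names are hypothetical.

```python
# Sample taken from the output shown in the question.
raw = b'Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin'

# Decode the same bytes with two candidate codecs.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

print(as_utf8)    # Zoom sur la course effrénée pour trouver un vaccin
print(as_latin1)  # Zoom sur la course effrÃ©nÃ©e pour trouver un vaccin
```

For this sample, the UTF-8 reading produces the expected French text, while the Latin-1 reading turns each accented letter into two characters.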

Petru Tanas

Use strip() and join() on the text returned by get_text(), without calling .encode("utf-8") on it.

i.e. the title is then the str 'Zoom sur la course effrénée pour trouver un vaccin' rather than the bytes b'Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin'.

Then encode it to ASCII with the error handler 'ignore' and decode it back to a str; this removes special characters such as é.

Should look like:

"".join(the_text_to_clean.strip()).encode('ascii', 'ignore').decode("utf-8")

How it applies in your code

for p in articles2:
    url = p.find('a')['href']
    title = p.find('h3').get_text()
    # Clean the title: drop every non-ASCII character (e.g. é) and get a str back.
    title = "".join(title.strip()).encode('ascii', 'ignore').decode("utf-8")
    print(title)
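
To see what this cleaning step does to a single title, here is a small sketch on a hard-coded string standing in for the result of `get_text()` (the sample value and the name `cleaned` are assumptions for illustration). Note that `"".join(...)` on a string returns the same string, so the visible change comes from the `'ignore'` error handler.

```python
# Hypothetical stand-in for p.find('h3').get_text(), with stray whitespace.
title = "  Zoom sur la course effrénée pour trouver un vaccin \n"

# strip() trims the whitespace; encode(..., 'ignore') silently drops
# every character that has no ASCII representation (here, both "é").
cleaned = "".join(title.strip()).encode('ascii', 'ignore').decode("utf-8")

print(cleaned)  # Zoom sur la course effrne pour trouver un vaccin
```

The accented letters are removed rather than replaced, so "effrénée" becomes "effrne". If the accents should be kept, printing `title.strip()` directly leaves them intact.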
xaander1