
After scraping with the Python code below, my result contains some extra u'\n' entries and sequences like \xc2\xa0 inside the text. How do I get rid of them? (I tried using strip(), but it still didn't work.)

    from bs4 import BeautifulSoup

    page_stored = BeautifulSoup(req_final_page.text, 'html5lib')
    detail_content = page_stored.find('div', {'class': 'company-page-body body'})
    details = []
    for content in detail_content:
        details.append(content.string)

The result is:

    u'\n', u'What\xe2\x80\x99s different about great artists, designers, writers and entrepreneurs? What can they do that no-one else can? They see things other people don\xe2\x80\x99t. Things that don\u2019t exist yet. Better ways of doing things. Patterns and connections that other people missed. Milanote helps anyone to get that vision too, they believe that you\xe2\x80\x99ll be able to see things differently too.', u'\n', u'Milanote is based on the idea that behind every great piece of work is a lot of research, thinking and planning that is often messy, unstructured and takes time to evolve. That\u2019s why Milanote is much more visual, flexible and tactile than similar products. They\u2019ve\xc2\xa0really tried to reproduce the feeling of working on a wall in a creative studio.', u'\n'


1 Answer


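It's because of Unicode: the u'...' prefix only marks each item as a unicode string when the list is printed; it is not part of the text itself. You can see this question.

To get rid of the u, convert each item to a string before appending it to the list: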

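    details = []
    for content in detail_content:
        # str() drops the u'' marker when the list is printed (Python 3);
        # in Python 2 it raises UnicodeEncodeError on text such as u'\u2019'
        details.append(str(content.string))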
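Converting with str() only changes how the list is printed; it does not remove the u'\n' entries or the \xc2\xa0 sequences. The u'\n' items are the whitespace-only strings between the tags inside the div, so they can simply be skipped. The \xc2\xa0 and \xe2\x80\x99 sequences look like UTF-8 bytes (a non-breaking space and a right single quote) that were decoded as Latin-1, which is what requests does when the server sends no charset and .text falls back to ISO-8859-1. Assuming that is the cause, the usual fix is to hand BeautifulSoup correctly decoded text, for example by setting req_final_page.encoding = 'utf-8' before reading .text. The sketch below is a Python 3 illustration under those assumptions: it uses a hard-coded sample page instead of the real response, the stdlib 'html.parser' instead of 'html5lib' so it runs offline, and clean_details is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the raw response bytes (req_final_page.content).
# It contains a right single quote (U+2019) and a non-breaking space (U+00A0).
raw = (
    '<div class="company-page-body body">\n'
    '<p>They see things other people don\u2019t.</p>\n'
    '<p>They\u2019ve\u00a0really tried.</p>\n'
    '</div>'
).encode('utf-8')

# Decoding the bytes as Latin-1 reproduces the \xe2\x80\x99 / \xc2\xa0 garbage.
wrong_text = raw.decode('iso-8859-1')
# Decoding them as UTF-8 gives the intended characters.
good_text = raw.decode('utf-8')


def clean_details(html):
    """Hypothetical helper: collect the non-blank text children of the div."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', {'class': 'company-page-body body'})
    details = []
    for content in div.children:
        text = content.string
        # .string is None for tags with several children; '\n' strings are blank.
        if text is None or not text.strip():
            continue
        # Replace non-breaking spaces with ordinary spaces and trim the ends.
        details.append(text.replace('\xa0', ' ').strip())
    return details


print(clean_details(good_text))
# ['They see things other people don’t.', 'They’ve really tried.']
```

Running clean_details(wrong_text) on the same sample still returns the mojibake, which is why the decoding has to be fixed before parsing rather than cleaned up with strip() afterwards.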