
I scraped some stock tickers off a website, and the text inside the span tags looks like '\xa0AYTU\xa0', for example. I'm trying to remove the '\xa0' from either side of the ticker using replace('xa0',''). However, when I append the result to the list, the list still contains '\xa0AYTU\xa0' no matter what.

Here is my for loop in question.

fu_tickers = []

for t in match_fu.find_all('span'):
    temp = str(t.text)
    temp2 = temp.replace('xa0','')
    fu_tickers.append(temp2)

print(fu_tickers)

When I insert print(temp2) inside the for loop, the characters appear to be removed, but for some reason the string appended to the fu_tickers list still contains them.

Current results = ['\xa0AYTU\xa0', '\xa0CETX\xa0', '\xa0CHFS\xa0']

Desired results = ['AYTU', 'CETX', 'CHFS']

prime90

1 Answer


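Add the missing \ to 'xa0' in str.replace. '\xa0' is an escape sequence for a single character, the non-breaking space (U+00A0), whereas 'xa0' is the literal three characters x, a, 0, which never appear in your strings: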

lst = ['\xa0AYTU\xa0', '\xa0CETX\xa0', '\xa0CHFS\xa0']
lst = [i.replace('\xa0', '') for i in lst]

print(lst)

Prints:

['AYTU', 'CETX', 'CHFS']
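
To see why the backslash matters, here is a small sketch (the variable names are illustrative, not from the question). It also suggests why print(temp2) looked clean inside the loop: print writes the non-breaking space as blank space, while printing a list shows each string's repr, which spells the character out as \xa0.

```python
raw = '\xa0AYTU\xa0'  # sample value taken from the question

# '\xa0' is one character (U+00A0); 'xa0' is the three characters x, a, 0.
escaped_len = len('\xa0')
literal_len = len('xa0')

# Replacing the literal text 'xa0' finds nothing, so the string is unchanged.
unchanged = raw.replace('xa0', '')

# Replacing the escaped character removes the non-breaking spaces.
cleaned = raw.replace('\xa0', '')

print(escaped_len, literal_len)
print(unchanged == raw)
print(repr(raw))   # the repr (what a printed list shows) spells out the escape
print(cleaned)
```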

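Or use str.strip, which with no arguments removes leading and trailing whitespace (in Python 3 that includes the non-breaking space):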

lst = ['\xa0AYTU\xa0', '\xa0CETX\xa0', '\xa0CHFS\xa0']
lst = [i.strip() for i in lst]

print(lst)

Prints:

['AYTU', 'CETX', 'CHFS']
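
Applied to the loop from the question, the same idea might look like the sketch below. The span_texts list is a hypothetical stand-in for the text of each span returned by match_fu.find_all('span'), so the example runs without the scraped page.

```python
# Hypothetical stand-in for the text of each <span> in the scraped page.
span_texts = ['\xa0AYTU\xa0', '\xa0CETX\xa0', '\xa0CHFS\xa0']

fu_tickers = []
for text in span_texts:
    # strip() with no arguments removes leading/trailing whitespace,
    # which in Python 3 includes the non-breaking space '\xa0'.
    fu_tickers.append(text.strip())

print(fu_tickers)
```

Note that strip only touches the ends of the string, while replace('\xa0', '') removes the character wherever it occurs. If match_fu is a BeautifulSoup object (an assumption; the question does not say), t.get_text(strip=True) is another way to get the stripped text.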
Andrej Kesely
  • omg I'm an idiot, didn't even realize I was missing the \ ... What kind of for loop is that where you can place variables before the for? Looks a lot cleaner, didn't know you could do that – prime90 Jun 20 '20 at 23:06
  • Probably worth explaining escape sequences – juanpa.arrivillaga Jun 20 '20 at 23:07
  • @prime90 that's a list comprehension, not a for loop. It's a special construct for creating lists using mapping/filtering operations. Set and dict comprehensions also exist, as do generator expressions, which create generators; they all work similarly – juanpa.arrivillaga Jun 20 '20 at 23:08
  • @prime90 That's called a _list comprehension_. You can read about them [here](https://www.pythonforbeginners.com/basics/list-comprehensions-in-python) for example. Super useful. – Andrej Kesely Jun 20 '20 at 23:08
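
As a rough illustration of the comprehension discussed in the comments, the sketch below puts an explicit for loop next to the equivalent list comprehension, followed by the related constructs. The names loop_result, comp_result, ticker_set, raw_lengths and ticker_gen are made up for this example.

```python
lst = ['\xa0AYTU\xa0', '\xa0CETX\xa0', '\xa0CHFS\xa0']

# Explicit for loop that builds a new list.
loop_result = []
for i in lst:
    loop_result.append(i.strip())

# List comprehension: the same result in a single expression.
comp_result = [i.strip() for i in lst]

# Related constructs with the same "expression for item in iterable" shape.
ticker_set = {i.strip() for i in lst}           # set comprehension
raw_lengths = {i.strip(): len(i) for i in lst}  # dict comprehension
ticker_gen = (i.strip() for i in lst)           # generator expression (lazy)

print(loop_result == comp_result)
print(next(ticker_gen))  # a generator yields items one at a time
```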