I have the following description that I want to scrape with my program.
<hr>
Provides AFROTC cadets up to 13 options for practical leadership and specialized training
through exposure to USAF functions, deployments, and employment operations. Foreign language and cultural immersions also available/possible but overall emphasis remains on leadership development and practicum. All programs conducted off-site at selected Air Forces bases and other locations in the USA and abroad.<br>
I have the following code:
findDescription = re.findall('<hr>(.*?)(?:<strong>|<br>)', coursePage)
And I get the following output:
['Provides AFROTC cadets up to 13 options for practical leadership and specialized training through exposure to USAF functions, deployments, and employment operations.\xc2\xa0 Foreign language and cultural immersions also available/possible but overall emphasis remains on leadership development and practicum.\xc2\xa0 All programs conducted off-site at selected Air Forces bases and other locations in the USA and abroad.']
Why am I getting weird stuff like \xc2\xa0 in here? My code also gets tripped up by the quotation mark ("). Frankly, I believed that the period (.) in my regex should match any character. What is going wrong?
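A minimal sketch that appears to reproduce the behaviour, written for Python 3. The `course_page` value here is a made-up stand-in for the real page, and it assumes the page is UTF-8 with a non-breaking space (U+00A0) after the period, which is what the bytes \xc2\xa0 encode. It also suggests the period does match the quotation mark, so the pattern itself may not be the culprit there.

```python
import re

# Hypothetical stand-in for coursePage: UTF-8 bytes with a non-breaking
# space (U+00A0) after the first sentence. This is an assumption about
# what the real page contains, not something confirmed from the site.
course_page = '<hr>First sentence.\u00a0 Second "quoted" sentence.<br>'.encode('utf-8')

# Same pattern as in the question, applied to the raw bytes.
# The non-breaking space shows up as the two bytes \xc2\xa0.
raw = re.findall(rb'<hr>(.*?)(?:<strong>|<br>)', course_page)

# Decoding to text first, then swapping U+00A0 for an ordinary space,
# gives a clean string. The quotation marks are matched by '.' as-is.
text = course_page.decode('utf-8')
clean = [m.replace('\u00a0', ' ')
         for m in re.findall(r'<hr>(.*?)(?:<strong>|<br>)', text)]

print(raw)
print(clean)
```

Note that `.` does not match a newline unless `re.DOTALL` is passed, so a description that spans several lines in the raw page would need that flag.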
I appreciate any quick hints. I only heard about regex on Friday and I have made tremendous progress, but this one has really tripped me up for a few hours.
Warm Regards, GeekyOmega