
I am working on scraping a betting website for odds as my first web-scraping project. I have successfully scraped what I want so far and now have an array like this:

[<b>+5\xbd\xa0-110</b>, <b>-5\xbd\xa0-110</b>]
[<b>+6\xa0-115</b>, <b>-6\xa0-105</b>]
[<b>+6\xa0-115</b>, <b>-6\xa0-105</b>]

Is there a way to pull out just the -105/-110/-115? The numbers I am looking for are the three digits immediately to the left of the </b>, and I also need the positive or negative sign in front of them. Do I need to use a regular expression? Thanks a lot!

Weston

weston6142
  • [Advice on parsing HTML with a regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Mangohero1 Oct 18 '17 at 21:22

1 Answer

A regex will work, provided this is the only format the numbers appear in.

Also, do you know whether the positive sign is always shown, or only the negative one?

If the sign is always shown, use:

([+-][\d]{3})<\/b>

If positive values can appear without a sign, make the sign optional:

([+-]?[\d]{3})<\/b>

http://regexr.com/3h08d

The round brackets form a capture group, so you can extract just the part inside them: the sign and the three digits, without the </b>.
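As a quick check, here is a minimal sketch that runs the first pattern against the sample values from the question. It assumes the values are plain strings (the real ones are scraped tags and would need converting with str() first); the names `sample`, `pattern`, and `matches` are made up for illustration.

```python
import re

# Sample values from the question, written as plain strings.
# Assumption: the real items are scraped tags, not strings.
sample = [
    "<b>+5\xbd\xa0-110</b>",
    "<b>-5\xbd\xa0-110</b>",
    "<b>+6\xa0-115</b>",
    "<b>-6\xa0-105</b>",
]

# A sign followed by exactly three digits, immediately before </b>.
pattern = re.compile(r"([+-]\d{3})</b>")

# group(1) is the text captured by the round brackets.
matches = [pattern.search(s).group(1) for s in sample]
print(matches)  # ['-110', '-110', '-115', '-105']
```

The spread values such as +5½ and -6 are skipped because they are not directly followed by </b>.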

Edit: you probably want something like the code below. It loops over each item in the list, runs a regex search on it, and appends the result to the nums list. Each result is a three-digit number with its sign in front, because group(1) returns the first group inside the round brackets.

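import re

nums = []

for line in odds:
    # str() converts each scraped item to its HTML text before searching
    result = re.search(r'([+-]\d{3})</b>', str(line))
    if result:  # skip items that contain no match
        nums.append(result.group(1))

print(nums)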
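If each item in odds is itself a pair of tags, as the printed rows in the question suggest, re.search would only return the first price in each row. Below is a hedged sketch using re.findall instead, which collects every match. The `rows` data stands in for the scraped list using plain strings, and `extract_prices` is a hypothetical helper name, not something from the question.

```python
import re

PRICE = re.compile(r"([+-]\d{3})</b>")

def extract_prices(rows):
    """Return every signed three-digit price found in rows."""
    prices = []
    for row in rows:
        # str() gives one searchable string whether row is a single
        # item or a list of items; findall returns all captured groups.
        prices.extend(PRICE.findall(str(row)))
    return prices

# Stand-in for the scraped data (assumption: two <b> items per row).
rows = [
    ["<b>+5\xbd\xa0-110</b>", "<b>-5\xbd\xa0-110</b>"],
    ["<b>+6\xa0-115</b>", "<b>-6\xa0-105</b>"],
]
print(extract_prices(rows))  # ['-110', '-110', '-115', '-105']
```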
Samantha
  • It will show positive or negative depending on the particular odds so either is possible. Also, I am using beautiful soup as you probably expected and am very new to python so I'm still having a little trouble comprehending this. If this array was called "odds" how would you exactly incorporate the code above you wrote? Lastly, would I have to write some more code to delete the ? I really appreciate your help! – weston6142 Oct 18 '17 at 21:36
  • Refer to my edit. I don't know what beautiful soup is but I hope this code works. You don't need to add more code to delete the , because in regex you can extract the contents inside round brackets. – Samantha Oct 18 '17 at 21:45
  • I haven't quite got it yet. I will mess with it later tonight and see if I can get it to work. Thanks for the help!! – weston6142 Oct 18 '17 at 22:09
  • let me know how it goes :) – Samantha Oct 18 '17 at 22:10