
I'm having an issue and want to understand what causes it, since I've tested parts of my code and they work. Why does this code not behave as I expect? The code is shown below.

search_for_term = re.findall(r'<td class="kx o_\d.*data-bookmaker', doc)

This is the content of the search_for_term variable:

['<td class="kx o_1 winner" data-bookmaker',
 '<td class="kx o_0" data-bookmaker',
 '<td class="kx o_2" data-bookmaker']

Now I'm trying to find out whether any of these strings contains the word "winner". The code is shown below.

winner_ids = np.where([re.findall('winner', item) for item in search_for_term])

And this is the code that confuses me:

if(not all(winner_ids)):
   print("no winner")
else:
   print("winner does exist")

The output I get is "no winner". Can somebody explain this to me? I would be more than grateful.

newnick988888
  • What exactly do you not understand? Why did you expect to get something else as output? – mkrieger1 Nov 26 '19 at 00:05
  • Will `'*winner*'` do what you want? – Han Wang Nov 26 '19 at 00:08
  • Well, shouldn't it display "winner does exist", since a winner is found in winner_ids judging by the array? – newnick988888 Nov 26 '19 at 00:16
  • Use `re.search()` – CAustin Nov 26 '19 at 00:17
  • What if you print `winner_ids` to find out if your assumption is correct? – mkrieger1 Nov 26 '19 at 00:19
  • Please include all relevant code and data. See: [mcve]. It would be particularly useful here since some of these design choices seem odd. Also, it looks like you're using RegEx to parse HTML. I'm guessing you haven't seen [this legendary answer](https://stackoverflow.com/a/1732454/11301900) yet. – AMC Nov 26 '19 at 00:23
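
The two suggestions in the comments (use `re.search()`, and look at what `winner_ids` actually holds) can be sketched as follows. As background, `np.where` called with a single argument returns a tuple of index arrays rather than a list of matches, so on a NumPy version that accepts the question's list of lists the result would be `(array([0]),)`. `all()` then tests the truthiness of that one-element array, and its only element is the index 0, which is falsy. The sketch below uses a plain list of indices as a stand-in for `np.where` so that it runs without NumPy; the sample data is copied from the question, and `has_winner` is a name introduced here for illustration.

```python
import re

# Sample data copied from the question's output
search_for_term = [
    '<td class="kx o_1 winner" data-bookmaker',
    '<td class="kx o_0" data-bookmaker',
    '<td class="kx o_2" data-bookmaker',
]

# Indices of the items that contain "winner" (plain-Python stand-in for np.where)
winner_ids = [i for i, item in enumerate(search_for_term) if re.search("winner", item)]
print(winner_ids)  # [0]

# all() checks the truthiness of each element, and the index 0 is falsy
print(all(winner_ids))  # False

# Asking whether any item matched does not depend on the index values
has_winner = any(re.search("winner", item) for item in search_for_term)
print("winner does exist" if has_winner else "no winner")
```

Checking `len(winner_ids) > 0` would work as well; the point is to test whether there are matches, not whether the matching indices are non-zero.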

2 Answers


Don't do this; see "RegEx match open tags except XHTML self-contained tags".

Use Beautiful Soup (bs4) instead.

Then you can simply use:

soup = BeautifulSoup(doc, 'html.parser')  # needs: from bs4 import BeautifulSoup
winner = soup.select_one('td.winner[data-bookmaker]')

And the condition becomes:

if winner:
    print("winner exists")
else:
    print("no winner")

Reply to comment: extract a copy of the DOM as HTML and feed it to Beautiful Soup.

You are already extracting the DOM as HTML and parsing it with a regex, so you might as well feed it to bs4 instead.
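
If installing bs4 is not an option, the same idea (let a parser read the tags instead of a regex) can be sketched with the standard library's `html.parser`. This is a minimal illustration and not a replacement for Beautiful Soup: the class `WinnerCellFinder`, the helper `has_winner`, and the sample markup are all made up for this example, with the markup modelled on the question's output.

```python
from html.parser import HTMLParser


class WinnerCellFinder(HTMLParser):
    """Records whether any <td> has the class 'winner' and a data-bookmaker attribute."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        attr_map = dict(attrs)
        # An attribute without a value is reported as None, hence the "or"
        classes = (attr_map.get("class") or "").split()
        if "winner" in classes and "data-bookmaker" in attr_map:
            self.found = True


def has_winner(html):
    """Return True if the markup contains a winner cell (hypothetical helper)."""
    finder = WinnerCellFinder()
    finder.feed(html)
    return finder.found


# Hypothetical sample markup, modelled on the question's output
sample = (
    '<table><tr>'
    '<td class="kx o_1 winner" data-bookmaker="a">1.5</td>'
    '<td class="kx o_0" data-bookmaker="b">2.0</td>'
    '</tr></table>'
)
print("winner exists" if has_winner(sample) else "no winner")
```

Splitting the `class` attribute on whitespace means the check matches `winner` as a whole class name, which a substring search on the raw HTML would not guarantee.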

Dan D.

I think you need:

winner_ids = np.where([re.findall('.*winner.*', item) for item in search_for_term])
Peter Moore