Web scraping with bs4: retrieving a text value

Question

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.betexplorer.com/odds-movements/soccer/'

res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
times = soup.select('span.table-main__time') #good
matches = soup.find_all("td",class_ ="table-main__tt")

odds = soup.find_all("td",class_ ="table-main__odds")

The target of the scrape is the value that follows data-odd="value" in each odds cell.

I have tried odds[0].a.text, to no avail.

Is there any advice on how to extract these values using bs4?

`find_all("a", {"data-odd": True})` Reference: https://stackoverflow.com/a/39055066/1961688 – thethiny, Sep 26 '22 at 17:28
Thanks. The end result is just to have the values after data-odd laid out horizontally, three columns per match. get_text won't work here. – Paul Corcoran, Sep 26 '22 at 17:36
Does this answer your question? [Retrieving the text output of a html website using bs4](https://stackoverflow.com/questions/73821496/retrieving-the-text-output-of-a-html-website-using-bs4) – HedgeHog, Sep 26 '22 at 18:32
Hi @HedgeHog, that worked for the text. But the odds variable seems to contain data that is not in a text format. – Paul Corcoran, Sep 26 '22 at 18:39

score 1 · Answer 1 · answered Sep 26 '22 at 19:10

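data-odd is an attribute of the tag, not part of its text, so .text will not return it. Check the Beautiful Soup docs on attributes and use tag.get('attr') when you are not sure that attr is defined, just as you would with a Python dictionary.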
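To see the difference without hitting the site, here is a minimal sketch on a made-up HTML fragment. The markup is an assumption for illustration, not copied from the live page; only the class name table-main__odds and the data-odd attribute come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the odd lives in the data-odd attribute, the <a> has no text.
html = """
<table><tr>
  <td class="table-main__odds"><a data-odd="1.32"></a></td>
  <td class="table-main__odds"><a></a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td", class_="table-main__odds")

first_link = cells[0].a    # <a> that carries data-odd
second_link = cells[1].a   # <a> without the attribute

text_value = first_link.text             # empty string: nothing in the text node
attr_value = first_link["data-odd"]      # dictionary-style access, raises KeyError if missing
missing = second_link.get("data-odd")    # .get() returns None instead of raising
fallback = second_link.get("data-odd", "n/a")  # optional default, like dict.get
```

Subscript access is fine when the attribute is guaranteed to exist; .get() is the safer choice when some cells may lack it.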

Example

import requests
from bs4 import BeautifulSoup

url = 'https://www.betexplorer.com/odds-movements/soccer/'
soup = BeautifulSoup(requests.get(url).content, "lxml")  # name the parser explicitly, as in the question

data = []

for m in soup.select('#odds-movements tr:has(.table-main__tt)'):
    data.append({
        'match':m.a.text,
        'time':m.span.text,
        'odds':[o.get('data-odd') for o in m.select('a[data-odd]')],
        'adapt':'the concept to add additional scraped information'
        
    })
data

Output

[{'match': 'Monterrey W - U.N.A.M.- Pumas W',
  'time': '04:00',
  'odds': ['1.32', '5.10', '6.56', '6.56'],
  'adapt': 'the concept to add additional scraped information'},
 {'match': 'Santamarina - Gimnasia Mendoza',
  'time': '01:30',
  'odds': ['3.37', '3.37', '2.85', '2.25'],
  'adapt': 'the concept to add additional scraped information'},
 {'match': 'Club America W - Pachuca W',
  'time': '02:00',
  'odds': ['1.77', '3.85', '3.53', '3.53'],
  'adapt': 'the concept to add additional scraped information'},...]
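If the goal is one column per odd rather than a list, the same per-row loop can spread the values into separate keys. The following is a sketch under assumptions: the HTML is invented, the column names odd_1, odd_x and odd_2 are hypothetical, and it assumes only the three odds cells carry the class table-main__odds. On the live page a narrower selector may be needed, since the output above shows a repeated value in each list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled loosely on the selectors used above.
html = """
<table id="odds-movements">
  <tr>
    <td class="table-main__tt"><span>04:00</span><a href="#">Team A - Team B</a></td>
    <td class="table-main__odds"><a data-odd="1.32"></a></td>
    <td class="table-main__odds"><a data-odd="5.10"></a></td>
    <td class="table-main__odds"><a data-odd="6.56"></a></td>
  </tr>
  <tr>
    <td class="table-main__tt"><span>01:30</span><a href="#">Team C - Team D</a></td>
    <td class="table-main__odds"><a data-odd="3.37"></a></td>
    <td class="table-main__odds"><a data-odd="2.85"></a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []

for tr in soup.select("#odds-movements tr:has(.table-main__tt)"):
    # Collect the odds inside this row only, then pad to exactly three slots.
    odds = [a.get("data-odd") for a in tr.select("td.table-main__odds a[data-odd]")]
    odds = (odds + [None] * 3)[:3]
    rows.append({
        "match": tr.a.text,
        "time": tr.span.text,
        "odd_1": odds[0],  # hypothetical column names
        "odd_x": odds[1],
        "odd_2": odds[2],
    })
```

A list of flat dictionaries like rows can be passed straight to pandas.DataFrame(rows) to get the horizontal layout.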

HedgeHog

Thank you, that is a nifty piece of code. – Paul Corcoran Sep 26 '22 at 21:58
url = 'https://www.betexplorer.com/odds-movements/soccer/'
soup = BeautifulSoup(requests.get(url).content)
best_odds = soup.find_all("td",class_ ="bestbet-odd")
x = [m.get('data-odd') for m in best_odds]
data = []
for m in soup.select('#odds-movements tr:has(.table-main__tt)'):
    data.append({
        'match':m.a.text,
        'time':m.span.text,
        'odds':[o.get('data-odd') for o in m.select('a[data-odd]')],
        'bestOdds':[[i[0:4] for i in x]]
    })
– Paul Corcoran Sep 26 '22 at 21:58
The last thing I need is to attach each value to the row of its match, i.e. the first value in the list should appear under the best odds for the first match, and so on. At the moment I am just returning all the values in every entry. Any help is greatly appreciated. – Paul Corcoran Sep 26 '22 at 21:59
Why not operate on the row, as in my example? What is the reason for using find_all without any connection to a row? – HedgeHog Sep 27 '22 at 19:41
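A sketch of what operating on the row could look like for the best odds: the lookup is scoped to each tr, so every match only receives the values from its own row. The HTML is invented for illustration, and it assumes (as the comment's code suggests) that td.bestbet-odd carries the data-odd attribute itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the third row has no best-odds cell at all.
html = """
<table id="odds-movements">
  <tr>
    <td class="table-main__tt"><span>04:00</span><a href="#">Team A - Team B</a></td>
    <td class="bestbet-odd" data-odd="1.32"></td>
  </tr>
  <tr>
    <td class="table-main__tt"><span>01:30</span><a href="#">Team C - Team D</a></td>
    <td class="bestbet-odd" data-odd="2.25"></td>
  </tr>
  <tr>
    <td class="table-main__tt"><span>02:00</span><a href="#">Team E - Team F</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
data = []

for tr in soup.select("#odds-movements tr:has(.table-main__tt)"):
    data.append({
        "match": tr.a.text,
        "time": tr.span.text,
        # tr.select searches inside this row only, unlike soup.find_all
        "bestOdds": [td.get("data-odd") for td in tr.select("td.bestbet-odd")],
    })

best_per_match = [entry["bestOdds"] for entry in data]
```

Because the search starts from tr rather than soup, a row without a best-odds cell yields an empty list instead of shifting the values of later matches.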
