I am trying to detect when a game has been postponed and get the related team information or game number, because I append each team abbreviation to a list. Currently the loop only picks up the postponed items and skips the games that are not postponed, so the counter no longer matches each game's position on the page. I think I need to change the soup.select line, or do something slightly different, but I cannot figure it out.

The code does not throw any errors, but the list it returns is [0,1,2,3]. However, if you open https://www.rotowire.com/baseball/daily-lineups.php, it should return [0,1,14,15], because those are the team elements whose game is postponed.

from bs4 import BeautifulSoup
import requests

url = 'https://www.rotowire.com/baseball/daily-lineups.php'

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

x = 0

gamesRemoved = []

for tag in soup.select(".lineup__main > div"):
    ppcheck = tag.text
    if "POSTPONED" in ppcheck:
        print(x)
        print('Postponement')
        first_team = x*2
        print(first_team)
        gamesRemoved.append(first_team)
        second_team = x*2+1
        gamesRemoved.append(second_team)
        x+=1
        
    else:
        x+=1
        continue
print(gamesRemoved)   
Shawn Schreier
  • I don't understand - when I click your link I see only two red boxes with "POSTPONED". What information are you trying to get? – Andrej Kesely Jul 16 '21 at 19:28
  • Thanks for looking at my question. This is part of a bigger project but I tried to minimize to make it easier for people to reproduce. I essentially scrape every team, and their lineup. For example, it scrapes ['MIN', 'DET', 'MIA', 'PHI', 'BOS', 'NYY', 'SD', 'WAS'].... I need it to return elements 0 and 1 because the first MIN vs DET game is postponed. Then, I need it to return 14 and 15 because the second MIN vs DET game is also postponed. This will then allow me to drop the MIN and DET from another list I have. I hope this makes sense - I'm trying to simplify it as best I can. – Shawn Schreier Jul 16 '21 at 19:31
  • Then in that case I think Ajax's answer is already ok :) – Andrej Kesely Jul 16 '21 at 19:33

1 Answer

You can use BeautifulSoup.select and check if 'is-postponed' exists as a class name in the lineup box:

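from bs4 import BeautifulSoup as soup
import requests
# Parse the page; select() returns the '.lineup.is-mlb' game boxes in page order.
d = soup(requests.get('https://www.rotowire.com/baseball/daily-lineups.php').text, 'html.parser')
# Game i owns team slots 2*i and 2*i+1; keep both when the box has the 'is-postponed' class.
p = [j for i, a in enumerate(d.select('.lineup.is-mlb')) for j in [i*2, i*2+1] if 'is-postponed' in a['class']]
print(p)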

Output:

[0, 1, 14, 15]
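
The index arithmetic can be checked without the live page, whose content changes daily. The sketch below splits the one-liner into two helpers; the names `team_indices` and `postponed_flags_from_html` are hypothetical, and the selector and class name are simply the ones used above, not anything confirmed about the site's current markup. It is a minimal illustration rather than a definitive implementation. The key point is that every game box is enumerated, postponed or not, so a game's position on the page determines its two team slots.

```python
def team_indices(postponed_flags):
    """Return the team slots (2*i and 2*i+1) of every postponed game i.

    postponed_flags holds one boolean per game box, in page order,
    including the games that are NOT postponed.
    """
    return [
        slot
        for game, postponed in enumerate(postponed_flags)
        if postponed
        for slot in (2 * game, 2 * game + 1)
    ]


def postponed_flags_from_html(html):
    """Hypothetical helper: one flag per '.lineup.is-mlb' box (selector assumed from the answer)."""
    # Imported lazily so team_indices can be tried without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [
        "is-postponed" in box.get("class", [])
        for box in soup.select(".lineup.is-mlb")
    ]


# Made-up schedule: eight games, with the first and the eighth postponed.
flags = [True, False, False, False, False, False, False, True]
print(team_indices(flags))  # [0, 1, 14, 15]
```

To run it against the real page, pass `requests.get(url).text` to `postponed_flags_from_html` and feed the result to `team_indices`; the output then depends on that day's schedule.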
Ajax1234
  • This works, thank you! Do you have any recommendations on learning how to web scrape elements, particularly child elements. I can scrape tables okay, but when I need to really dive into the elements I seem to hit a wall. Any tutorials/links that could help would be great. – Shawn Schreier Jul 16 '21 at 19:37
  • @ShawnSchreier [This](https://stackoverflow.com/questions/6287529/how-to-find-children-of-nodes-using-beautifulsoup) SO link should help you learn more. In general, though, [css selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors) are a very powerful way to specify a set of parent-child relationships for target elements. – Ajax1234 Jul 16 '21 at 19:41