How do I scrape text with no specific class?

I have pulled up a past eBay listing that sold via auction. Here is the HTML snippet from the heading section.

<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM

I want to scrape just the text "2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM" with requests and Beautiful Soup, but there is no class assigned to that specific text.

Here is the code I have so far...

Currently working on this line.

h1 = soup.find('h1', id="itemTitle")
print(h1)

Any help would be appreciated.

JonShish
  • If it is simply the text you can use `h1.text` – Thymen Nov 29 '20 at 17:59
  • @Thymen that will include `Details about` in the output, which the OP doesn't want – MendelG Nov 29 '20 at 18:04
  • Right, which you could of course filter afterwards (Python 3.9+ `str.removeprefix`), but the answer from [MendelG](https://stackoverflow.com/a/65063307/10961342) is then neater. – Thymen Nov 29 '20 at 18:11
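
The filtering idea from the comments could look roughly like the sketch below. The `raw_text` value is a hard-coded stand-in for what `h1.text` would return on the snippet from the question (an assumption, so that the example runs without fetching the page), and `str.removeprefix` requires Python 3.9 or later.

```python
# Stand-in for h1.text on the question's snippet (an assumption for illustration).
raw_text = "Details about   2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM\n"

# Drop the hidden "Details about" label, then trim the leftover whitespace.
prefix = "Details about"
title = raw_text.removeprefix(prefix).strip()

print(title)
```

If the prefix is absent, `removeprefix` returns the string unchanged, so the call is safe on listings that lack the hidden label.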

2 Answers

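Try the find_next() method with text=True, which returns the first text node after the tag (here, the hidden "Details about" label), and then use .next to get the text node that follows it. In recent Beautiful Soup versions the text argument is named string. For example: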

from bs4 import BeautifulSoup


html = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM
'''

soup = BeautifulSoup(html, "html.parser")

print(soup.find(id='itemTitle').find_next(text=True).next)

Output:

2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM
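
To make the two hops explicit, here is a minimal sketch of the same lookup split into steps. It assumes the HTML snippet from the question, uses `string=True` (the newer spelling of `text=True`) and `.next_element` (the longer name for `.next`); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM
'''

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find(id="itemTitle")

# Hop 1: the first text node after the <h1> tag is the hidden label inside the <span>.
first_text = h1.find_next(string=True)

# Hop 2: the element parsed right after that label is the title text node.
title = first_text.next_element.strip()

print(repr(first_text))
print(title)
```

This relies on the title being the very next node after the hidden label, so it would need adjusting if the markup of the heading changes.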
MendelG

You could build a list from the stripped_strings generator and index into it, or use itertools.islice to skip ahead in the generator directly. I learned that the latter was possible from @cobbal.


from bs4 import BeautifulSoup

html = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM
'''

soup = BeautifulSoup(html, "html.parser")

print([s for s in soup.select_one('#itemTitle').stripped_strings][1])


from itertools import islice

print(next(islice(soup.select_one('#itemTitle').stripped_strings, 1, None)))
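
To see why index 1 is the right one, it may help to list everything `stripped_strings` yields for this heading. The sketch below assumes the same HTML snippet; the `None` default passed to `next` is an addition of mine so that a heading with fewer strings does not raise `StopIteration`.

```python
from itertools import islice

from bs4 import BeautifulSoup

html = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>2018 Panini Contenders Josh Allen #105 No Feet RC Ticket Auto PSA 10 GEM
'''

soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("#itemTitle")

# stripped_strings yields each text node with surrounding whitespace removed:
# first the hidden label, then the title.
strings = list(h1.stripped_strings)
print(strings)

# Skip the first string and take the next one; None is returned if it is missing.
title = next(islice(h1.stripped_strings, 1, None), None)
print(title)
```

The `islice` form stops after the second string instead of building the whole list, which matters little here but avoids extra work on larger elements.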
QHarr