
If I have a web page with multiple tables and I want to retrieve the source code for a specific row from a specific table based on a keyword in beautifulsoup4, how can I do that using the find or find_all methods (or any other methods, for that matter)?

[Image: example table]

Using the table above, let's say I want to retrieve the row that contains the keyword "ROW 1" (or "A", "B", "C", etc.) and only that row. How can I go about that?

CosmicCat

2 Answers


Grab the entire HTML, load it with pandas, and do the following (this code is untested):

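import pandas as pd

html_table = 'From your web scrapping'  # placeholder: the page's HTML as a string
tables = pd.read_html(io=html_table)  # returns a list with one DataFrame per <table>
df = tables[0]  # choose the table you need by position
df.loc[0]  # all the information for the first data row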

I'd suggest spending 10 minutes learning pandas; it will really help. https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
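
As a rough sketch of how the pandas route could pick a row by keyword rather than by position: the HTML below is a made-up stand-in for the page, the table id `example` and the keyword `"Row 1"` are assumptions for illustration, and `pd.read_html` needs an HTML parser such as lxml (or bs4 with html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the scraped page source.
html = """
<table id="example">
  <tr><th>Col1</th><th>Col2</th><th>Col3</th></tr>
  <tr><td>Row 1</td><td>A</td><td>B</td></tr>
  <tr><td>Row 2</td><td>C</td><td>D</td></tr>
</table>
"""

# read_html returns a list of DataFrames; attrs narrows it to the table with this id.
tables = pd.read_html(StringIO(html), attrs={"id": "example"})
df = tables[0]

# Keep only the rows in which at least one cell equals the keyword exactly.
keyword = "Row 1"
mask = df.astype(str).eq(keyword).any(axis=1)
row = df[mask]
print(row)
```

Note that this yields the row's parsed values, not its source HTML; if the `<tr>` markup itself is needed, BeautifulSoup is the better fit.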

Daniel Butler
  • Will do! is "html_table = 'From your web scrapping'" equivalent to "source = BeautifulSoup(siteSource, "html.parser")"? – CosmicCat May 13 '19 at 22:44
  • No, you'll want to grab the html table element from bs4. It will look somewhat like this `html_table = soup.find('table')` like in this answer https://stackoverflow.com/questions/2935658/beautifulsoup-get-the-contents-of-a-specific-table – Daniel Butler May 13 '19 at 22:48

Contrived example below, but with bs4 4.7.1 you can use the pseudo-class CSS selectors :has and :contains to describe a tr (row) that has a child td (table cell) containing the wanted phrase. A table identifier is passed as well to target the correct table (an id here, to keep things simple). select returns all qualifying tr elements; use select_one if only the first match is required.

soup.select('#example tr:has(> td:contains("Row 1"))')
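
A variant worth knowing, sketched under the assumption that a recent Soup Sieve (2.1 or later, as far as I know) is installed: there `:contains` is deprecated in favour of `:-soup-contains`, and `select_one` returns a single element instead of a list. The HTML here is a hypothetical minimal document.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document, for illustration only.
html = """
<table id="example">
  <tr><th>Col1</th><th>Col2</th><th>Col3</th></tr>
  <tr><td>Row 1</td><td>A</td><td>B</td></tr>
  <tr><td>Row 2</td><td>C</td><td>D</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching <tr>, or None when nothing matches.
row = soup.select_one('#example tr:has(> td:-soup-contains("Row 1"))')
cells = [td.get_text(strip=True) for td in row.find_all("td")] if row else []
print(cells)
```

Both spellings match on substrings, so a cell reading "Not Row 1" in the same table would qualify as well.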

from bs4 import BeautifulSoup as bs

html = '''
<table id="example">
  <tbody><tr>
    <th>Col1</th>
    <th>Col2</th>
    <th>Col3</th>
  </tr>
  <tr>
    <td>Row 1</td>
    <td>A</td>
    <td>B</td>
  </tr>
  <tr>
    <td>Row 2</td>
    <td>C</td>
    <td>D</td>
  </tr>
</tbody></table>
<table id="example2">
  <tbody><tr>
    <th>Col1</th>
    <th>Col2</th>
    <th>Col3</th>
  </tr>
  <tr>
    <td>Not Row 1</td>
    <td>A</td>
    <td>B</td>
  </tr>
  <tr>
    <td>Not Row 2</td>
    <td>C</td>
    <td>D</td>
  </tr>
</tbody></table>

'''

soup = bs(html, 'lxml')  # or 'html.parser' if lxml is not installed
soup.select('#example tr:has(> td:contains("Row 1"))')
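
Since the question asks about find and find_all specifically, the same lookup can be sketched without CSS selectors. The helper name `find_rows` is hypothetical, and it assumes an exact match on the stripped cell text is what is wanted (unlike `:contains`, which matches substrings).

```python
from bs4 import BeautifulSoup

# Hypothetical two-table document, shortened from the example above.
html = """
<table id="example">
  <tr><th>Col1</th><th>Col2</th><th>Col3</th></tr>
  <tr><td>Row 1</td><td>A</td><td>B</td></tr>
  <tr><td>Row 2</td><td>C</td><td>D</td></tr>
</table>
<table id="example2">
  <tr><th>Col1</th><th>Col2</th><th>Col3</th></tr>
  <tr><td>Not Row 1</td><td>A</td><td>B</td></tr>
</table>
"""


def find_rows(soup, table_id, keyword):
    """Return the <tr> tags in the given table that have a cell equal to keyword."""
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    return [
        tr
        for tr in table.find_all("tr")
        if any(td.get_text(strip=True) == keyword for td in tr.find_all("td"))
    ]


soup = BeautifulSoup(html, "html.parser")
rows = find_rows(soup, "example", "Row 1")
for tr in rows:
    print(tr)  # printing a Tag shows its source markup
```

Swapping the equality test for `keyword in td.get_text()` would give substring matching instead.
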
QHarr