Your data doesn't reside within an actual HTML table.
You could use the following CSS selectors for now, though much of the page looks dynamic and I suspect the attributes and classes will change in future. I tried to keep the selectors a little more generic to compensate, but you should definitely try to make them even more generic if possible.
I use CSS selectors throughout for the flexibility and specificity gained. The [] denote attribute selectors, the . denotes a class selector, and *= is the contains operator, specifying that the left-hand-side attribute's value contains the right-hand-side string. For example, [class*=screenerBorderGray] means the class attribute contains the string screenerBorderGray.
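The " ", ">", and "+" between selectors are called combinators and are used to specify relationships between nodes matched by consecutive parts of the selector sequence: " " is the descendant combinator, ">" the child combinator, and "+" the adjacent sibling combinator.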
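To make the selector pieces concrete, here is a minimal sketch that runs them against a small made-up HTML fragment. The markup, the id, and the values are invented for illustration and only loosely imitate the page structure, so the real page may well differ. It uses Python's built-in html.parser so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: ids, classes and values are made up for this sketch.
html = (
    '<div id="Col1-0-Holdings-Proxy"><section><div class="Fl(start)">'
    '<div class="Bdbc($screenerBorderGray) row">'
    '<span>Row A</span><span>chart</span><span>10%</span></div>'
    '<div class="Bdbc($screenerBorderGray) row">'
    '<span>Row B</span><span>chart</span><span>20%</span></div>'
    '</div></section></div>'
)
soup = BeautifulSoup(html, "html.parser")

# [id*=Holdings]   attribute selector: id contains "Holdings"
# section          descendant combinator (the space before it)
# > .Fl\(start\)   child combinator, then a class selector with escaped parentheses
# [class*=...]     class attribute contains "screenerBorderGray"
base = r'[id*=Holdings] section > .Fl\(start\) [class*=screenerBorderGray]'

# First span that is a direct child of each matched row.
left = [n.get_text(strip=True) for n in soup.select(base + ' > span:nth-child(1)')]

# A span that directly follows another span and is also the last child.
right = [n.get_text(strip=True) for n in soup.select(base + ' span + span:last-child')]

# The exact class selector matches the same two rows as the "contains" form.
exact_rows = soup.select(r'.Bdbc\(\$screenerBorderGray\)')

print(left, right, len(exact_rows))
```

The "contains" form is the looser of the two, which is why I prefer it when class names look generated.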
I generate a left column list of nodes and a right column list of nodes (ignoring the chart column in between). I then join these into a final dataframe.
R
library(rvest)
library(magrittr)

pg <- read_html('https://finance.yahoo.com/quote/xlk/holdings?p=xlk&guccounter=1')

# Left column: the first span that is a direct child of each bordered row
lhs <- pg %>%
  html_nodes('[id*=Holdings] section > .Fl\\(start\\) [class*=screenerBorderGray] > span:nth-child(1)') %>%
  html_text()

# Right column: a span that follows another span and is the last child
rhs <- pg %>%
  html_nodes('[id*=Holdings] section > .Fl\\(start\\) [class*=screenerBorderGray] span + span:last-child') %>%
  html_text()

df <- data.frame(lhs, rhs) %>% set_names(., c('Title', 'value'))
df <- df[-c(3), ]      # drop the third row
rownames(df) <- NULL   # renumber the rows after the drop
print(df)

Py
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

r = requests.get('https://finance.yahoo.com/quote/xlk/holdings?p=xlk&guccounter=1')
soup = bs(r.content, 'lxml')

# Raw strings keep the CSS escapes (\( \) \$) from being treated as Python string escapes
lhs = [i.text.strip() for i in soup.select(r'[id*=Holdings] section > .Fl\(start\) .Bdbc\(\$screenerBorderGray\) > span:nth-child(1)')]
rhs = [i.text.strip() for i in soup.select(r'[id*=Holdings] section > .Fl\(start\) .Bdbc\(\$screenerBorderGray\) span + span:last-child')]

df = pd.DataFrame(list(zip(lhs, rhs)), columns=['Title', 'Value'])
df = df.drop([2]).reset_index(drop=True)  # drop the third row and renumber
print(df)
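One thing to watch for: zip silently truncates to the shorter list, so if the page changes and the two selectors stop matching the same number of nodes, the columns could end up misaligned without any error. A rough guard, using only the standard library, might look like the sketch below. The helper name pair_columns and the sample lists are hypothetical placeholders, not part of the code above.

```python
def pair_columns(lhs, rhs, drop=()):
    """Pair labels with values by position, refusing mismatched lengths."""
    if len(lhs) != len(rhs):
        raise ValueError(
            f"column length mismatch: {len(lhs)} labels vs {len(rhs)} values"
        )
    skip = set(drop)
    # Keep every (label, value) pair whose position is not in `drop`.
    return [pair for i, pair in enumerate(zip(lhs, rhs)) if i not in skip]


# Placeholder data standing in for the scraped lists.
rows = pair_columns(["a", "b", "c", "d"], ["1", "2", "3", "4"], drop=[2])
print(rows)
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=['Title', 'Value']) in place of the zip call, and the drop argument then replaces the separate df.drop([2]) step.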
References:
- Row re-numbering @thelatemail