How to scrape the text of a span with aria-hidden="true"

Question

I am trying to web scrape using Selenium and Beautiful Soup, but I cannot get Selenium to find the element I need and return its text.

Here is the HTML:

<span class="t-14 t-normal">
            <span aria-hidden="true"><!---->Crédit Agricole CIB · Full-time<!----></span><span class="visually-hidden"><!---->Crédit Agricole CIB · Full-time<!----></span>
          </span>

Do you know how to get the text 'Crédit Agricole CIB · Full-time' from this HTML?

I am trying to do something like this:

from bs4 import BeautifulSoup

src = driver.page_source                                             # HTML as rendered by Selenium
soup = BeautifulSoup(src, 'lxml')                                    # Parse it with Beautiful Soup
intro = soup.find('div', {'class': 'pv-text-details__left-panel'})

text_loc = intro.find( ???? )                                        # Locate the element holding the text
text = text_loc.get_text().strip()                                   # Remove extra whitespace

I do not know what to put in the ????

Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Oct 25 '22 at 18:25

Driftr95 · Accepted Answer · 2022-10-26T18:38:01.297

I can't be sure without seeing the full HTML: there might be other, very similarly nested elements before the snippet shared in the question. If there aren't, you can use soup.select_one with the CSS selectors below:

spanTxt1 = soup.select_one('span.t-14.t-normal span[aria-hidden="true"]')
if spanTxt1 is not None: spanTxt1 = spanTxt1.get_text(strip=True)

spanTxt2 = soup.select_one('span.t-14.t-normal span.visually-hidden')
if spanTxt2 is not None: spanTxt2 = spanTxt2.get_text(strip=True)

print(f' Text1: "{spanTxt1}" \n Text2: "{spanTxt2}" ')

should give the output

 Text1: "Crédit Agricole CIB · Full-time" 
 Text2: "Crédit Agricole CIB · Full-time"
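Since the question asks what to put in place of the ???? in intro.find( ???? ), here is a sketch of the find() equivalent of the two selectors above. It is run against only the snippet from the question, so soup stands in for intro, and it assumes bs4 is installed ('html.parser' is used so lxml is not required). On the full page, find() still returns the first match anywhere inside intro, so the caveat above about similarly nested elements applies equally.

```python
from bs4 import BeautifulSoup

# The snippet from the question
html = '''
<span class="t-14 t-normal">
  <span aria-hidden="true"><!---->Crédit Agricole CIB · Full-time<!----></span><span class="visually-hidden"><!---->Crédit Agricole CIB · Full-time<!----></span>
</span>
'''
soup = BeautifulSoup(html, 'html.parser')

# find() with an attrs dict: matches the inner span by its aria-hidden attribute
text_loc = soup.find('span', {'aria-hidden': 'true'})
text = text_loc.get_text(strip=True) if text_loc is not None else None

# class_ selects the screen-reader copy of the same text instead
alt_loc = soup.find('span', class_='visually-hidden')
alt_text = alt_loc.get_text(strip=True) if alt_loc is not None else None

print(text)
print(alt_text)
```

get_text(strip=True) strips each text piece before joining, and the empty <!----> comments contribute nothing, so both calls yield just the company and employment type.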

EDIT:

I think the ember... section ids (such as ember940 and ember941) are dynamically generated and may be different on every page load. A more reliable selector for the jobs listed in the experience section might be:

expSel = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list li'

(It targets the list that comes after the empty div id="experience" anchor among its siblings.)
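To see why anchoring on div#experience helps, here is a sketch against hypothetical, heavily simplified markup (not LinkedIn's real HTML) in which an earlier section uses the same span classes. The ~ in expSel is the CSS general sibling combinator: it only matches a div.pvs-list__outer-container that comes after div#experience under the same parent. It assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: two sections share the same inner structure
html = '''
<section>
  <div id="about"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list"><li><span class="t-14 t-normal"><span aria-hidden="true">Some earlier text</span></span></li></ul>
  </div>
</section>
<section>
  <div id="experience"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list"><li><span class="t-14 t-normal"><span aria-hidden="true">Crédit Agricole CIB · Full-time</span></span></li></ul>
  </div>
</section>
'''
soup = BeautifulSoup(html, 'html.parser')
expSel = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list li'

# Unanchored: select_one returns the first match in document order
unanchored = soup.select_one('span.t-14.t-normal span').get_text(strip=True)

# Anchored: only lists that follow the div#experience sibling can match
anchored = soup.select_one(expSel + ' span.t-14.t-normal span').get_text(strip=True)

print(unanchored)  # Some earlier text
print(anchored)    # Crédit Agricole CIB · Full-time
```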

You can even pick a specific experience from the list by changing the end of the selector to li:nth-child(2) for the second experience, li:last-child for the last one, li:nth-last-child(2) for the second-to-last one, and so on.
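A sketch of those structural pseudo-classes, again on hypothetical simplified markup with three placeholder entries (Job A, Job B, Job C are made up). position_at is a hypothetical helper for illustration; it relies on expSel ending in li, so the pseudo-class can be appended directly. It assumes bs4 (and its Soup Sieve selector engine) is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified experience list with three placeholder entries
html = '''
<section>
  <div id="experience"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list">
      <li><span class="mr1 t-bold"><span aria-hidden="true">Job A</span></span></li>
      <li><span class="mr1 t-bold"><span aria-hidden="true">Job B</span></span></li>
      <li><span class="mr1 t-bold"><span aria-hidden="true">Job C</span></span></li>
    </ul>
  </div>
</section>
'''
soup = BeautifulSoup(html, 'html.parser')
expSel = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list li'


def position_at(pseudo):
    """Hypothetical helper: position title of the list item picked by a pseudo-class."""
    # expSel ends in 'li', so e.g. ':nth-child(2)' narrows it to one list item
    el = soup.select_one(expSel + pseudo + ' span.mr1.t-bold span[aria-hidden="true"]')
    return el.get_text(strip=True) if el is not None else None


print(position_at(':nth-child(2)'))       # Job B
print(position_at(':last-child'))         # Job C
print(position_at(':nth-last-child(2)'))  # Job B
print(position_at(':nth-child(5)'))       # None (there is no fifth entry)
```

Note that ul.pvs-list li also matches nested li elements, so on a real page with nested lists an :nth-child suffix could match more than the top-level entries.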

You could directly add on to the selector to get the first company:

c1span = soup.select_one(expSel + ' span.t-14.t-normal span')
if c1span is not None:
    print(c1span.get_text(strip=True))

and that should print Crédit Agricole CIB · Full-time

You could also use expSel to get all the listed experiences:

expSelRef = {
    'Position': 'span.mr1.t-bold',  
    'Company+Type': 'span.t-14.t-normal',
    'Dates': 'span.t-14.t-normal.t-black--light', 
    'Location': 'span.t-14.t-normal.t-black--light + span'
}
for e in soup.select(expSel):
    for r in expSelRef:
        eDet = e.select_one(expSelRef[r]+' span[aria-hidden="true"]')
        if eDet is not None: 
            print(f' [ {r}: "{eDet.get_text(strip=True)}" ] ', end='')
    print()

output:

 [ Position: "Structured Products & Equity Derivatives Sales" ]  [ Company+Type: "Crédit Agricole CIB · Full-time" ]  [ Dates: "Jan 2020 - Present · 2 yrs 10 mos" ]  [ Location: "Paris, Île-de-France, France" ] 
 [ Position: "Equity Sales Trader Assistant" ]  [ Company+Type: "ODDO BHF · Internship" ]  [ Dates: "Jun 2019 - Jan 2020 · 8 mos" ]  [ Location: "Paris, Île-de-France, France" ] 
 [ Position: "Wealth Management Analyst" ]  [ Company+Type: "HSBC · Internship" ]  [ Dates: "Mar 2018 - Sep 2018 · 7 mos" ]  [ Location: "Paris, Île-de-France, France" ] 
 [ Position: "Business Developper" ]  [ Company+Type: "Capgemini · Internship" ]  [ Dates: "Jan 2017 - Aug 2017 · 8 mos" ]
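In the comments, LaC asks about exporting this to a pandas DataFrame. A minimal sketch, assuming pandas and bs4 are installed and using hypothetical simplified markup built from two entries of the output above (not LinkedIn's real HTML): collect one dict per li instead of printing, then pass the list to pd.DataFrame. Fields that are not found (like the missing location on the last entry) become None.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for two experience entries; the second has no location
html = '''
<section>
  <div id="experience"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list">
      <li>
        <span class="mr1 t-bold"><span aria-hidden="true">Equity Sales Trader Assistant</span></span>
        <span class="t-14 t-normal"><span aria-hidden="true">ODDO BHF · Internship</span></span>
        <span class="t-14 t-normal t-black--light"><span aria-hidden="true">Jun 2019 - Jan 2020 · 8 mos</span></span>
        <span class="t-14 t-normal t-black--light"><span aria-hidden="true">Paris, Île-de-France, France</span></span>
      </li>
      <li>
        <span class="mr1 t-bold"><span aria-hidden="true">Business Developper</span></span>
        <span class="t-14 t-normal"><span aria-hidden="true">Capgemini · Internship</span></span>
        <span class="t-14 t-normal t-black--light"><span aria-hidden="true">Jan 2017 - Aug 2017 · 8 mos</span></span>
      </li>
    </ul>
  </div>
</section>
'''
soup = BeautifulSoup(html, 'html.parser')

expSel = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list li'
expSelRef = {
    'Position': 'span.mr1.t-bold',
    'Company+Type': 'span.t-14.t-normal',
    'Dates': 'span.t-14.t-normal.t-black--light',
    'Location': 'span.t-14.t-normal.t-black--light + span',
}

rows = []
for e in soup.select(expSel):
    row = {}
    for r, sel in expSelRef.items():
        eDet = e.select_one(sel + ' span[aria-hidden="true"]')
        # Missing fields become None, which pandas treats as a missing value
        row[r] = eDet.get_text(strip=True) if eDet is not None else None
    rows.append(row)

# Fixing the column order to match expSelRef
df = pd.DataFrame(rows, columns=list(expSelRef))
print(df)
```

Keeping expSelRef as the single source of field names means adding a field is one new dict entry, and the DataFrame columns follow automatically.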

edited Oct 26 '22 at 18:38

answered Oct 25 '22 at 17:27

Driftr95

Hi, it works, but as you suspected, it also returns another span's text from higher up the page. Do you know how to select the second one? The text I want is exactly the same, but from the second one. – LaC Oct 26 '22 at 06:21
The span class above appears to be different from the first one: Structured Products & Equity Derivatives Sales – LaC Oct 26 '22 at 06:24
@HugoChikli To print "Structured Products & Equity Derivatives Sales" from the second span, use `print(soup.select_one('.mr1.t-bold').get_text(strip=True))` – Driftr95 Oct 26 '22 at 07:03
@HugoChikli or did you mean that you're getting "Structured Products & Equity Derivatives Sales" but you don't want to? Because then you might try my answer with `spanTxt2 = soup.select_one('span.t-14.t-normal > span.visually-hidden')` [the added `>` specifies direct descendants (children) only] - can't really tell for sure without seeing full html though – Driftr95 Oct 26 '22 at 07:04
When I use your code, I get text from a section slightly earlier on the page that has exactly the same HTML structure. The website is a LinkedIn profile page, and I want the information from the experience section. – LaC Oct 26 '22 at 07:23
I want the text from this section and not ember940 – LaC Oct 26 '22 at 07:42
@HugoChikli try `spanTxt2 = soup.select_one('section#ember941 span.t-14.t-normal > span.visually-hidden')` ? – Driftr95 Oct 26 '22 at 07:49
I am getting a 'None' – LaC Oct 26 '22 at 08:25
spanTxt1 = soup.select_one('span.t-14.t-normal span[aria-hidden="true"]') if spanTxt1 is not None: spanTxt1 = spanTxt1.get_text(strip=True) spanTxt2 = soup.select_one('section#ember941 span.t-14.t-normal span.visually-hidden') if spanTxt2 is not None: spanTxt2 = spanTxt2.get_text(strip=True) print(f' Text1: "{spanTxt1}" \n Text2: "{spanTxt2}" ') – LaC Oct 26 '22 at 08:25
Text1: "Thibault’s recent posts and comments will be displayed here." Text2: "None" – LaC Oct 26 '22 at 08:25
@HugoChikli I can't debug without the actual link or the full HTML – Driftr95 Oct 26 '22 at 13:58
here is the link: https://www.linkedin.com/in/thibault-arrighi-a43296130/ – LaC Oct 26 '22 at 14:09
So I want to get the first experience and then, ideally, the company and start date! Let me know if you can help – LaC Oct 26 '22 at 14:10
@HugoChikli please see my edits and let me know if it works. I added a few more details than asked for; feel free to ignore the second part of my edit – Driftr95 Oct 26 '22 at 18:27
@HugoChikli ....I do wish you had shared the link earlier - I actually once posted [another answer](https://stackoverflow.com/a/73957068/12652373) about the experience section of LinkedIn which might have helped you – Driftr95 Oct 26 '22 at 18:27
Yes, it is working now, thank you!! I am just not sure what refers to what. So if I want to export this to a pandas DataFrame, should I just use expSelRef? – LaC Oct 26 '22 at 21:35
Hi, don't bother, I figured it out!! Big thanks @Driftr95!! This truly helps – LaC Oct 26 '22 at 22:06
