
Hey everyone, I'm trying to scrape LinkedIn info and I've got the source code below. I know how to get the info using the section ID, but that ID changes on every page refresh.

<section id="ember443" class="artdeco-card ember-view relative break-words pb3 mt2 " tabindex="-1"><!---->

    <div id="experience" class="pv-profile-card-anchor"></div>
            <!---->

            <div class="pvs-list__outer-container">
<!---->    <ul class="pvs-list
        ph5 display-flex flex-row flex-wrap
        ">
        <li class="artdeco-list__item pvs-list__item--line-separated pvs-list__item--one-column">
                <!----><div class="pvs-entity
    pvs-entity--padded pvs-list__item--no-padding-when-nested
    
    ">
  <div>
        <a data-field="experience_company_logo" class="optional-action-target-wrapper 
        display-flex" target="_self" href="https://www.linkedin.com/company/22316561/">
        <div class="ivm-image-view-model  pvs-entity__image ">
    <div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
    
    ">

</div>
  </div>
    </a>

  </div>

  <div class="display-flex flex-column full-width align-self-center">
    <div class="display-flex flex-row justify-space-between">
        <div class="
          display-flex flex-column full-width">
    
        <div class="display-flex align-items-center">
            <span class="mr1 t-bold">
              <span aria-hidden="true"><!---->CEO &amp; Founder<!----></span><span class="visually-hidden"><!---->CEO &amp; Founder<!----></span>
            </span>
<!----><!----><!---->        </div>
          <span class="t-14 t-normal">
            <span aria-hidden="true"><!---->Runa<!----></span><span class="visually-hidden"><!---->Runa<!----></span>
          </span>
          <span class="t-14 t-normal t-black--light">
            <span aria-hidden="true"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span><span class="visually-hidden"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span>
          </span>
          <span class="t-14 t-normal t-black--light">
            <span aria-hidden="true"><!---->Mexico City Area, Mexico<!----></span><span class="visually-hidden"><!---->Mexico City Area, Mexico<!----></span>
          </span>
      
  

I've managed to get all the sections with this class using:

experiences = soup.find_all("section", {"class": "artdeco-card ember-view relative break-words pb3 mt2"})

However, I need the text inside the section that contains the div with id "experience". I've tried:

div = soup.find_all(id="experience")

But that only returns the anchor tag itself and nothing else. Any ideas on how I could get the job info inside that specific "experience" section? Thank you so much in advance!

  • Perhaps this will help: [beautifulsoup-innerhtml](https://stackoverflow.com/questions/8112922/beautifulsoup-innerhtml) – Ryan Wilson Oct 04 '22 at 19:36

1 Answer


Well, there isn't any text inside the div with id="experience"; the data you want comes after it. So maybe try something like:

expAnchor = soup.find(id="experience")
if expAnchor:  # find() returns None when no element has id="experience"
    expContainer = expAnchor.find_next('div', {"class": "pvs-list__outer-container"}) 
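For reference, here is a self-contained sketch of the `find_next` approach run against a trimmed copy of the question's HTML (the `html` string is a hypothetical excerpt, not the full LinkedIn page). It also shows why `find(id="experience")` on its own looked useless: the anchor div has no text at all. Keeping only the `aria-hidden="true"` spans is an assumption based on the markup above, where every value is repeated in a `visually-hidden` span.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed excerpt of the profile HTML from the question.
html = """
<section id="ember443" class="artdeco-card ember-view relative break-words pb3 mt2 ">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list ph5 display-flex flex-row flex-wrap">
      <li class="artdeco-list__item">
        <span class="mr1 t-bold">
          <span aria-hidden="true">CEO &amp; Founder</span><span class="visually-hidden">CEO &amp; Founder</span>
        </span>
        <span class="t-14 t-normal">
          <span aria-hidden="true">Runa</span><span class="visually-hidden">Runa</span>
        </span>
      </li>
    </ul>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

expAnchor = soup.find(id="experience")
# The anchor div is empty, so its text is an empty string.
anchor_text = expAnchor.get_text(strip=True) if expAnchor else None

texts = []
if expAnchor:  # find() returns None when nothing has id="experience"
    # find_next() walks forward in document order from the anchor,
    # so it reaches the container that follows it.
    expContainer = expAnchor.find_next("div", {"class": "pvs-list__outer-container"})
    if expContainer:
        # Each value is duplicated in a visually-hidden span; keep one copy.
        texts = [s.get_text(strip=True)
                 for s in expContainer.find_all("span", attrs={"aria-hidden": "true"})]

print(repr(anchor_text))  # ''
print(texts)              # ['CEO & Founder', 'Runa']
```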

Or you could use CSS selectors and get it in one call:

expContainer = soup.select_one('#experience ~ div.pvs-list__outer-container')
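The same selector, extended to pull out each job as a dict, might look like the sketch below. Again the `html` string is a hypothetical excerpt; the `ul.pvs-list > li` structure and the title/company/dates/location order are assumptions read off the single entry in the question, and LinkedIn's real markup may differ or change. The last part shows that the enclosing `<section>` (the one with the changing `emberNNN` id) can be reached from the stable anchor with `find_parent`.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed excerpt with the fields from the question's single job entry.
html = """
<section id="ember443" class="artdeco-card ember-view">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list ph5">
      <li class="artdeco-list__item">
        <span aria-hidden="true">CEO &amp; Founder</span><span class="visually-hidden">CEO &amp; Founder</span>
        <span aria-hidden="true">Runa</span><span class="visually-hidden">Runa</span>
        <span aria-hidden="true">Jan 2018 - Present · 4 yrs 10 mos</span>
        <span aria-hidden="true">Mexico City Area, Mexico</span>
      </li>
    </ul>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# "~" is the general sibling combinator: it matches a later sibling of #experience.
expContainer = soup.select_one("#experience ~ div.pvs-list__outer-container")

# Assumed field order, taken from the one entry shown in the question.
FIELDS = ["title", "company", "dates", "location"]

jobs = []
if expContainer:
    for item in expContainer.select("ul.pvs-list > li"):
        # Skip the visually-hidden duplicates; keep the aria-hidden copies.
        values = [s.get_text(strip=True) for s in item.select('span[aria-hidden="true"]')]
        jobs.append(dict(zip(FIELDS, values)))

# Walk up from the stable anchor to the enclosing <section>,
# whatever its emberNNN id happens to be on this page load.
anchor = soup.find(id="experience")
section = anchor.find_parent("section") if anchor else None

print(jobs)
print(section["id"] if section else None)  # ember443
```

Note that `zip` stops at the shorter sequence, so an entry with fewer spans (say, no location) simply yields a dict with fewer keys rather than raising an error.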
Driftr95