Extract all
with the class="myTag" but without

Question

I am struggling with the ImportXML function from Google Spreadsheet. A specific webpage has this content:

<li class="myTag">
    <a href="/xyz/abc/">
        Jouets
        <small class="hidden">
            <span>235&nbsp;6453&nbsp;sous-section</span>
            <span>1231&nbsp;765&nbsp;produits</span>
            <span>12155&nbsp;654&nbsp;images</span>
        </small>
    </a>
</li>
<li class="myTag">
    <a href="/xyz/def/">
        Livres
        <small class="hidden">
            <span>235&nbsp;6453&nbsp;sous-section</span>
            <span>1231&nbsp;765&nbsp;produits</span>
            <span>12155&nbsp;654&nbsp;images</span>
        </small>
    </a>
</li>

I want to extract all li tags with the class="myTag" but without the <small> section, in order to have this list:

Jouets
Livres

I have written this XPath expression:

=IMPORTXML(B1; '//li[@class='myTag'].ReplaceAll("<[^>]*>", "", "")')

But I have the #N/A error.

How can I extract just the first elements (Jouets, Livres) without the <small> section?

Originality of this question

Scraping data to Google Sheets from a website that uses JavaScript is about dynamic web pages which is not my case. My page is already generated in HTML content - I can already extract the full content within the <li> tag with the parameter '//li[@class='myTag']'.

My point is to exclude the <small> content which is glued.
I am looking to get this list:

Jouets
Livres

and not this one:

Jouets235 6453 sous-section1231 765 produits12155 654 images
Livres235 6453 sous-section1231 765 produits12155 654 images

Thank you for replying and providing the URL. When I saw the HTMl of your URL, I cannot find the values of `Jouets` and `Livres`. And also, I cannot find the "li" tag with the class of `myTag`. I apologize for this. — Tanaike, Feb 06 '23 at 12:13
Sorry, I anonymised the content. Here is the full importxml function: =IMPORTXML(https://www.universalis.fr/classification/; '//li[@class='with-counter']') — macaaron, Feb 06 '23 at 12:17
Thank you for replying. About `I anonymised the content. `, I couldn't notice it. I apologize for this. — Tanaike, Feb 06 '23 at 12:17
Did you succeed with the test? `=IMPORTXML('https://www.universalis.fr/classification/'; '//li[@class='with-counter']')` — macaaron, Feb 06 '23 at 17:20
Thank you for replying. Unfortunately, from your reply, I couldn't understand your expected values. But I would like to try to understand it. When I could correctly understand it, I would like to think of a solution. When you use your provided formula of `=IMPORTXML('https://www.universalis.fr/classification/'; '//li[@class='with-counter']')`, what did you obtain the values and what values do you want to retrieve? So, can I ask you about the detail of your expected value? — Tanaike, Feb 07 '23 at 06:31
Thank you. Currently, with this formula I have this result (just pasting the first lines): - `Arts1 069 sous-thèmes15 778 articles11 949 médias` ; - `Astronomie et astrophysique107 sous-thèmes570 articles696 médias` ; - `Chimie147 sous-thèmes1 042 articles1 087 médias` But I am attempting to get only: - `Arts` ; - `Astronomie et astrophysique` ; - `Chimie` — macaaron, Feb 07 '23 at 09:27
Thank you for replying. From your reply, I proposed an answer. Please confirm it. If I misunderstood your expected values, I apologize. — Tanaike, Feb 07 '23 at 11:25

score 0 · Accepted Answer · answered Feb 07 '23 at 11:25

From your following reply,

Currently, with this formula I have this result (just pasting the first lines): - Arts1 069 sous-thèmes15 778 articles11 949 médias ; - Astronomie et astrophysique107 sous-thèmes570 articles696 médias ; - Chimie147 sous-thèmes1 042 articles1 087 médias But I am attempting to get only: - Arts ; - Astronomie et astrophysique ; - Chimie

How about the following XPath?

Modified XPath:

//li[@class='with-counter']/a/text()

Modified formula:

=IMPORTXML("https://www.universalis.fr/classification/"; "//li[@class='with-counter']/a/text()")

Testing:

@macaaron Thank you for replying and testing it. I could correctly understand your question with your cooperation. Thank you, too. — Tanaike, Feb 07 '23 at 12:08

Extract all with the class="myTag" but without

1 Answers1

Modified XPath:

Modified formula:

Testing:

Extract all
with the class="myTag" but without