Scraping react-id

Question

I'm trying to use Scrapy on this page to extract the phone number from this element:

from scrapy.selector import Selector

sel = Selector(response)
sel.xpath('.//*[@class="ProfileSimpleContact-item"]/span/span/text()').extract()

but this returns:

['(11) 98528-27...']

I want to get the full number (without the "..."), which only appears after clicking an element that has a React id. How can I get it?

I see it is only generated when you click it. https://stackoverflow.com/questions/6682503/click-a-button-in-scrapy — Sailesh Kotha, Aug 26 '18 at 00:43

Joaquin · Accepted Answer · 2018-08-26T00:49:04.577

You can use Splash as a last resort, but it will make your spider more expensive and more complex.

Luckily, in your case you can use one of the <script> tags to get the required data.

First you need to get the correct <script> tag:

ans = response.xpath("//script[contains(text(),'telephone')]/text()").extract_first()

This gives you a JSON string like this:

{
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "Cynthia Hóss Rocha",
    "description": "advogada há 15 anos.",
    "telephone": "(11) 985282712",
    "image": "imgs.jusbr.com/profiles/5368773/images/1419878998_standard.jpg",
    "jobTitle": "Advogado",
    "url": "https://cynthiahossrocha.jusbrasil.com.br",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "São Paulo (SP)",
        "streetAddress": "Rua Marconi, 131",
        "postalCode": "01047-000"
    }
}

To convert it into a Python dict, import json and use json.loads:

import json

json_ans = json.loads(ans)

Finally you only need to extract the required value:

phone = json_ans["telephone"]
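
If you want the parsing step to tolerate pages where the <script> tag is missing or is not valid JSON, the last two steps could be wrapped roughly like this. This is only a sketch: extract_telephone and sample are hypothetical names chosen for illustration, and the sample string is a trimmed copy of the JSON above. It uses only the standard library, so it can be tried without running a spider.

```python
import json
from typing import Optional


def extract_telephone(script_text: Optional[str]) -> Optional[str]:
    """Return the "telephone" value from a JSON string, or None if unavailable."""
    if not script_text:
        # extract_first() returns None when no <script> tag matched.
        return None
    try:
        data = json.loads(script_text)
    except json.JSONDecodeError:
        # The matched script was not pure JSON (for example, inline JavaScript).
        return None
    if not isinstance(data, dict):
        # Valid JSON, but not the object shape shown in the answer.
        return None
    return data.get("telephone")


# Trimmed copy of the JSON shown above (hypothetical test input).
sample = '{"@type": "Person", "telephone": "(11) 985282712"}'
print(extract_telephone(sample))  # prints (11) 985282712
```

Inside a spider callback you would pass it the ans value from the XPath step, i.e. extract_telephone(ans), and skip the item when it returns None.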
