
I am trying to use XPath to import information from a site into a Google spreadsheet, and I am struggling a little with a nested <div>.

<div class="value-display__value">
<div title="" data-html-title="">
#295
</div>
</div>

The idea is to import '#295'. Here is my formula:

=IMPORTXML($C2,"//div[@class='value-display__value']//div/text()")

Google Sheets seems to import empty content instead of #295.

Tanaike
  • @Calculuswhiz Yeah, sorry, it's for easier filling of the table. The URL is https://osu.ppy.sh/users/4504101/osu . I have figured out how to import the username, although I struggle with the ranking and former username. Thanks – Alex Niremov Sep 07 '20 at 07:10
  • Does this answer your question? [Google Sheets importXML Returns Empty Value](https://stackoverflow.com/questions/61470783/google-sheets-importxml-returns-empty-value) – Rafa Guillermo Sep 07 '20 at 13:32

1 Answer


It looks like the ranking number is populated at runtime, so you can't use IMPORTXML() on that div alone. The page also makes no XHRs after loading, which tells me that the data is already somewhere on the page. This gives you two options:

  1. Request an API key and use the get_user API. More info here and on their wiki. They say the API will be moving to v2 soon, though, so watch out for that if you want a long-term solution.

  2. Scrape the static HTML file for the right data, since that's where the data resides. I'm assuming you're after the player's rank, so we can do this in two stages:

    1. This scrapes the URL you gave in the comments. I found the ranking data in a script tag with id json-user, so I used:

      =IMPORTXML("https://osu.ppy.sh/users/4504101","//script[@id='json-user']")
      
    2. Then use REGEXEXTRACT() to find the data you want (the formula below assumes the result of step 1 is in A1). The part of the string we're interested in is: "rank":{"global":1,"country":1}}, so I did:

      =REGEXEXTRACT(A1,"""rank"":{""global"":(\d+),""country"":(\d+)")
      

      The parens around (\d+) create capture groups for the numbers. This yields two cells: the first is for global and the second is for country rank. If you're just interested in the country rank, you can leave the parens off of the first \d+.
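
For anyone who wants to check the pattern outside of Sheets, the two stages above can be sketched in Python. This is only an illustration under stated assumptions: SAMPLE_PAGE is a made-up stand-in for the static HTML (the real json-user script holds a much larger object, and the country rank here is invented), and the helper names are hypothetical. Only the script id and the "rank":{"global":...,"country":...} shape come from the page as described above.

```python
import json
import re

# Hypothetical stand-in for the static page source. The real page embeds a
# much larger JSON object; the values here are made up for illustration.
SAMPLE_PAGE = """
<html><body>
<script id="json-user" type="application/json">
{"username":"example","statistics":{"pp":1234.5,"rank":{"global":295,"country":12}}}
</script>
</body></html>
"""


def extract_json_user(page):
    """Stage 1: return the raw text inside the script tag with id json-user."""
    match = re.search(
        r'<script[^>]*\bid="json-user"[^>]*>(.*?)</script>', page, re.DOTALL
    )
    if match is None:
        raise ValueError("no script tag with id json-user found")
    return match.group(1).strip()


def ranks_by_regex(text):
    """Stage 2: same idea as the REGEXEXTRACT formula, two capture groups.

    The brace is escaped here for clarity; the digits are captured by (\\d+).
    """
    match = re.search(r'"rank":\{"global":(\d+),"country":(\d+)', text)
    if match is None:
        raise ValueError("rank object not found")
    return int(match.group(1)), int(match.group(2))


def find_key(node, key):
    """Depth-first search for the first dict value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    raw = extract_json_user(SAMPLE_PAGE)
    print(ranks_by_regex(raw))
    # Alternative: parse the JSON instead of pattern-matching the string.
    print(find_key(json.loads(raw), "rank"))
```

The regex route mirrors the spreadsheet formula and breaks if the key order or spacing in the embedded JSON changes; parsing the JSON and looking up the rank key is less sensitive to that, though Sheets has no built-in JSON parser, which is why the formula above sticks to REGEXEXTRACT().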

General Grievance