-1

I want to use google sheet's IMPORTXML to extract the gene name (SLC3A1) and ensembl ID (ENSG00000138079) from this URL: http://asia.ensembl.org/Multi/Search/Results?q=SLC3A1;site=ensembl

I tried copying xpath from Chrome and also tried deriving it on my own step by step, but I am only getting a #NA.

My xpath: /html/body/div[1]/div/div[2]/div[1]/div[1]/div[2]/div[2]/div[4]/div/div/div[2]/div/div/div[3]/div[2]/div[1]/div[2]/div/div/div/div[1]/div/div[2]/span

From Chrome: //*[@id="solr_content"]/div/div/div[2]/div/div/div[3]/div[2]/div[1]/div[2]/div/div/div/div[1]/div/a

The idea is to extract gene name and ID to google sheets for any gene name I supply.

Rubén
  • 34,714
  • 9
  • 70
  • 166
ANIPON
  • 1
  • 3

2 Answers2

0

This XPath worked on Firefox and should work on Chrome too since it's standard XPath.

$x('//a[ancestor::div[@id="solr_content"] and @class="table_toplink" and .="SLC3A1 (Human Gene)"]/following-sibling::div/span[@class="id"]/text()')

Result

Array [ #text
 ]
​
0: #text "ENSG00000138079"
LMC
  • 10,453
  • 2
  • 27
  • 52
0

you are getting #N/A error due to importxml (or any other import) formula does not support the scrapping of JavaScript elements. you can test this always by disabling JS for a given site and what's left can be usually imported into google sheets

enter image description here

player0
  • 124,011
  • 12
  • 67
  • 124