0

From the website http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396 I am trying to extract just the text data such as, birth, death, bio, location, and the created by sections into different rows/columns. I want to be able to have a spreadsheet where I can input a FindAGrave URL and have it extract the above data for me. I read here Using =importXML in Google Docs that its possible to do it by descriptions. From there I learned to omit the Xpath tbody. That successfully got my import to work, but without using the descriptions. I'm sure if using descriptions would be more efficient or not. I just want to learn how other people would go about importing data from tables. Thanks

Here is what I got so far. This will extract the Birth information and put in rows. One problem is that it adds an extra cell in between each data.

=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//html/body/table/tr/td[3]/table/tr[4]/td[1]/table/tr/td/table/tr/td/table/tr[1]/td[2]")

Result

Dec. 2, 1882        Humphreys County        Tennessee, USA

Update: I think I made some process along in the code. This is what I'm working with now.

=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr'][1]//tr/td/table/tr/td/table/tr[1]/td[1]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[1]/td[2]/text()[1]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[1]/td[2]/text()[2]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[1]/td[2]/text()[3]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[1]/td[2]/text()[4]")

=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr'][1]//tr/td/table/tr/td/table/tr[2]/td[1]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[2]/td[2]/text()[1]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[2]/td[2]/text()[2]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[2]/td[2]/text()[3]")
=IMPORTXML("http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", "//*[@class='gr']//tr/td/table/tr/td/table/tr[2]/td[2]/text()[4]")

Results:

Birth: 
Nov. 8, 1948
Benton
Saline County
Arkansas, USA

Death: 
Jan. 6, 2006
Tulsa
Tulsa County
Oklahoma, USA

Is there a way to split this data up within the code?

Rubén
  • 34,714
  • 9
  • 70
  • 166
Frank
  • 13
  • 1
  • 7

2 Answers2

3

The following formula

=IMPORTXML(
"http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396",
"//html/body/table/tr/td[3]/table/tr[4]/td[1]/table/tr/td/table/tr/td/table/tr[position()<=2]/td/text()"
)

returns

Birth: 
Nov. 8, 1948
Benton
Saline County
Arkansas, USA
Death: 
Jan. 6, 2006
Tulsa
Tulsa County
Oklahoma, USA

A shorter alternative,

=IMPORTXML(
"http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GScid=97961&GRid=22682396", 
"//tr[4]/td[1]//tr[position()<=2]/td/text()"
)

returns the same result

Rubén
  • 34,714
  • 9
  • 70
  • 166
1

You can get multiple fields by simplifying your xpaths - you can also add multiple into one single function call separating the xpaths with a | :

`=ARRAYFORMULA(TRIM(TRANSPOSE(IMPORTXML($A3,"//td[@align='left']/text()|//tr[6]/td/a|//tr[3]/td/text()[1]"))))`

enter image description here

The three xpaths used are:

//td[@align='left']/text()

//tr[6]/td/a

//tr[3]/td/text()[1]

Aurielle Perlmann
  • 5,323
  • 1
  • 15
  • 26