1

I have an input xlsx workbook with many (thousands) internal hyperlinks to many other sheets (in the same workbook). A small, representative example at xlsx image: "A2" links to "linksTab!B3". xlsx file itself here on dropbox.

which I can access (Based on this similar question) using

library(XML)

# rename file to .zip
my.zip.file <- sub("xlsx", "zip", my.excel.file)
file.copy(from = my.excel.file, to = my.zip.file)

# unzip the file
unzip(my.zip.file)

# unzipping produces a bunch of files which we can read using the XML package
# sheet1 has our data
xmlfile <- xmlParse("xl/worksheets/sheet1.xml")

However, the final xml extraction in referenced question output yields an empty output in this case. Digging into the XML sheet1, I found that the links are represented in the xml file as

< hyperlink ref="A2" location="linksTab!B3" tooltip="goto link" display="link" xr:uid="{94...'

How do I extract them?

Robbes
  • 135
  • 8

1 Answers1

1

Found my own answer: Once the xmlfile is loaded as per above, get the links via

# grab the hyperlinks, and dump the XML components by convert them to character vector
 linkReferences <- tibble(
     RefCell  = as.character( xpathApply(xmlfile, "//x:hyperlink/@ref",      namespaces="x")),
     LinkCell = as.character( xpathApply(xmlfile, "//x:hyperlink/@location", namespaces="x"))
 )

 linkReferences  
 # A tibble: 5 x 2
   RefCell    LinkCell
     <chr>       <chr>
 1      A2 linksTab!B3
 2   A3:A5 linksTab!B3
 3      A3 linksTab!D3
 4      A4 linksTab!F3
 5      A5 linksTab!H3
Robbes
  • 135
  • 8