0

I have an Excel (.xlsx) workbook that has a column in a worksheet that has text that is hyperlinked. Say I have in cell A1 the text "click me!", but is hyperlinked to "https://poi.apache.org/". I want to extract the url in addition to the cell text.

See: https://ibb.co/DzLSPM9 . You can see at this link that there is text there, it is hyperlinked, but what I want to extract is the hyperlink itself, not the text.

I wanted to submit here, but that question is closed for answering. Extract hyperlink from Excel file in R

mpettis
  • 3,222
  • 4
  • 28
  • 35
  • It would make it easier for other people to also answer if you made the example data in your question rather than your answer – camille Nov 18 '21 at 01:09
  • @camille edited. Not sure how to embed a picture in there, not sure I have permissions. – mpettis Nov 18 '21 at 01:44
  • I was thinking more of the first dozen or so lines of the code in your answer—that's what creates the example data, so you might as well put it in the question to make the question reproducible – camille Nov 18 '21 at 05:45

1 Answers1

0

Here is my solution. It took some digging, and I think perhaps I should make a pull request to the project to get this as a feature.

# Reprex to read xlsx hyperlinks
# See: https://stackoverflow.com/questions/24149821/extract-hyperlink-from-excel-file-in-r
library(xlsx)

# First, make a toy spreadsheet with hyperlinks
# https://cran.r-project.org/web/packages/xlsx/xlsx.pdf

wb <- createWorkbook()
sheet1 <- createSheet(wb, "Sheet1")
rows <- createRow(sheet1, 1)
cells <- createCell(rows, colIndex=1)

## Add hyperlinks to a cell
cell <- cells[[1,1]]
address <- "https://poi.apache.org/"
setCellValue(cell, "click me!")
addHyperlink(cell, address)

saveWorkbook(wb, "sample-hyperlink.xlsx")

#;; Clear the objects
rm(list=ls())

# Read in the workbook, extract the link
# Refs:
#     https://rforge.net/rJava/

# Load the workbook, get the components
wb <-
    loadWorkbook("sample-hyperlink.xlsx")
sheet <-
    getSheets(wb)$Sheet1
rows <-
    getRows(sheet)
# Get first column (that is the second argument)
cells <-
    getCells(rows, 1)

# Note that getCellValue won't get the link
getCellValue(cells[[1]])
#> [1] "click me!"

# This is how you have to extract the hyperlink
library(rJava)

# What java class is this?
str(cells[[1]])
#> Formal class 'jobjRef' [package "rJava"] with 2 slots
#>   ..@ jobj  :<externalptr> 
#>   ..@ jclass: chr "org/apache/poi/ss/usermodel/Cell"

# Look up the class: https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/Cell.html

# methods:
.jmethods(cells[[1]])

#> ...
#> [12] "public org.apache.poi.ss.usermodel.Hyperlink org.apache.poi.xssf.usermodel.XSSFCell.getHyperlink()"                     
#> [13] "public org.apache.poi.xssf.usermodel.XSSFHyperlink org.apache.poi.xssf.usermodel.XSSFCell.getHyperlink()"               
#> ...

# This is the method we want
str(J(cells[[1]], "getHyperlink"))
#> Formal class 'jobjRef' [package "rJava"] with 2 slots
#>   ..@ jobj  :<externalptr> 
#>   ..@ jclass: chr "org/apache/poi/xssf/usermodel/XSSFHyperlink"

# Get methods on the objects returned:
.jmethods(J(cells[[1]], "getHyperlink"))
#>  [1] "public java.lang.String org.apache.poi.xssf.usermodel.XSSFHyperlink.getAddress()"                                                   
#>  ...

# This gets the link
J(J(cells[[1]], "getHyperlink"), "getAddress")
#> [1] "https://poi.apache.org/"

Created on 2021-11-17 by the reprex package (v2.0.1)

mpettis
  • 3,222
  • 4
  • 28
  • 35