2

I'm working with an Excel sheet in which some columns contain hyperlinks that are represented as text that is completely different from the actual address the hyperlinks point to. I want to use some R code to modify and subset the Excel sheet but keep the hyperlinks. I think I can do this by extracting those hyperlinks as an indexed character vector then re-introducing them into a new Excel document using the makeHyperlinkString() and writeFormula() functions. But I cannot figure out how to get a vector of the links themselves.

In case it matters, my intention is to do all the modifying and subsetting on a data.frame version of the Excel sheet rather than a workbook object.

Slavatron
  • 2,278
  • 5
  • 29
  • 40
  • With `openxlsx` you should be able to `loadWorkbook` and use `writeData` to write to specific sheets/cells, then `saveWorkbook` without altering any of the other cells/sheets. – IceCreamToucan May 25 '18 at 16:54

1 Answers1

1

Oh, now I think I got your problem. I thought there were only normal hyperlinks not Excel-Hyperlinks.

I think this may help you to get a vector of the hyperlinks, although its a bit messy.

library(openxlsx)
pathtofile =  "path to .xlsx file"

df1 <- read.xlsx(xlsxFile = pathtofile, 
                 sheet = 1, skipEmptyRows = FALSE, 
                 colNames = F, rowNames = F,
                startRow = 1)

## Sheet or Tabelle
Sheet = "Sheet" ## Or "Tabelle"

## Get Names of rows from Hyperlink column
rowIndex <- sub(x = df1[,1], pattern = paste0("(#'",Sheet,"\\d'!)"), replacement = "")

## Get the Sheet, where Hyperlinks are saved
SheetName <- regmatches(df1[,1], regexpr(text = df1[,1], pattern = paste0("(",Sheet,"\\d)")))
## Extract only the Sheet number
SheetIndex <- as.numeric(sub(x = SheetName, pattern = Sheet, replacement = ""))

## Get the row Indexes as numeric
RowIndexNum <- as.numeric(regmatches(rowIndex, regexpr(text = rowIndex, pattern = "\\d")))
## Get the column name as character
RowIndexName <- sub(x = rowIndex, pattern = "\\d", "")
## Create uppercase Letters
myLetters <- toupper(letters[1:26])
## Convert Row Name (character) to numeric (based on alphabetical order)
RowIndexNameNum <- match(RowIndexName, myLetters)

## If Hyperlinks only in 1 Sheet or several sheets
if (length(unique(SheetIndex)) == 1) {
  dfLinks <- read.xlsx(xlsxFile = pathtofile,
                       sheet = unique(SheetIndex), 
                       skipEmptyRows = FALSE, 
                       colNames = F, rowNames = F, 
                       rows = RowIndexNum[1]:tail(RowIndexNum,1),
                       cols = unique(RowIndexNameNum),
                       startRow = 1
                       );
} else {
  dfLinks <- data.frame()
  for (i in unique(SheetIndex)){
    dfTmp <- read.xlsx(xlsxFile = pathtofile,
              sheet = i, 
              skipEmptyRows = FALSE, 
              colNames = F, rowNames = F, 
              rows = RowIndexNum[1]:tail(RowIndexNum,1),
              cols = unique(RowIndexNameNum),
              startRow = 1)
    dfLinks <- rbind(dfLinks, dfTmp)
  }
}

dfLinks

This is how my Excel File looks like:

enter image description here

Community
  • 1
  • 1
SeGa
  • 9,454
  • 3
  • 31
  • 70
  • No, when I do that the data.frame contains the display text rather than the address the links point to. – Slavatron May 25 '18 at 17:48
  • I edited my answer, as i think I misunderstood the question. And maybe i still do.. :) – SeGa May 28 '18 at 15:10