1

Background

I'm writing a .R script on my computer (Windows), which (1) must maintain a Log.xlsx file recording data from its runs; and (2) will be run by a colleague on a separate VM (also Windows).

Requirements

I'm looking for a package in R that will (1) let me append R data (from a data.frame) immediately after the last nonempty row, in an existing sheet within that existing Log.xlsx workbook; while (2) remaining easily portable, such that any necessary packages can be automatically installed by the script itself (via pacman::p_load()*) if the VM does not already have them installed.

* I ensure the presence of pacman using base functionality:

if(!"pacman" %in% installed.packages()) {
    install.packages("pacman")
}

Preferences

Since this log will grow quickly — the script runs daily and will log many dozens of records per run — I would prefer to avoid the approach proposed here, which simply loads all the existing worksheet data into R, where it combines the datasets and then (over)writes the result as a new .xlsx (or .csv) file. I would likewise prefer to avoid the (better?) approach proposed here, which similarly loads and combines everything in R, but then overwrites the sheet within the workbook, rather than overwriting the workbook itself.

In short, I strongly prefer "granular appendability", to simply add new data to the end of an existing sheet, without having to load, combine, and (over)write the entire sheet's (let alone workbook's) worth of data in R.

Roadblock

I have investigated the packages xlsx, openxlsx (which seems to have some stability issues), and XLConnect. While XLConnect seems particularly relevant (see XLConnect::appendWorksheet()) and portable even across operating systems, there remains the problem of Java. At the moment, pacman::p_load() will successfully install XLConnect and xlsx (among others), yet any attempt to load these two packages (via library()) results predictably in

Error: package or namespace load failed for ‘xlsx’:
 .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: JAVA_HOME cannot be determined from the Registry

I could obviously correct this issue on my computer, by following the instructions here to install the version of Java (32- or 64-bit) compatible with my version of R. However, I would like to keep the script self-contained, such that it handles its own dependencies and runs painlessly on the VM, regardless of which "bit-versions" of R and Java are installed on the VM. In particular, I don't want to force my colleague to manually install the appropriate version of Java.

In Conclusion

While the second solution (which apparently avoids the Java roadblock) will do in a true pinch, any help to achieve my listed Preferences would be greatly appreciated!

Greg
  • 3,054
  • 6
  • 27
  • Does the log need to be kept in `xlsx` file? You could use `fwrite` in `data.table` which can append to the existing file but it saves as a text file – MKa Jun 04 '21 at 06:45
  • Thanks, @MKa. The reason I incline toward `.xlsx` (or a spreadsheet in general) is that I have tabular data to store, possibly of various entities. Excel "tabs" (sheets within workbooks) allow tables of multiple entities to be flexibly stored in a single, central Log. Furthermore, in Excel there is no reliance on character delimiters, and I want to completely avoid any potential clashes between my text data and the delimiters in formats like `.csv`. – Greg Jun 04 '21 at 13:06
  • I understand - I am not sure about the `.xlsx` option but writing them into a database e.g. `SQLite` and put each tab as a separate table could be an option too. – MKa Jun 07 '21 at 01:30
  • Hi @MKa! I originally considered a DB, but I realized it would paint me into a corner. If the DB is hosted locally, by the same machine that runs the script, then portability becomes a even greater hassle: instead of just manually ensuring that compatible "bit-versions" of R and Java are installed on the VM, I must now replicate a custom DB on that VM...all on behalf of my colleague. If I instead create this database on the company servers, I must introduce another DB connection as a potential source of error, where failure to write one log entry could ruin consistency in my deliverables. – Greg Jun 07 '21 at 15:02
  • Hmmm...I just realized that, if I *can* manage to work around character delimiters, I could create a `/Log` subdirectory (rather than a `Log.xlsx` file) right next to the script, and within this subdirectory I could have one text file (like `.csv`) for each sheet that would have been in the `.xlsx`. "Records" (lines of text) could then be iteratively appended to each "sheet" (text file) in the "workbook" (subdirectory), without having to load an entire Excel workbook (or any other file) into R and then rewriting it to include the new rows. This would maintain tabular logs locally and portably – Greg Jun 07 '21 at 15:08

0 Answers0