2

Usuaully I use Tidyverse to read in excel files with the read_excel command, however I encountered the dreaded "Unknown or uninitialised column" bug that refers to a non existent column and then warns about said not existent column from then on through the workflow.

So I decided to use openxlsx instead to read in the excel files. All was going well until I realised that openxlsx sees column names with white space as not syntactically correct and it adds a . to replace the whitespace. So 'Customer Name' becomes 'Customer.Name'.

I tried using the check.names=FALSE command to leave the headers in tact, but the package seems to ignore this command.

Many of the headers might have more than a single space between the words and the format has to stay the same. I cannot use an excel package that relies on Java as our company has blocked it.

How can I force openxlsx to leave the header alone?

Example of the code I am using is here: IMACS <- read.xlsx("//zfsstdscun001a.rz.ch.com/UKGI_Pricing/Bus_Insights/R_Scripts/IMACS.xlsx",check.names=FALSE, sheet = "IMACS")

Saarek
  • 119
  • 2
  • 11
  • 4
    try `sep.names = " "`, as the default is `"."` – Matt Apr 01 '20 at 10:08
  • Hi Matt, thanks for the tip. It does indeed remove the . and all headers that just use a single space between words are now correct. But those that have more than one space format as just a single space now. It's a terrible naming convention and I have fought against maintaining it, but they have other processes that are setup for the exact structure of the old files, so I have to maintain it. – Saarek Apr 01 '20 at 10:13
  • 1
    Why not use `readxl::read_xlsx`? – Matt Apr 01 '20 at 10:41
  • 1
    Now that seems to work a charm, but I'm not sure how it works. It's reading it in like the tidyverse read_xlsx command, so I don't have to worry about the sep.names anymore. The one file that was giving the issues with the non existent column in dplyr is now also working correctly. Out of intererest, what exactly is the command doing there? ** Found the answer here: https://stackoverflow.com/questions/35240971/what-are-the-double-colons-in-r ** Thanks for the help Matt! – Saarek Apr 01 '20 at 10:45
  • @Matt Could you post your answer as an answer so that I can mark it as correct? – Saarek Apr 01 '20 at 10:56
  • @Saarek You can post Matt's comment as the answer to your question and accept it. – UseR10085 Jul 10 '23 at 11:22

1 Answers1

2

All credit to @Matt on this.

Using readxl and read_excel together worked a treat.

IMACS <- readxl::read_excel("//zfsstdscun001a.rz.com/UKGI_Pricing/Bus_Insights/R_Scripts/CAT Risks/IMACSV2.xlsx",
                            sheet = "IMACS")

With openxlsx package, the following can be used:

IMACS <- read.xlsx("//zfsstdscun001a.rz.ch.com/UKGI_Pricing/Bus_Insights/R_Scripts/IMACS.xlsx", 
                   sep.names = " ", sheet = "IMACS")
UseR10085
  • 7,120
  • 3
  • 24
  • 54
Saarek
  • 119
  • 2
  • 11