Function to rename columns in R dataframe with an evolving list

Question

I often prepare summary tables of statistics that I share at work. The tables often contain the same type of data and column headers (e.g. number of bylaw violations, number of units, etc.). I often work with shorthand column names in R data frames ("nbbldg", "nbunits", "nbvl") or other column names inherited from imported tables. Here's an example:

df <-
  data.frame(
    DESCRIPTION_TXT_BLW = c(
      "Missing plumbing fixture",
      "Improperly installed heating unit",
      "Loose or damaged siding",
      "Peeling paint"
    ),
    DESCR_UNIT = c("Apartment", "Apartment", "Common area", "Common area"),
    nbvl = as.integer(c(12, 4, 76, 4))
  )

I then translate the column names into their "readable" counterparts before exporting to csv, using the following function (an example list is provided):

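changecolnames <- function(df, codetotext) {
  # For each column name, return its readable label if one exists,
  # otherwise keep the original name. Returns a character vector.
  vapply(names(df), function(x) {
    if (x %in% names(codetotext)) {
      codetotext[[x]]
    } else {
      x
    }
  }, character(1), USE.NAMES = FALSE)
}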

readablecolnames <-
      list(
        "DESCR_UNIT" = "Description of unit",
        "DESCRIPTION_TXT_BLW" = "Description of bylaw violation",
        "nbvl" = "Number of bylaw violations"
      )

names(df)<-changecolnames(df, readablecolnames)

So far, I have project-specific lists which allow me to convert the column names. I would like to aggregate the disparate lists into a global one that is accessible from any R project (in RStudio) and that I can keep adding to. My objective is to avoid creating a list in each project, and instead refer to a sort of easy-to-update master "library". What is the best way of achieving this?

You could have a central R file that contains your list of names and `source` it in each project. — divibisan, Aug 07 '18 at 16:49
To make the shorthand/readable name pairs easier to enter as I grow my "library", could the file I source be a csv that then gets translated to a named list ? — Dealec, Aug 08 '18 at 15:33

score 0 · Accepted Answer · answered Aug 08 '18 at 15:58

What I'd do is have a central R file that contains this list of names and then source it to load it into each project.
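
As a minimal sketch of that idea (the file name `readable_names.R` and the temporary directory are only stand-ins for illustration; in practice the file would sit at a fixed path you choose), the central file simply defines the list, and each project loads it with one call to `source`:

```r
# Stand-in for the central file. A temporary directory is used here only so
# the example is self-contained; the file name is hypothetical.
central_file <- file.path(tempdir(), "readable_names.R")
writeLines(c(
  'readablecolnames <- list(',
  '  "DESCR_UNIT" = "Description of unit",',
  '  "DESCRIPTION_TXT_BLW" = "Description of bylaw violation",',
  '  "nbvl" = "Number of bylaw violations"',
  ')'
), central_file)

# In each project: one line runs the central file and creates the list
source(central_file)

# Look up one readable label by its shorthand name
readablecolnames[["nbvl"]]
```

Adding a new pair then only means editing the central file; every project picks it up the next time it is sourced.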

If you'd rather keep the name pairs in a .csv file, this R file could build the name list from that .csv file rather than holding the list itself:

name_pairs.csv:

short_name,full_name
DESCR_UNIT,Description of unit
DESCRIPTION_TXT_BLW,Description of bylaw violation
nbvl,Number of bylaw violations

load_name_pairs.r:

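# Read the shorthand/readable pairs from the central csv file
name_pairs <- read.csv('~/Desktop/test/name_pairs.csv',
                       stringsAsFactors = FALSE)

# Build a named character vector: names are the short codes,
# values are the readable labels
readablecolnames <- name_pairs$full_name
names(readablecolnames) <- name_pairs$short_name
rm(name_pairs)  # keep only the lookup vector in the workspace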

At the start of your R project:

source('~/Desktop/test/load_name_pairs.r')
readablecolnames


           DESCR_UNIT              DESCRIPTION_TXT_BLW                             nbvl 
"Description of unit" "Description of bylaw violation"     "Number of bylaw violations"

As you can see, when you call source on load_name_pairs.r, all the code in the sourced file is run and the objects it creates are imported into the sourcing environment (the global environment by default). So with just one line in your project file, you can load and parse a central .csv file and access the results in your project.
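
To show the whole round trip in one place, here is a rough, self-contained sketch. It writes the example csv to a temporary directory instead of `~/Desktop/test`, and the column `other_col` is a made-up column with no entry in the csv. The renaming step uses a vectorised lookup, which is one possible alternative to the `changecolnames` function from the question:

```r
# Write the example pairs to a temporary csv (stand-in for the central file)
csv_path <- file.path(tempdir(), "name_pairs.csv")
writeLines(c(
  "short_name,full_name",
  "DESCR_UNIT,Description of unit",
  "DESCRIPTION_TXT_BLW,Description of bylaw violation",
  "nbvl,Number of bylaw violations"
), csv_path)

# Build the named lookup vector: names are short codes, values are labels
name_pairs <- read.csv(csv_path, stringsAsFactors = FALSE)
readablecolnames <- setNames(name_pairs$full_name, name_pairs$short_name)

# Small example data frame; other_col is hypothetical and has no label
df <- data.frame(
  DESCR_UNIT = c("Apartment", "Common area"),
  nbvl = c(12L, 76L),
  other_col = c("a", "b")
)

# Replace only the names that have a match, leave the others unchanged
hits <- names(df) %in% names(readablecolnames)
names(df)[hits] <- unname(readablecolnames[names(df)[hits]])

names(df)
# "Description of unit" "Number of bylaw violations" "other_col"
```

Columns that are missing from the csv keep their original names, so the lookup file can grow gradually without breaking existing projects.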

Thanks, works perfectly. For added convenience, could I insert the "source" code within the changecolnames (in my question) function ? — Dealec, Aug 08 '18 at 18:25
Yeah, that should work. If `source` is run with the default `local=FALSE` argument, sourced objects will be loaded into the global environment, so it doesn't matter if you call it from a function. — divibisan, Aug 08 '18 at 18:30
I added the code `source('~/Desktop/test/load_name_pairs.r')` to my .Rprofile file ([how to edit Rprofile](https://stackoverflow.com/a/46819910/10047977)), to load the name pair table automatically when I launch RStudio. — Dealec, Aug 10 '18 at 13:49
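
The comments above about calling `source` from inside a function can be illustrated with a small sketch. The helper names `load_global` and `load_local` are hypothetical, and a one-line temporary script stands in for load_name_pairs.r. With the default `local = FALSE` the object lands in the global environment; with `local = TRUE` it stays inside the function and has to be returned explicitly:

```r
# Tiny stand-in for load_name_pairs.r, written to a temporary file
script <- tempfile(fileext = ".r")
writeLines('readablecolnames <- c(nbvl = "Number of bylaw violations")', script)

# Default (local = FALSE): the sourced code is evaluated in the global
# environment, even though source() is called from inside a function
load_global <- function(path) {
  source(path)
  invisible(NULL)
}

# local = TRUE: the sourced code is evaluated in the function's own
# environment, so the result is returned instead of left in the workspace
load_local <- function(path) {
  source(path, local = TRUE)
  readablecolnames
}

load_global(script)
exists("readablecolnames", envir = globalenv(), inherits = FALSE)

lookup <- load_local(script)
lookup[["nbvl"]]
```

The `local = TRUE` variant may be preferable if you want to avoid adding objects to the global workspace as a side effect of a function call.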
