2

I'm trying to rename the columns in a data frame using matching from a lookup table.

oldvars = c("mpg", "cyl" , "disp",  "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
newvars = c("Miles Per Gallon", "Cycle", "Displacement", "Horsepower", "Distance Rating", 
"Working Time", "Quick Second", "Versus", "America", "Gears", "Carbohydrates")

lookup = data.frame(oldvars, newvars)
mycars = mtcars

Using the lookup list to match oldvars and change them into newvars, so that names(mycars) would output "Miles Per Gallon", "Cycle", "Displacement", "Horsepower", "Distance Rating", "Working Time", "Quick Second", "Versus", "America", "Gears", "Carbohydrates"

I've tried using colnames to change the names but it isn't reading the variable like I was expecting. The following

for(i in 1:length(newvars)) {
  colnames(mycars)[oldvars[i]] = newvars[i]
} 

just outputs NAs

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
jamzsabb
  • 1,125
  • 2
  • 18
  • 40
  • Subset `names(test)` by "second" and assign the value of `lookup$first`. – A5C1D2H2I1M1N2O1R2T1 Jan 11 '16 at 15:50
  • 1
    Please provide a [reproducible example](http://stackoverflow.com/a/5963610/1412059). – Roland Jan 11 '16 at 15:53
  • 2
    Instead of a new question to get more responses, this should be added to your original question http://stackoverflow.com/questions/34723607/append-data-frame-to-another-data-frame-with-different-column-names – Pierre L Jan 11 '16 at 15:53
  • @Roland I completely re-wrote my question, wrote it after spending too long staring at it this morning. It is now reproducible and shows expected output – jamzsabb Jan 11 '16 at 19:52
  • @PierreLafortune it's related but it's a separate question. I felt it was worth a separate post, I'm not just being a pain and abusing the system. – jamzsabb Jan 11 '16 at 19:57

5 Answers5

10

If you know they're in the same order (as they are in your example) then the answer is easy:

names(mycars) = newvars

However, let's focus on a more general case - not all column need renaming, and there might be some extra names in the name vector, and the order is not necessarily the same:

old_vars = c("cyl" ,"mpg",  "disp",  "foo")
new_vars = c( "Cycle", "Miles Per Gallon", "Displacement", "bar")
lookup = data.frame(old_vars, new_vars)

mycars = head(mtcars, 2) # short data frame to work with

Then match is your friend for determining the correct order:

name_match = match(names(mycars), lookup$old_vars)
name_match
[1]  2  1  3 NA NA NA NA NA NA NA NA

# assign the names
names(mycars)[na.omit(name_match)] = lookup$new_vars[!is.na(name_match)]

data.table has a really nice interface for this, setnames accepts a vector of old and new names

library(data.table)
setDT(mycars) # convert the data frame to data.table
setnames(mycars, old = lookup$old_vars, new = lookup$new_vars) # update names by reference

The old plyr package has a rename function that accepts a named vector, where the vector values are the new names and the vector names are the old names:

plyr::rename(mycars, setNames(lookup$new_vars, lookup$old_vars))
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • I love this solution except that in my case I only want to rename some of the columns, and this turns the ones I am not renaming into NAs. – chrowe May 13 '19 at 20:47
  • @Gregor, is there a dplyr/piping alternative to parallel the `setNames` solution? – Emman Sep 16 '19 at 10:23
  • @chrowe edited to take care of NAs. Emman, not that I know of. – Gregor Thomas Sep 16 '19 at 14:54
  • @GregorThomas I was trying the "more general" base R solution above and it's yielding a lot of `NA`s, seems something is wrong. Also @fantanaman's corrected version of this (below) is not working. My own base R solution (not posted) is pretty verbose, I thought there might be something simpler. – TimTeaFan Jan 15 '23 at 20:41
  • 1
    Hey @TimTeaFan that's interesting to hear. Care to share an example? – Gregor Thomas Jan 17 '23 at 15:23
  • @GregorThomas: Sorry, I didn't mention I was talking about the `mycars` example above. I now figured why the code yielded `NA`s. In your answer you use `lookup$oldvars` in `match()`, and I had the `lookup` df loaded from the OPs post. Maybe you can replace `lookup$oldvars` with just `oldvars` which you define in your answer (unlike `lookup` which comes from the OP). I think this is the reason @fantaman below supplied a corrected version. – TimTeaFan Jan 17 '23 at 16:57
  • 1
    @TimTeaFan I added a `lookup` definition in my answer and switched my column names to include underscores so there shouldn't be any more confusion. – Gregor Thomas Feb 06 '23 at 14:06
1

Adding as answer only because I can't comment yet.

In @Gregor Thomas's correct answer, I suggest to reverse the arguments in the match() call, to account for cases where matching oldvars are not continuous from the first element in names(mycars). Full example below since I have the space.

mycars = head(mtcars, 2)    

oldvars = c("cyl" ,"mpg",  "wt",  "foo") #note change to variable selection in lookup
newvars = c( "Cycle", "Miles Per Gallon", "Weight", "bar")
name_match = match(names(mycars), oldvars)
name_match
[1]  2  1 NA NA NA  3 NA NA NA NA NA

#After omitting the `NA` elements, the match vector no longer properly aligns with
#the names(mycars) vector

names(mycars)[na.omit(name_match)] = newvars[!is.na(name_match)]
names(mycars) 
[1] "Miles Per Gallon" "Cycle"            NA   
[4] "hp"               "drat"             "wt"
[7] "qsec"             "vs"               "am"
[10] "gear"             "carb"

#instead, reverse the arguments in match() to find the data.frame names that appear in the lookup

name_match = match(oldvars, names(mycars))
name_match
[1]  2  1  6 NA
names(mycars)[na.omit(name_match)] = newvars[!is.na(name_match)]
names(mycars)
[1] "Miles Per Gallon" "Cycle"            "disp" 
[4] "hp"               "drat"             "Weight" 
[7] "qsec"             "vs"               "am"    
[10] "gear"             "carb"
fantanaman
  • 35
  • 8
  • It's ok to post another answer fixing something in the existing answer, so it is not needed to explain yourself at the start. But remember each answer should be comprehensive on its own. – Dharman Mar 15 '20 at 15:55
1

Here is one approach using a named vector with dplyr::any_of() inside dplyr::rename(). any_of() is helpful if not all of the columns in the named vector are present in the data.

library(dplyr)

recode_vec <- setNames(lookup$oldvars, lookup$newvars)

mycars %>% 
  rename(any_of(recode_vec)) %>% 
  glimpse() # for better printing

#> Rows: 32
#> Columns: 11
#> $ `Miles Per Gallon` <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.…
#> $ Cycle              <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, …
#> $ Displacement       <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 14…
#> $ Horsepower         <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123,…
#> $ `Distance Rating`  <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.9…
#> $ `Working Time`     <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.…
#> $ `Quick Second`     <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20…
#> $ Versus             <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, …
#> $ America            <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Gears              <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, …
#> $ Carbohydrates      <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, …

Data from OP

oldvars = c("mpg", "cyl" , "disp",  "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
newvars = c("Miles Per Gallon", "Cycle", "Displacement", "Horsepower", "Distance Rating", 
"Working Time", "Quick Second", "Versus", "America", "Gears", "Carbohydrates")

lookup = data.frame(oldvars, newvars)
mycars = mtcars

Created on 2023-01-15 with reprex v2.0.2

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
0

Solved it with a double for loop

for(i in 1:length(newvars)) {
  for(z in 1:length(newvars)) {
    if(colnames(mycars)[i] == oldvars[z]) {
      colnames(mycars)[i] = newvars[z]
    }
  }
}

Highly inefficient but it gets the job done

jamzsabb
  • 1,125
  • 2
  • 18
  • 40
  • The *double* loop certainly isn't needed. `nn = names(mycars); for(i in 1:length(oldvars)){name(mycars)[nn == oldvars[i]] = newvars[i]}` will do. – Gregor Thomas Sep 16 '19 at 16:28
0

In base R we can do :

old_vars = c("cyl" ,"mpg",  "disp")
new_vars = c( "Cycle", "Miles Per Gallon", "Displacement")
names(mtcars) <- replace(setNames(nm = names(mtcars)), old_vars, new_vars)
head(mtcars,1)
#>           Miles Per Gallon Cycle Displacement  hp drat   wt  qsec vs am gear
#> Mazda RX4               21     6          160 110  3.9 2.62 16.46  0  1    4
#>           carb
#> Mazda RX4    4

Created on 2023-05-28 with reprex v2.0.2

Or with pipes :

mtcars %>%
  setNames(replace(setNames(nm = names(.)), old_vars, new_vars))

mtcars |>
  (\(x) setNames(x, replace(setNames(nm = names(x)), old_vars, new_vars)))()
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167