Let's say I have a data frame with around 150 variables that I need to rename based on a separate CSV file (an "index"). The order of the variables differs between the data frame and the index and, to make things worse, some variables may be missing from the index and vice versa.

Is there an elegant way or a package that can help me do so?

Here is an example of the data I'm dealing with.

library(dplyr)
library(tibble)

##My original data frame

orig <- tribble(
~item,     ~protein,    ~carbohydrates,    ~total_fat,       ~energy,    ~zinc,
1,        5.4,        10.6,             7.3,             90,        3.4,  
2,        10.3,       11.6,             3.3,             10,        2.1, 
3,        8.4,        10.6,             2.3,             52,        0.2,
4,        2.7,        8.6,              20.3,            356,       1.3)

##New names index

csv_nm <- tribble(
  ~new_name,   ~old_name,
  "nut203",    "protein",
  "nut204",    "total_fat",
  "nut205",    "carbohydrates",
  "nut208",    "energy",
  "nut303",    "iron") 

I tried using vectors as kindly suggested by Peter here:

## create a named vector to use with dplyr::rename

nm_vec <- csv_nm$old_name
names(nm_vec) <- csv_nm$new_name

## rename, subsetting the named vector to exclude names which are not present in the dataframe 

tib_new_names <- 
  orig %>% 
  rename(nm_vec[nm_vec %in% names(orig)]) 

But got "All arguments must be named" error.

Rana

2 Answers

See if this helps.

I've adapted your data.


library(dplyr)
library(tibble)

orig <- tribble(
~item,     ~protein,    ~carbohydrates,    ~total_fat,       ~energy,    ~zinc,
1,        5.4,        10.6,             7.3,             90,        3.4,  
2,        10.3,       11.6,             3.3,             10,        2.1, 
3,        8.4,        10.6,             2.3,             52,        0.2,
4,        2.7,        8.6,              20.3,            356,       1.3)


csv_nm <- tribble(
  ~new_name,   ~old_name,
  "nut203",    "protein",
  "nut204",    "total_fat",
  "nut205",    "carbohydrates",
  "nut208",    "food_energy",
  "nut303",    "iron")

# create a named vector to use with dplyr::rename

nm_vec <- csv_nm$old_name
names(nm_vec) <- csv_nm$new_name

# rename, subsetting the named vector to exclude names which are not present in the dataframe 

tib_new_names <- 
  orig %>% 
  rename(nm_vec[nm_vec %in% names(orig)])


Which gives you:

[image: the resulting tibble with the renamed columns]
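If the rename() call above stops with "All arguments must be named", one possible workaround is to splice the named vector into separate new = "old" arguments with !!!. The sketch below is an assumption about what fits this case, not a confirmed diagnosis of the error. It uses a shortened copy of the example data, and the object names present, renamed_splice and renamed_any_of are made up for illustration. The any_of() variant assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tibble)

# Shortened copy of the example data from the question.
orig <- tribble(
  ~item, ~protein, ~carbohydrates, ~total_fat, ~energy, ~zinc,
  1,     5.4,      10.6,           7.3,        90,      3.4,
  2,     10.3,     11.6,           3.3,        10,      2.1
)

csv_nm <- tribble(
  ~new_name, ~old_name,
  "nut203",  "protein",
  "nut204",  "total_fat",
  "nut205",  "carbohydrates",
  "nut208",  "food_energy",
  "nut303",  "iron"
)

# Named vector: names are the new names, values are the old names.
nm_vec <- setNames(csv_nm$old_name, csv_nm$new_name)

# Keep only the mappings whose old name exists in the data frame.
present <- nm_vec[nm_vec %in% names(orig)]

# Variant 1: splice the vector so rename() receives named arguments.
renamed_splice <- orig %>% rename(!!!present)

# Variant 2 (assumes dplyr >= 1.0.0): any_of() ignores old names
# that are not columns, so no manual subsetting is needed.
renamed_any_of <- orig %>% rename(any_of(nm_vec))

names(renamed_splice)
```

In both variants, columns with no entry in the index (item, energy, zinc here) keep their original names, and index rows with no matching column (food_energy, iron) are skipped.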

Peter
  • Thank you, but I'm getting an error: Error: All arguments must be named – Rana Apr 18 '20 at 17:45
  • Can you add your code in the question? Have you included the subsetting argument in the rename function? – Peter Apr 18 '20 at 18:08
  • I tried to re-edit the code snippet and hopefully my problem is a bit clearer now. Regarding the argument error, I get it when I run the exact same code you gave here (copy-pasted without changes). – Rana Apr 18 '20 at 18:18
  • Your original variable 'energy' becomes 'nut208' but this is not mapped correctly as your old name in the mapping data table is 'food_energy' ? – Peter Apr 20 '20 at 13:21
  • Rana, could you please paste your code into the question, including a 'proper' data frame? Then I can look at it. I suspect that your named vector is not including some names. You may find it worthwhile to read up on how to create a minimal reproducible example, which helps in setting out a question so that it is easier for others to answer. – Peter Apr 22 '20 at 14:20

Since I still encounter similar problems more than a year after my original question, here is another solution that worked for me, based on the code suggested by @JoelKuiper here: https://stackoverflow.com/a/36010381/13076064

existing_old_names <- match(csv_nm$old_name,names(orig))

names(orig)[na.omit(existing_old_names)] <- csv_nm$new_name[which(!is.na(existing_old_names))]
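
For reuse, the two lines above could be wrapped in a small function. This is only a sketch: rename_from_index is a hypothetical helper name, and it assumes the index has character columns called old_name and new_name, as in the question. It runs on a shortened copy of the example data, and the two setdiff() calls list the names that exist on only one side.

```r
# Hypothetical helper: rename the columns of `df` using a lookup table
# `index` with character columns `old_name` and `new_name`.
rename_from_index <- function(df, index) {
  pos <- match(index$old_name, names(df))  # NA where the old name is absent
  found <- !is.na(pos)
  names(df)[pos[found]] <- index$new_name[found]
  df
}

# Shortened copy of the example data from the question.
orig <- data.frame(
  item = c(1, 2),
  protein = c(5.4, 10.3),
  carbohydrates = c(10.6, 11.6),
  total_fat = c(7.3, 3.3),
  energy = c(90, 10),
  zinc = c(3.4, 2.1)
)

csv_nm <- data.frame(
  new_name = c("nut203", "nut204", "nut205", "nut208", "nut303"),
  old_name = c("protein", "total_fat", "carbohydrates", "energy", "iron"),
  stringsAsFactors = FALSE
)

renamed <- rename_from_index(orig, csv_nm)
names(renamed)
# "item" "nut203" "nut205" "nut204" "nut208" "zinc"

# Old names listed in the index but absent from the data frame.
missing_in_data <- setdiff(csv_nm$old_name, names(orig))   # "iron"

# Columns of the data frame that have no entry in the index.
missing_in_index <- setdiff(names(orig), csv_nm$old_name)  # "item" "zinc"
```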
Rana