I have obtained a data set with slightly inconsistent and messy variable names. I would like to rename them in an efficient and automated way.

I have a set of data frames and I need to rename some columns in several of them. The order of the columns and the length of the data frames differ, so I would like to match columns by name, using a function such as grep() or a subsetting expression (names(df)[names(df) == "term"]).

I have found an older question regarding this problem (Rename columns in multiple dataframes, R), but I haven't been able to get any of the suggested solutions to work, since I get an error message. I do not have enough reputation to comment and ask further questions on those replies. However, my problem seems to be a bit different, as I get an error message from my for loop that is not mentioned in the earlier question:

Error in `colnames<-`(`*tmp*`, value = character(0)) : 
  attempt to set 'colnames' on an object with less than two dimensions

Setup: multiple data frames, let's call them myDF1, myDF2 ... In those data frames there are columns with names (bad_name1, bad_name2) that should be changed to something else (good_name1, good_name2).

Replicable dataset:

myDF1 <- data.frame(bad_name1="A", bad_name2="B")
myDF2 <- data.frame(bad_name1="C", bad_name2="D")

for (x in c(myDF1,myDF2)) {
colnames(x) <- gsub(x = colnames(x), pattern = "bad_name1", replacement = "good_name1")
}

There are several ways of doing this. One that appealed to me is the subset method:

colnames(myDF1)[names(myDF1) == "bad_name1"] <- "good_name1"

This works fine as a single line, but not as a for loop.

for (x in c(myDF1,myDF2)) {
colnames(x)[colnames(x) == "bad_name1"] <- "good_name1"
}

This produces the error message:

Error in `colnames<-`(`*tmp*`, value = character(0)) : 
  attempt to set 'colnames' on an object with less than two dimensions

The same error message applies with a 'gsub'-based method:

for (x in c(myDF1,myDF2)) {
colnames(x) <- gsub(x = colnames(x), pattern = "bad_name1", replacement = "good_name1")
}

I realise that I am missing something fundamental here. I suppose that the for loop is not receiving the result of 'colnames(x)' in an appropriate format, but I cannot understand how I'm supposed to make it work. The methods suggested in Rename columns in multiple dataframes, R do not really cover this error message.

Additional clarification, as asked for by vaettchen in a comment:
There are three column names that have to be changed (in all data frames). The reason is that they have names like varX.1, varX.2, varX.3, while I would prefer varXcount, varXmean, varXmax. So I have found names that I am not happy with and decided on new ones based on my own taste.

Smerla
  • There are too many details missing in your question. Where are the names of the data.frames you want to change? A vector, a list, are they in CSV files? The bad names and the good names, how are they known, where do they come from? Only if this is clear, one can think about a generalised approach. – vaettchen Jan 29 '18 at 17:36
  • I provided some additional information at the end of the post. However, dcarlson provided a solution that I'm satisfied with. – Smerla Jan 30 '18 at 08:02

1 Answer

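You just need a few minor changes. Look at c(myDF1, myDF2) to see why that is not working: c() splits the data frames into a single list of 4 column vectors (factors under the stringsAsFactors = TRUE default of R before 4.0, character vectors otherwise). The loop therefore visits plain vectors, which have no dimensions, and that is what triggers the 'colnames' error. Combine the data frames into a list and process the list by index: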

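all <- list(myDF1=myDF1, myDF2=myDF2)
for (x in seq_along(all)) {
    # assign through all[[x]] so the new names are stored in the list
    colnames(all[[x]]) <- gsub(x = colnames(all[[x]]), pattern = "bad_name1",
        replacement = "good_name1")
}
# copy the renamed data frames back over myDF1 and myDF2 in the global environment
list2env(all, envir=.GlobalEnv)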
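
To make the diagnosis concrete, and to cover the case from the question where several names change at once, here is a minimal sketch. The names flat, lookup, rename_cols and dfs are hypothetical and chosen only for illustration; the lookup vector assumes you know every old/new pair in advance. It matches names exactly rather than by regular expression, which is a swapped-in alternative to the gsub() call above, not a correction of it.

```r
# Toy data from the question
myDF1 <- data.frame(bad_name1 = "A", bad_name2 = "B")
myDF2 <- data.frame(bad_name1 = "C", bad_name2 = "D")

# c() flattens the data frames into one plain list of column vectors
flat <- c(myDF1, myDF2)
class(flat)     # "list", not a list of data frames
length(flat)    # 4: two columns from each data frame
dim(flat[[1]])  # NULL, so `colnames<-` has nothing to work on

# Hypothetical lookup table: names are the old column names,
# values are the replacements
lookup <- c(bad_name1 = "good_name1", bad_name2 = "good_name2")

# Rename only the columns that appear in the lookup; leave the rest alone
rename_cols <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)
  names(df)[hit] <- unname(lookup[names(df)[hit]])
  df
}

# list() keeps each data frame intact, unlike c()
dfs <- list(myDF1 = myDF1, myDF2 = myDF2)
dfs <- lapply(dfs, rename_cols, lookup = lookup)

names(dfs$myDF1)
```

The same lookup could hold pairs such as varX.1 = "varXcount". Exact matching with %in% also sidesteps the fact that the dot in a pattern like "varX.1" is a wildcard when passed to gsub(). As in the loop version, list2env(dfs, envir = .GlobalEnv) would write the renamed data frames back under their original names.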
dcarlson
  • Thank you for the useful reply, I was able to solve my problem. I feel a bit ashamed that I did not check the output of `c(myDF1, myDF2)`. – Smerla Jan 30 '18 at 10:17