I have obtained a data set with slightly inconsistent and messy variable names. I would like to rename them in an efficient and automated way.
I have a set of data frames and I need to rename some columns in several of them. The order of the columns and the length of the data frames differ, so I would like to use a function such as grep() or a subset expression (df$x[== "term"]).
I have found an older question regarding this problem (Rename columns in multiple dataframes, R), but I haven't been able to get any of the suggested solutions to work, since I get an error message. I do not have enough reputation to comment and ask further questions on those replies. My problem also seems to be a bit different, as I get an error message from my for loop that is not mentioned in the earlier question:
Error in `colnames<-`(`*tmp*`, value = character(0)) :
attempt to set 'colnames' on an object with less than two dimensions
Setup: multiple data frames, let's call them myDF1, myDF2, ...
In those data frames there are columns with names (bad_name1, bad_name2) that should be changed to something else (good_name1, good_name2).
Replicable dataset:
myDF1 <- data.frame(bad_name1="A", bad_name2="B")
myDF2 <- data.frame(bad_name1="C", bad_name2="D")
for (x in c(myDF1,myDF2)) {
  colnames(x) <- gsub(x = colnames(x), pattern = "bad_name1", replacement = "good_name1")
}
There are several ways of doing this. One that appealed to me is the subset method:
colnames(myDF1)[names(myDF1) == "bad_name1"] <- "good_name1"
This works fine as a single line, but not as a for loop.
for (x in c(myDF1,myDF2)) {
  colnames(x)[colnames(x) == "bad_name1"] <- "good_name1"
}
This produces the following error message:
Error in `colnames<-`(`*tmp*`, value = character(0)) :
attempt to set 'colnames' on an object with less than two dimensions
The same error message applies with a 'gsub'-based method:
for (x in c(myDF1,myDF2)) {
  colnames(x) <- gsub(x = colnames(x), pattern = "bad_name1", replacement = "good_name1")
}
I realise that I am missing something fundamental here. I suppose that the for loop is not receiving the result of colnames(x) in an appropriate format, but I cannot work out how to make it work. The methods suggested in "Rename columns in multiple dataframes, R" do not really cover this error message.
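A small check that may help narrow this down: it inspects what the loop actually iterates over when the data frames are combined with c(). The names combined and first are only illustrative. This is a diagnostic sketch, not a fix.

```r
myDF1 <- data.frame(bad_name1 = "A", bad_name2 = "B")
myDF2 <- data.frame(bad_name1 = "C", bad_name2 = "D")

# c() on data frames concatenates their columns into one plain list
combined <- c(myDF1, myDF2)
class(combined)    # "list"
length(combined)   # 4 (two columns from each data frame)

# Each loop element is therefore a single column, which has no dimensions
first <- combined[[1]]
is.null(dim(first))   # TRUE
colnames(first)       # NULL
```

If this is right, x inside the loop is a single column rather than a data frame, which would match the "less than two dimensions" part of the error.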
Additional clarification, as asked for by vaettchen in a comment:
There are three column names that have to be changed (in all data frames). The reason is that they have names like varX.1, varX.2, varX.3, while I would prefer varXcount, varXmean, varXmax. So I have realised that there are names that I am not happy with, and decided on new ones based on my own taste.
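To make the intended mapping concrete, here is a rough sketch of the kind of result I am after. It assumes the data frames can be collected with list() and processed with lapply(). The helper rename_cols, the lookup rename_map, and the toy data frames dfA and dfB are hypothetical names made up for illustration.

```r
# Hypothetical lookup: old name -> new name (names from the example above)
rename_map <- c(varX.1 = "varXcount", varX.2 = "varXmean", varX.3 = "varXmax")

# Hypothetical helper: rename only the columns that appear in the lookup
rename_cols <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

# Toy data frames with different column orders and lengths
dfA <- data.frame(id = 1:2, varX.1 = c(3, 4), varX.2 = c(1.5, 2.5), varX.3 = c(5, 6))
dfB <- data.frame(varX.3 = 9, varX.1 = 2, other = "z")

# list(), unlike c(), keeps each data frame intact as one element
dfs <- list(dfA = dfA, dfB = dfB)
renamed <- lapply(dfs, rename_cols, map = rename_map)

names(renamed$dfA)   # "id" "varXcount" "varXmean" "varXmax"
names(renamed$dfB)   # "varXmax" "varXcount" "other"
```

Columns that are not in the lookup are left untouched, so the differing column order and number of columns across data frames should not matter.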