
I have a group of datasets from a survey applied in many different countries, which I want to combine into a single merged data.frame. Unfortunately, in one of them the variable names differ from the others, although they follow a pattern: where the others have names like "VAR1", "VAR2", etc., this one has "VAR_a", "VAR_b", etc.

The code I've used so far to solve this problem is something like:

names(df) <- gsub("_a", "01", names(df))
names(df) <- gsub("_b", "02", names(df))
names(df) <- gsub("_c", "03", names(df))
names(df) <- gsub("_d", "04", names(df))
names(df) <- gsub("_e", "05", names(df))
names(df) <- gsub("_f", "06", names(df))
names(df) <- gsub("_g", "07", names(df))

and so on up to the 14th letter/number (no variable goes further than that), so that the names match those in the other data.frames.

I know there should be a way to do this in a few lines of code, or maybe even a single one, but I can't find a way to iterate, or any argument to gsub itself, that does this. Can anyone help me?

I was thinking maybe about something like:

names (df) <- gsub ("_[a-z]", "[1-9]", names(df))

But this didn't work, of course. I need R to understand I want each letter to become the corresponding number ("_a" becomes 1, etc.)

Appreciate any help.

  • I think this is one of those occasions when a loop makes sense. Pretty much just this - https://stackoverflow.com/a/26171700/496803 – thelatemail Jul 21 '17 at 00:50
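A minimal sketch of the loop approach suggested in the comment above. It assumes every suffix is a single lowercase letter from a to n at the very end of the name, and that zero-padded two-digit numbers are wanted, as in the gsub calls in the question. The data frame and its column names are made up for illustration.

```r
# Hypothetical example data; the column names mimic those described in the thread
df <- data.frame(matrix(0, nrow = 2, ncol = 5))
names(df) <- c("ROES1_a", "ROES1_b", "PRO1_a", "PRO1_n", "ID")

# Replace a trailing "_a" with "01", "_b" with "02", ..., "_n" with "14".
# Every name is checked against every letter, so column order does not matter.
for (i in 1:14) {
  names(df) <- sub(paste0("_", letters[i], "$"), sprintf("%02d", i), names(df))
}

names(df)
```

Names without a matching suffix (here "ID") are left untouched. Replacing sprintf("%02d", i) with as.character(i) would give unpadded numbers instead.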

1 Answer


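If you just want a version of gsub that vectorises over pattern and replacement, stringr has one called str_replace. The code below also uses letters, a built-in constant available in any version of R. Note that str_replace pairs the names, patterns and replacements element by element, so this relies on the i-th column name ending in the i-th letter, as it does in the example below.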

library(stringr)
df <- data.frame(matrix(0, nrow = 5, ncol = 10))
colnames(df) <- paste0("abcd2345p_", letters[1:10])
colnames(df)

> [1] "abcd2345p_a" "abcd2345p_b" "abcd2345p_c" "abcd2345p_d" "abcd2345p_e"
[6] "abcd2345p_f" "abcd2345p_g" "abcd2345p_h" "abcd2345p_i" "abcd2345p_j"

str_replace(colnames(df), paste0("_", letters[1:ncol(df)], "$"), as.character(1:ncol(df)))

>  [1] "abcd2345p1"  "abcd2345p2"  "abcd2345p3"  "abcd2345p4"  "abcd2345p5" 
[6] "abcd2345p6"  "abcd2345p7"  "abcd2345p8"  "abcd2345p9"  "abcd2345p10"
raymkchow
  • It worked very well for some of the variables, but not for the ones with a number before the "_" (like "PRO1_a"). A dirty solution would be to swap that number for a letter before transforming everything with the code you provided, but there may be a cleaner way to do that. – Guilherme Pires Arbache Jul 21 '17 at 20:58
  • @GuilhermePiresArbache Oh, this is not hard. What you need to do is modify the regex (add $ at the end) so that it matches only at the end of the string. See the updated answer. – raymkchow Jul 22 '17 at 00:22
  • Sorry, I was wrong: it hasn't worked for any of the variables (the ones I thought had changed were already numbered). And it still doesn't work. If it helps, the actual variable names are like "ROES1_a", "ROES1_b", etc., then "PRO1_a", "PRO1_b", etc., and others like that. I appreciate the effort. – Guilherme Pires Arbache Jul 23 '17 at 23:14
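One possible reason the str_replace call has no effect on names like these is that it pairs the i-th name with the i-th letter, and here the letters restart for each variable group. A sketch of a base R alternative that looks up each name's own trailing letter with match() is given below. The vector nms is hypothetical, modelled on the names quoted in the comments, and the zero-padding is an assumption taken from the gsub calls in the question.

```r
# Hypothetical names, modelled on the ones mentioned in the comments
nms <- c("ROES1_a", "ROES1_b", "PRO1_a", "PRO1_b", "PRO1_n", "ID")

# Which names end in "_" followed by one of the letters a to n?
has_suffix <- grepl("_[a-n]$", nms)

# Split those names into the stem and the trailing letter
stem   <- sub("_[a-n]$", "", nms[has_suffix])
letter <- sub("^.*_([a-n])$", "\\1", nms[has_suffix])

# match() gives each letter's position in the alphabet (a = 1, ..., n = 14)
nms[has_suffix] <- paste0(stem, sprintf("%02d", match(letter, letters)))

nms
```

For a data frame, the same steps can be applied to names(df) and the result assigned back with names(df) <- nms.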