Match column headers and rename to append string with column name

Question

I want to prefix the column headers of dataframe1 with various strings. I have another data frame, dataframe2, which contains the column names and the string to add to each.

How do I rename the matching columns in dataframe1?

dataframe1:
id  C1_A C2_A C3_A C4_A C5_A
11   0     0    0    1    2 
12   0     3    2    1    0
13   2     0    0    2    3
14   0     0    2    1    1

dataframe2:
C      S
C1_A   HP
C2_A   LP
C3_A   KP
C4_A   KP
C5_A   HP

Desired output (dataframe1):
id  HP_C1_A LP_C2_A KP_C3_A KP_C4_A HP_C5_A
11     0       0       0       1       2
12     0       3       2       1       0
13     2       0       0       2       3
14     0       0       2       1       1

acylam · Answer 1 · 2017-11-01T19:24:35.743

Here's another solution using str_replace_all from stringr:

library(dplyr)
library(stringr)

df2 %>%
  mutate(S = paste(S, C, sep = "_")) %>%
  {setNames(.$S, .$C)} %>%
  str_replace_all(names(df1), .) %>%
  setNames(df1, .)

Result:

  id HP_C1_A LP_C2_A KP_C3_A KP_C4_A HP_C5_A
1 11       0       0       0       1       2
2 12       0       3       2       1       0
3 13       2       0       0       2       3
4 14       0       0       2       1       1

Data:

df1 = read.table(text = "  id  C1_A C2_A C3_A C4_A C5_A
                 11   0     0    0    1    2 
                 12   0     3    2    1    0
                 13   2     0    0    2    3
                 14   0     0    2    1    1
                 ", header = TRUE, stringsAsFactors = FALSE)

df2 = read.table(text = "C      S
                 C1_A   HP
                 C2_A   LP
                 C3_A   KP
                 C4_A   KP
                 C5_A   HP", header = TRUE, stringsAsFactors = FALSE)

Edit:

@markdly pointed out that one can write the following one-liner instead to get away from dplyr:

names(df1) <- str_replace_all(names(df1), setNames(paste0(df2$S, "_", df2$C), df2$C))
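
One caveat worth noting: str_replace_all treats the names of the vector as regular expressions and matches them anywhere in a string, so a column such as C1_AB would also pick up the C1_A prefix. The following is a minimal sketch, not part of the original answer, that anchors each pattern with ^ and $ so only whole column names match. It assumes the names in df2$C contain no regex metacharacters, and the C1_AB column is a hypothetical addition for illustration.

```r
library(stringr)

# C1_AB is a hypothetical extra column that merely starts with "C1_A"
df1 <- data.frame(id = 11:12, C1_A = c(0, 0), C1_AB = c(1, 2))
df2 <- data.frame(C = c("C1_A", "C2_A"), S = c("HP", "LP"),
                  stringsAsFactors = FALSE)

# Anchor every pattern so that only complete column names are replaced
lookup <- setNames(paste0(df2$S, "_", df2$C), paste0("^", df2$C, "$"))
new_names <- str_replace_all(names(df1), lookup)
new_names
#> [1] "id"      "HP_C1_A" "C1_AB"
```

Without the anchors, the same call would rename C1_AB to HP_C1_AB.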

Nice use of the named vector with `str_replace_all` (+1). I hadn't thought of that. If you wanted to do away with the dependency on `dplyr` then you could also do `library(stringr); names(df1) <- str_replace_all(names(df1), setNames(paste0(df2$S, "_", df2$C), df2$C))` — markdly, Nov 01 '17 at 19:15
@markdly This is great! Updated into my answer if you don't mind :) — acylam, Nov 01 '17 at 19:24
Not at all. I was going to add it to mine but it fits better with your existing approach I think — markdly, Nov 01 '17 at 19:26

score 1 · Answer 2 · answered Nov 01 '17 at 17:15

library(dplyr)
library(tidyr)

# example data frames
df1 = read.table(text = "
id  C1_A C2_A C3_A C4_A C5_A
11   0     0    0    1    2 
12   0     3    2    1    0
13   2     0    0    2    3
14   0     0    2    1    1", header = TRUE)

df2 = read.table(text = "
C      S
C1_A   HP
C2_A   LP
C3_A   KP
C4_A   KP
C5_A   HP", header = TRUE, stringsAsFactors = FALSE)


df1 %>%
  gather(C, value, -id) %>%      # reshape and make column names a variable C
  left_join(df2, by = "C") %>%   # so we can join and get the corresponding S values in another column
  unite("S_C", S, C) %>%         # combine values of S and C
  spread(S_C, value)             # reshape back to original form

#   id HP_C1_A HP_C5_A KP_C3_A KP_C4_A LP_C2_A
# 1 11       0       2       0       1       0
# 2 12       0       0       2       1       3
# 3 13       2       3       0       2       0
# 4 14       0       1       2       1       0
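
The column order in this output differs from the desired one because spread() sorts the new columns alphabetically. As a sketch under the assumption that tidyr 1.0.0 or later is installed (where pivot_longer() and pivot_wider() supersede gather() and spread()), the same pipeline can be written so that columns should keep their order of first appearance:

```r
library(dplyr)
library(tidyr)

df1 <- read.table(text = "
id  C1_A C2_A C3_A C4_A C5_A
11   0     0    0    1    2
12   0     3    2    1    0
13   2     0    0    2    3
14   0     0    2    1    1", header = TRUE)

df2 <- read.table(text = "
C      S
C1_A   HP
C2_A   LP
C3_A   KP
C4_A   KP
C5_A   HP", header = TRUE, stringsAsFactors = FALSE)

res <- df1 %>%
  pivot_longer(-id, names_to = "C", values_to = "value") %>%  # long form, one row per id/column pair
  left_join(df2, by = "C") %>%                                # look up the prefix S for each column name
  unite("S_C", S, C) %>%                                      # build the new name, e.g. HP_C1_A
  pivot_wider(names_from = S_C, values_from = value)          # back to wide form, first-appearance order

names(res)
```

The result is a tibble rather than a plain data frame; as.data.frame(res) converts it back if that matters.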

Imo, stopping after the left join would give a more useful data structure (albeit not what the OP asked for). — Frank, Nov 01 '17 at 18:59

score 1 · Accepted Answer · answered Nov 01 '17 at 17:52

Another approach, which uses match from base R:

df1 <- dataframe1
df2 <- dataframe2

nm <- names(df1)
names(df1) <- ifelse(nm %in% df2$C, paste0(df2$S[match(nm, df2$C)], "_", nm), nm) 
df1

#>   id HP_C1_A LP_C2_A KP_C3_A KP_C4_A HP_C5_A
#> 1 11       0       0       0       1       2
#> 2 12       0       3       2       1       0
#> 3 13       2       0       0       2       3
#> 4 14       0       0       2       1       1
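
If the renaming is needed in more than one place, the same match-based idea can be wrapped in a small function. This is only a sketch: prefix_names() and its arguments are hypothetical names chosen for illustration, and the column Z and the lookup entry C9_A are made up to show that unmatched names on either side are left alone.

```r
# prefix_names() is a hypothetical helper, not a base R function
prefix_names <- function(df, lookup, key = "C", value = "S") {
  nm  <- names(df)
  pos <- match(nm, lookup[[key]])   # NA where a column has no entry in the lookup
  hit <- !is.na(pos)
  nm[hit] <- paste0(lookup[[value]][pos[hit]], "_", nm[hit])
  names(df) <- nm
  df
}

# Z has no lookup entry; C9_A has no column in df1
df1 <- data.frame(id = 11:12, C1_A = c(0, 0), C2_A = c(0, 3), Z = c(1, 1))
df2 <- data.frame(C = c("C1_A", "C2_A", "C9_A"), S = c("HP", "LP", "KP"),
                  stringsAsFactors = FALSE)

renamed <- prefix_names(df1, df2)
names(renamed)
#> [1] "id"      "HP_C1_A" "LP_C2_A" "Z"
```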

score 0 · Answer 4 · answered Nov 01 '17 at 17:19

Very simple solution using base R.

df1 <- data.frame(id = 11:14, C1_A = c(0,0,2,0),
       C2_A = c(0,3,0,0), C4_A = c(1,1,2,1),
       C5_A = c(2,0,3,1))

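# Lookup table: column name (col) and the prefix to add (comp)
df2 <- data.frame(col = c('C1_A', 'C2_A', 'C3_A', 'C4_A', 'C5_A'),
                  comp = c('HP', 'LP', 'KP', 'KP', 'HP'),
                  stringsAsFactors = FALSE)

idx <- df2$col %in% colnames(df1)    # lookup rows that have a column in df1
idx2 <- colnames(df1) %in% df2$col   # columns of df1 that have a lookup row

# Prefixes and columns are paired by position, so df2 must list the columns in the same order as df1
colnames(df1)[idx2] <- paste(df2$comp[idx], colnames(df1)[idx2], sep = '_')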
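
Because idx and idx2 are two independent logical masks, this approach pairs prefixes with columns purely by position. The sketch below, using a deliberately reversed lookup table (an assumption made for illustration), shows how the positional pairing can mislabel columns and how looking up each column by name with match() avoids that.

```r
df1 <- data.frame(id = 11:12, C1_A = c(0, 0), C2_A = c(0, 3))

# Lookup rows deliberately listed in reverse order
df2 <- data.frame(col = c("C2_A", "C1_A"), comp = c("LP", "HP"),
                  stringsAsFactors = FALSE)

idx  <- df2$col %in% colnames(df1)
idx2 <- colnames(df1) %in% df2$col
cols <- colnames(df1)[idx2]

# Positional pairing: prefixes come out in df2's row order, so they end up swapped
positional <- paste(df2$comp[idx], cols, sep = "_")

# Pairing by name: each column looks up its own prefix
by_name <- paste(df2$comp[match(cols, df2$col)], cols, sep = "_")

positional
#> [1] "LP_C1_A" "HP_C2_A"
by_name
#> [1] "HP_C1_A" "LP_C2_A"
```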
