I have the following data frame:

Gene <- c(1,2,3,4,5,6)
A1.1 <- c(1,1,2,4,3,5)
B1.1 <- c(2,1,4,2,4,5)
C1.1 <- c(2,4,3,2,1,5)
A1.2 <- c(1,1,2,3,4,5)
B1.2 <- c(2,2,3,4,5,1)
C1.2 <- c(3,3,2,1,4,5)

df <- data.frame(Gene, A1.1, B1.1, C1.1, A1.2, B1.2, C1.2)
df

  Gene A1.1 B1.1 C1.1 A1.2 B1.2 C1.2
1    1    1    2    2    1    2    3
2    2    1    1    4    1    2    3
3    3    2    4    3    2    3    2
4    4    4    2    2    3    4    1
5    5    3    4    1    4    5    4
6    6    5    5    5    5    1    5

How can I remove the ".1" and ".2" (i.e., the 3rd and 4th characters) from each column header (e.g., A1.1 -> A1 or A1.2 -> A1)? Could I use gsub()?

Dswede43
  • Basically a duplicate of: [Remove suffix from variable names in data frame](https://stackoverflow.com/questions/61205634/r-remove-suffix-from-variable-names-in-data-frame) or [Removing suffixes from variable names inside a list in R](https://stackoverflow.com/questions/58423294/removing-suffixes-from-variable-names-inside-a-list-in-r) – Ian Campbell Jun 14 '21 at 17:32
  • How will you deal with having multiple columns with the same name? – camille Jun 14 '21 at 18:20
  • Also see https://stackoverflow.com/q/37800704/5325862, https://stackoverflow.com/q/34615460/5325862, https://stackoverflow.com/q/45960269/5325862 – camille Jun 14 '21 at 18:41

1 Answer

We can use sub() to match a literal . (a regex metacharacter, so it must be escaped as \\.) followed by one or more digits (\\d+) at the end of the string ($), and replace that match with an empty string ("").

names(df) <- sub("\\.\\d+$", "", names(df))
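A minimal end-to-end sketch, assuming the data from the question: it rebuilds df, applies the same sub() call, and checks that gsub() would give identical names here (the pattern is anchored with $, so it can match at most once per name). The variable names new_names and positional are illustrative only.

```r
# Rebuild the example data frame from the question
Gene <- c(1, 2, 3, 4, 5, 6)
A1.1 <- c(1, 1, 2, 4, 3, 5)
B1.1 <- c(2, 1, 4, 2, 4, 5)
C1.1 <- c(2, 4, 3, 2, 1, 5)
A1.2 <- c(1, 1, 2, 3, 4, 5)
B1.2 <- c(2, 2, 3, 4, 5, 1)
C1.2 <- c(3, 3, 2, 1, 4, 5)
df <- data.frame(Gene, A1.1, B1.1, C1.1, A1.2, B1.2, C1.2)

# "\\." is a literal dot, "\\d+" is one or more digits, "$" is end of string
new_names <- sub("\\.\\d+$", "", names(df))

# gsub() gives the same result: the anchored pattern matches at most once
stopifnot(identical(new_names, gsub("\\.\\d+$", "", names(df))))

# For comparison: keeping only the first two characters (the
# "remove the 3rd and 4th character" idea) would also cut "Gene" to "Ge"
positional <- substr(names(df), 1, 2)

names(df) <- new_names
# names are now: Gene, A1, B1, C1, A1, B1, C1
```

The regex only touches names that actually end in a dot plus digits, which is why Gene is left alone while a purely positional approach is not.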

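NOTE: The result has duplicate column names (A1, B1 and C1 each appear twice). Assigning them through names<- works, but duplicate names in a data.frame are not recommended: name-based access such as df$A1 returns only the first matching column, and functions such as data.frame() will rename the duplicates.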
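A short sketch of what the duplicate names mean in practice, again using the question's data. The names dup_count, first_a1, second_a1 and rewrapped are illustrative only, not part of the answer above.

```r
# Same data as in the question, renamed with the sub() call from the answer
df <- data.frame(
  Gene = c(1, 2, 3, 4, 5, 6),
  A1.1 = c(1, 1, 2, 4, 3, 5),
  B1.1 = c(2, 1, 4, 2, 4, 5),
  C1.1 = c(2, 4, 3, 2, 1, 5),
  A1.2 = c(1, 1, 2, 3, 4, 5),
  B1.2 = c(2, 2, 3, 4, 5, 1),
  C1.2 = c(3, 3, 2, 1, 4, 5)
)
names(df) <- sub("\\.\\d+$", "", names(df))

# names<- does not prevent duplicates: the second A1, B1 and C1 are repeats
dup_count <- sum(duplicated(names(df)))

# Name-based lookup silently returns only the FIRST matching column
# (the old A1.1); the old A1.2 is only reachable by position
first_a1 <- df$A1
second_a1 <- df[[5]]

# Passing the result back through data.frame() (check.names = TRUE by
# default) makes the names unique again, undoing part of the rename
rewrapped <- names(data.frame(df))
```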

akrun