I would like to rename rows by removing the common part of each row name:

          a b  c
CDA_Part  1 4  4
CDZ_Part  3 4  4
CDX_Part  1 4  4

Desired result:

     a b  c
CDA  1 4  4
CDZ  3 4  4
CDX  1 4  4
aynber
giegie

2 Answers


1. Create a minimal reproducible example:

df <- data.frame(a = 1:3, b = 4:6)
rownames(df) <- c("CDA_Part", "CDZ_Part", "CDX_Part")

df

Returns:

         a b
CDA_Part 1 4
CDZ_Part 2 5
CDX_Part 3 6

2. Suggested solution using base R's gsub:

rownames(df) <- gsub("_Part", "", rownames(df), fixed=TRUE)

df

Returns:

    a b
CDA 1 4
CDZ 2 5
CDX 3 6

Explanation:

gsub identifies and replaces parts of strings; by default the pattern is interpreted as a regular expression. Its first three arguments are:

  • pattern: the pattern to be replaced, i.e. "_Part"
  • replacement: the string to use as the replacement, i.e. the empty string ""
  • x: the character vector in which to replace something, i.e. the row names

An additional argument (not among the first three):

  • fixed: whether pattern is treated as a literal string instead of a regular expression; here fixed=TRUE, so "_Part" is matched literally
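
As a small illustration of what fixed changes, the sketch below compares a literal match with a regex match, and also shows an anchored variant using sub that only strips "_Part" at the end of a name. The vector x and the anchored pattern are illustrative assumptions, not part of the question.

```r
# Illustrative row names (hypothetical; the last one contains "_Part" twice)
x <- c("CDA_Part", "CDZ_Part", "X_Part_Part")

# fixed = TRUE: "_Part" is matched literally, and gsub removes every occurrence
literal <- gsub("_Part", "", x, fixed = TRUE)

# Regex alternative: "$" anchors the match to the end of the string,
# so only a trailing "_Part" is removed
anchored <- sub("_Part$", "", x)

# Why fixed matters: as a regex, "." matches any character
dot_regex   <- gsub(".", "", "a.b")                # every character is removed
dot_literal <- gsub(".", "", "a.b", fixed = TRUE)  # only the dot is removed

print(literal)
print(anchored)
```

If the common part could also occur in the middle of a row name, the anchored form may be the safer choice, since gsub with a plain "_Part" removes every occurrence.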
dario

You can try this approach: use Reduce with intersect to determine the parts that all row names have in common. Note that I am assuming your row names are structured as in the example below, with an underscore separating two words. This solution works for both word_commonpart and commonpart_word.

Logic: strsplit splits each row name at the underscore. A zero-width lookahead assertion is used so that the underscore is not consumed and is kept as a piece of its own. Reduce with intersect then finds the pieces shared by all row names. These pieces are pasted together into a regex of alternatives separated by pipes, and gsub replaces every match with nothing.

Input:

df <- structure(list(a = 1:4, b = 4:7), class = "data.frame", row.names = c("CDA_Part", 
"CDZ_Part", "CDX_Part", "Part_ABC"))

Solution:

## 1. Determine the parts common to all row names
red <- Reduce('intersect', strsplit(rownames(df), "(?=_)", perl = TRUE))
## 2. Get all combinations of the underscore and the remaining common parts
e <- expand.grid(red, red)
## 3. Keep only the combinations of two different parts and paste each pair together using do.call
## 4. Use paste0 with collapse to get a regex separated by pipes
## 5. Replace the common parts with nothing
rownames(df) <- gsub(paste0(do.call('paste0', e[e$Var1 != e$Var2, ]), collapse = "|"), '', rownames(df))

Output:

> df
#        a b
#    CDA 1 4
#    CDZ 2 5
#    CDX 3 6
#    ABC 4 7
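
To make the steps easier to follow, the intermediate values can be inspected one at a time. This is only a sketch that rebuilds the same input; the names parts, common, pairs and pattern are illustrative and do not appear in the solution above, and stringsAsFactors = FALSE is an added assumption to keep the pieces as plain strings.

```r
# Same input as above, built with data.frame for readability
df <- data.frame(a = 1:4, b = 4:7,
                 row.names = c("CDA_Part", "CDZ_Part", "CDX_Part", "Part_ABC"))

# Split each row name at the underscore; the lookahead keeps "_" as its own piece
parts <- strsplit(rownames(df), "(?=_)", perl = TRUE)

# Pieces that occur in every row name
common <- Reduce(intersect, parts)

# All ordered pairs of two different common pieces
pairs <- expand.grid(first = common, second = common, stringsAsFactors = FALSE)
pairs <- pairs[pairs$first != pairs$second, ]

# Paste each pair together and join the results into one regex with "|"
pattern <- paste0(pairs$first, pairs$second, collapse = "|")

# Remove whichever alternative occurs in each row name
rownames(df) <- gsub(pattern, "", rownames(df))
print(pattern)
print(rownames(df))
```

Note that the common pieces are used as a regex here, so pieces containing regex metacharacters (such as "." or "+") would need escaping first.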
PKumar