Rename rownames

Question

I would like to rename the rows of a data frame by removing the part that is common to all row names.

          a b  c
CDA_Part  1 4  4
CDZ_Part  3 4  4
CDX_Part  1 4  4

Desired result:

     a b  c
CDA  1 4  4
CDZ  3 4  4
CDX  1 4  4

Are you also asking *how* to identify the common part by R code? — dario, Feb 17 '20 at 15:43

dario · Accepted Answer · 2020-02-17T15:46:21.063


1. Create a minimal reproducible example:

df <- data.frame(a = 1:3, b = 4:6)
rownames(df) <- c("CDA_Part", "CDZ_Part", "CDX_Part")

df

Returns:

         a b
CDA_Part 1 4
CDZ_Part 2 5
CDX_Part 3 6

2. Suggested solution using base R's gsub:

rownames(df) <- gsub("_Part", "", rownames(df), fixed=TRUE)

df

Returns:

    a b
CDA 1 4
CDZ 2 5
CDX 3 6

Explanation:

gsub identifies and replaces parts of strings; by default it treats the pattern as a regular expression. The first three arguments are:

pattern: the pattern to be replaced - i.e. "_Part"
replacement: the string to be used as replacement - i.e. the empty string ""
x: the strings we want to replace something in - i.e. the rownames

An additional argument (not among the first three):

fixed: indicates whether pattern is a regular expression (FALSE, the default) or "just" an ordinary string (TRUE) - i.e. here just a string
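
To illustrate what fixed changes, here is a small sketch with a pattern that contains a regex metacharacter. The input vector x is made up for this illustration and is not part of the question's data:

```r
# Hypothetical names; "." is a regex metacharacter that matches any single character
x <- c("CDA.Part", "CDZ.Part", "CDXPart")

# Default (fixed = FALSE): ".Part" is a regex, so it also matches "XPart"
regex_result <- gsub(".Part", "", x)

# fixed = TRUE: ".Part" is matched literally, so "CDXPart" is left untouched
fixed_result <- gsub(".Part", "", x, fixed = TRUE)

print(regex_result)
print(fixed_result)
```

With "_Part" both modes give the same result, because the pattern contains no metacharacters; fixed = TRUE simply makes the literal intent explicit.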

edited Feb 17 '20 at 15:46

answered Feb 17 '20 at 15:26

dario


Use `fixed = TRUE` to have better speed and exact string matching – Clemsang Feb 17 '20 at 15:28
Thanks for the comment, I was in the process of editing my answer. – dario Feb 17 '20 at 15:36

I don't know if it's really needed here but your answer is missing finding the common part of the rowname instead of typing it – Clemsang Feb 17 '20 at 15:38

PKumar · Answer 2 · 2020-02-18T03:17:22.310

You can try this approach: use Reduce with intersect to determine the common parts of the names. Note that I am assuming your dataset has a structure like the one below, where an underscore separates two words. This solution works with both word_commonpart and commonpart_word, as in the example below.

Logic: Using strsplit, split each row name at the underscore. The split uses a zero-width lookahead assertion so that the underscore itself is not consumed. Then use Reduce with intersect to find the pieces shared by all row names. The shared pieces are pasted together into a regex of pipe-separated alternatives, and gsub replaces every match with nothing.
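
The first two steps of this logic can be inspected on their own. This is a minimal sketch on a plain character vector; the variable names rn, pieces and common are chosen here for illustration:

```r
# Same row names as in the example input, held in a plain vector
rn <- c("CDA_Part", "CDZ_Part", "CDX_Part", "Part_ABC")

# Split in front of each underscore (zero-width lookahead, Perl-style regex);
# in R's strsplit the underscore then comes out as a piece of its own
pieces <- strsplit(rn, "(?=_)", perl = TRUE)

# Keep only the pieces that occur in every row name
common <- Reduce(intersect, pieces)

print(pieces[[1]])
print(common)
```

Because the underscore survives as a piece, it shows up among the common parts and can be combined with the shared word when the regex is built.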

Input:

df <- structure(list(a = 1:4, b = 4:7), class = "data.frame", row.names = c("CDA_Part", 
"CDZ_Part", "CDX_Part", "Part_ABC"))

Solution:

## 1. Determine the common parts: split in front of each underscore, then intersect across all row names
red <- Reduce(intersect, strsplit(rownames(df), "(?=_)", perl = TRUE))
## 2. Get all combinations of the underscore and the remaining common parts
e <- expand.grid(red, red)
## 3. Keep only the combinations whose two parts differ and paste each pair together using do.call
## 4. Use paste0 with collapse to build a regex separated by pipes
## 5. Replace the common parts with nothing
rownames(df) <- gsub(paste0(do.call(paste0, e[e$Var1 != e$Var2, ]), collapse = "|"), "", rownames(df))

Output:

> df
#        a b
#    CDA 1 4
#    CDZ 2 5
#    CDX 3 6
#    ABC 4 7
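
A simpler variant is sketched below. It is not the code from this answer: it rests on the same assumption (underscore-separated parts), but splits on the literal underscore and drops the shared parts with setdiff instead of building a regex. The helper name drop_common is hypothetical:

```r
# Rebuild the example input
df <- data.frame(a = 1:4, b = 4:7,
                 row.names = c("CDA_Part", "CDZ_Part", "CDX_Part", "Part_ABC"))

# Split each row name on the literal underscore
parts <- strsplit(rownames(df), "_", fixed = TRUE)

# Parts that occur in every row name
common <- Reduce(intersect, parts)

# Hypothetical helper: drop the common parts and re-join whatever remains
drop_common <- function(p) paste(setdiff(p, common), collapse = "_")

rownames(df) <- vapply(parts, drop_common, character(1))
print(df)
```

Note that this sketch would fail if removing the common part left two rows with the same name, because row names of a data frame must be unique.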
