I would like to rename the row names by removing the part that is common to all of them.
a b c
CDA_Part 1 4 4
CDZ_Part 3 4 4
CDX_Part 1 4 4
result
a b c
CDA 1 4 4
CDZ 3 4 4
CDX 1 4 4
1. Create a minimal reproducible example:
df <- data.frame(a = 1:3, b = 4:6)
rownames(df) <- c("CDA_Part", "CDZ_Part", "CDX_Part")
df
Returns:
a b
CDA_Part 1 4
CDZ_Part 2 5
CDX_Part 3 6
2. Suggested solution using base R's gsub:
rownames(df) <- gsub("_Part", "", rownames(df), fixed=TRUE)
df
Returns:
a b
CDA 1 4
CDZ 2 5
CDX 3 6
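The fixed = TRUE argument in the call above makes no difference for "_Part", but it can matter once the pattern contains a regex metacharacter. The sketch below is only an illustration: the vector x and the pattern ".Part" are hypothetical and not taken from the question.

```r
# Hypothetical names, not from the question: the character before "Part" varies
x <- c("CDA_Part", "CDZxPart")

# Default (regex): "." matches any single character, so both names are shortened
as_regex <- gsub(".Part", "", x)

# fixed = TRUE: "." is a literal dot, which neither name contains
as_fixed <- gsub(".Part", "", x, fixed = TRUE)

print(as_regex)
print(as_fixed)
```

With fixed = TRUE the names come back unchanged, because the pattern is compared character by character instead of being interpreted as a regular expression.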
Explanation:
gsub finds and replaces parts of strings. By default it treats the pattern as a regular expression. The first three arguments are:
pattern: the pattern to be replaced, i.e. "_Part"
replacement: the string to use as the replacement, i.e. the empty string ""
x: the strings in which to replace something, i.e. the row names
An additional argument (not among the first three):
fixed: indicates whether pattern is meant to be a regular expression or "just" an ordinary string; here it is just a string

You can also try this approach: use Reduce with intersect to determine the common parts of the names. Note that I am assuming your dataset has a structure like the one below, where an underscore separates two words. This solution works with both word_commonpart and commonpart_word, as in the example below.
Logic: strsplit splits each row name at the underscore. A zero-width lookahead assertion is used so that the underscore itself is not consumed by the split. Reduce with intersect then finds the pieces (including the underscore) that are shared by all row names. The shared pieces are pasted together into a regex of alternatives separated by pipes, and gsub replaces whatever matches with nothing.
Input:
df <- structure(list(a = 1:4, b = 4:7), class = "data.frame", row.names = c("CDA_Part",
"CDZ_Part", "CDX_Part", "Part_ABC"))
Solution:
## 1. Determine the parts common to all row names
red <- Reduce('intersect', strsplit(rownames(df), "(?=_)", perl = TRUE))
## 2. Get all combinations of the underscore and the remaining common parts
e <- expand.grid(red, red)
## 3. Keep only the combinations whose two parts differ and paste each pair together using do.call
## 4. Use paste0 with collapse = "|" to build a regex of alternatives separated by pipes
## 5. Replace the common parts with nothing
rownames(df) <- gsub(paste0(do.call('paste0', e[e$Var1 != e$Var2, ]), collapse = "|"), '', rownames(df))
Output:
> df
# a b
# CDA 1 4
# CDZ 2 5
# CDX 3 6
# ABC 4 7
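To make the steps easier to follow, here is a sketch of the same approach with each intermediate value stored and printed. The names pieces, pairs, pattern and new_names are introduced only for this illustration, and stringsAsFactors = FALSE is an addition so that plain character vectors are compared instead of factors.

```r
# Same input as in the answer above
df <- structure(list(a = 1:4, b = 4:7), class = "data.frame",
                row.names = c("CDA_Part", "CDZ_Part", "CDX_Part", "Part_ABC"))

# Split each row name; the lookahead means the underscore is not consumed
pieces <- strsplit(rownames(df), "(?=_)", perl = TRUE)

# Pieces shared by every row name
red <- Reduce(intersect, pieces)

# Ordered pairs of distinct common pieces, pasted together
e <- expand.grid(red, red, stringsAsFactors = FALSE)
keep <- e$Var1 != e$Var2
pairs <- paste0(e$Var1[keep], e$Var2[keep])

# One alternation pattern built from the pairs
pattern <- paste0(pairs, collapse = "|")

# Inspect the intermediate values
print(pieces)
print(red)
print(pattern)

# Remove the common part from every row name
new_names <- gsub(pattern, "", rownames(df))
print(new_names)
```

Assigning new_names back with rownames(df) <- new_names should give the same data frame as shown in the output above.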