The data I have looks like this:
dft<- structure(list(ATM1 = c(0.61048, 0.46609, 0.52073, 0.78661, 0.46614,
0.60211, NA), ATM2 = c(NA, 0.874645, NA, 0.94743, NA, 0.984454,
NA), ATM3 = c(NA, NA, NA, 0.343564, 0.163544, 0.765422, NA)), .Names = c("ATM1",
"ATM2", "ATM3"), row.names = c("A0AV96", "A0FGR8", "2A3N6;O14986;O14617",
"A1L020", "P54792;O14640", "CON__P15497", "Q9H3Y6;CON__H-INV:HIT000016045"
), class = "data.frame")
The row names look like this:
A0AV96
A0FGR8
2A3N6;O14986;O14617
A1L020
P54792;O14640
CON__P15497
Q9H3Y6;CON__H-INV:HIT000016045
I want to strip the CON__ prefix from any string that has it, and drop the string CON__H-INV:HIT000016045 entirely.
Then I want to move each string that follows a ; into its own new row, carrying the same values as the row it came from. For example, the output for the data above should look like this:
          ATM1     ATM2     ATM3
A0AV96 0.61048       NA       NA
A0FGR8 0.46609 0.874645       NA
2A3N6  0.52073       NA       NA
O14986 0.52073       NA       NA
O14617 0.52073       NA       NA
A1L020 0.78661 0.947430 0.343564
P54792 0.46614       NA 0.163544
O14640 0.46614       NA 0.163544
P15497 0.60211 0.984454 0.765422
Q9H3Y6      NA       NA       NA
For instance, the third row name holds three strings separated by ; (2A3N6;O14986;O14617), so it should become three rows (the original plus two new ones), each with the same values as the original row.
The following code produces that output:
# strip the contaminant markers, then split each row name on ";"
# (the longer pattern is listed first so the result does not depend on the regex engine)
temp <- strsplit(gsub("(CON__H-INV:HIT000016045|CON__)", "", rownames(dft)), ";")
# use length of list to "grow" dataframe
dftNew <- dft[rep(seq_along(temp), sapply(temp, length)), ]
temp <- unlist(temp)
# row names must be unique, so add a numeric suffix to any repeated ID
dup <- duplicated(temp)
temp[dup] <- paste(temp[dup], seq_len(sum(dup)), sep = ".")
rownames(dftNew) <- temp
# running index of the expanded rows
dftNew$id <- seq_along(temp)
dftNew
          ATM1     ATM2     ATM3 id
A0AV96 0.61048       NA       NA  1
A0FGR8 0.46609 0.874645       NA  2
2A3N6  0.52073       NA       NA  3
O14986 0.52073       NA       NA  4
O14617 0.52073       NA       NA  5
A1L020 0.78661 0.947430 0.343564  6
P54792 0.46614       NA 0.163544  7
O14640 0.46614       NA 0.163544  8
P15497 0.60211 0.984454 0.765422  9
Q9H3Y6      NA       NA       NA 10
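If you need this more than once, the same steps can be wrapped in a small function. This is only a sketch in base R: the function name expand_ids, its arguments, and the source_row column are made up for illustration and are not part of the code above. Unlike the id column, source_row records which original row each new row came from, which may or may not be what you want.

```r
# Sample data from the question, rebuilt with data.frame()
dft <- data.frame(
  ATM1 = c(0.61048, 0.46609, 0.52073, 0.78661, 0.46614, 0.60211, NA),
  ATM2 = c(NA, 0.874645, NA, 0.94743, NA, 0.984454, NA),
  ATM3 = c(NA, NA, NA, 0.343564, 0.163544, 0.765422, NA),
  row.names = c("A0AV96", "A0FGR8", "2A3N6;O14986;O14617", "A1L020",
                "P54792;O14640", "CON__P15497",
                "Q9H3Y6;CON__H-INV:HIT000016045")
)

# Hypothetical helper: one output row per ';'-separated ID in the row names.
expand_ids <- function(df, sep = ";",
                       junk = "CON__H-INV:HIT000016045|CON__") {
  # Remove the contaminant markers, then split each row name into its IDs.
  ids <- strsplit(gsub(junk, "", rownames(df)), sep, fixed = TRUE)
  # Discard empty pieces left behind when a whole ID was removed.
  ids <- lapply(ids, function(x) x[nzchar(x)])
  n <- lengths(ids)
  # Repeat each original row once per ID it carried.
  out <- df[rep(seq_len(nrow(df)), n), , drop = FALSE]
  # make.unique() adds .1, .2, ... to repeated IDs so row names stay valid.
  rownames(out) <- make.unique(unlist(ids))
  # Remember which original row each new row was copied from.
  out$source_row <- rep(seq_len(nrow(df)), n)
  out
}

res <- expand_ids(dft)
res
```

The longer pattern is listed first in junk so that the full CON__H-INV:HIT000016045 string is removed whichever regex engine is used. A row whose name is emptied completely by the cleanup is dropped from the result, which is an assumption about what you want in that case.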