One way would be to replace the second underscore with another delimiter (e.g. a space, assuming the strings contain no spaces of their own) using `sub`, and then split on that delimiter.

In the `sub` pattern, we match one or more characters that are not `_` from the beginning (`^`) of the string (`^[^_]+`), followed by the first underscore (`_`), followed by one or more characters that are not `_` (`[^_]+`). We capture that as a group by placing it inside parentheses (`(...)`). Then we match the second `_`, followed by the rest of the string up to its end in the second capture group (`(.*)$`). In the replacement, we join the first (`\\1`) and second (`\\2`) capture groups with a space.
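To check the regex on its own, here is a minimal sketch that runs only the `sub` step and keeps the intermediate result. The variable name `replaced` is just illustrative; the pattern and input are the ones used in this answer.

```r
# Input vector, same as the data defined in this answer
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')

# Only the sub() step: group 1 is everything before the second underscore,
# group 2 is everything after it; the two are rejoined with a space
replaced <- sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1)

# replaced is c("c54254_g4545 i5454", "c434_g4 i455", "c5454_g544 i3")
print(replaced)
```

Only the second underscore is replaced because `[^_]+` cannot run past an underscore, so group 1 always stops right before it.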
strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1), ' ')
#[[1]]
#[1] "c54254_g4545" "i5454"
#[[2]]
#[1] "c434_g4" "i455"
#[[3]]
#[1] "c5454_g544" "i3"
Data (define `v1` before running the code above):
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')
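If a two-column result is more convenient than a list, one possible follow-up is to bind the pieces row-wise. This sketch assumes every string has at least two underscores, so that each list element has exactly two parts; the names `lst` and `mat` are illustrative.

```r
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')

# Same split as in the answer: a list of length-2 character vectors
lst <- strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1), ' ')

# Bind the list elements row-wise into a 3 x 2 character matrix:
# column 1 holds the part before the second underscore, column 2 the rest
mat <- do.call(rbind, lst)
print(mat)
```

If some strings had fewer than two underscores, `sub` would leave them unchanged and their list elements would have length 1, so the rows would no longer line up.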