I have a data frame whose row names contain space-separated strings. I would like to extract the last 5 parts of each row name and save them in a new column.

hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt

To get the first part, I do this:

read.table(text=rownames(df))$V1

What I want:

TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt
user2300940
    Relevant posts: [Splitting a dataframe string column into multiple different columns](http://stackoverflow.com/questions/18641951), [Split a column of a data frame to multiple columns](http://stackoverflow.com/questions/4350440). – zx8754 Jun 22 '16 at 07:32

3 Answers

We can split the string with strsplit, take the last 5 elements with tail, and paste them back together:

 paste(tail(strsplit(str1, "\\s+")[[1]],5), collapse=" ")
 #[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"

If we have multiple elements, we loop through the list (output from strsplit) and do the same as above.

 sapply(strsplit(rep(str1,2), " "), function(x) paste(tail(x, 5), collapse=" "))
 #[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt" "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
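To tie this back to the question, the same idea can be applied to the row names of a data frame and stored in a new column. This is only a minimal sketch: the data frame `df`, its `count` column, the second row name, and the new column name `last5` are all made up for illustration.

```r
# Hypothetical data frame; only the row names matter here.
# The second row name is invented purely for illustration.
df <- data.frame(
  count = c(10, 20),
  row.names = c(
    "hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt",
    "made-up-id ACGT 1 X 2 abc"
  )
)

# Split each row name on whitespace, keep the last five tokens, re-join them.
parts <- strsplit(rownames(df), "\\s+")
df$last5 <- vapply(
  parts,
  function(x) paste(tail(x, 5), collapse = " "),
  character(1)
)

df$last5
```

vapply is used instead of sapply here so that the result is guaranteed to be a character vector with one element per row.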

Or use str_extract

 library(stringr)
 str_extract(str1, "(\\S+\\s+){4}\\S+$")
 #[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"

Part of the same pattern can be used in sub from base R

sub(".*\\s+((\\S+\\s+){4})(\\S+)$", "\\1\\3", str1)
#[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
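If you would rather reuse the str_extract pattern unchanged without loading stringr, base R's regexpr and regmatches can be combined. A minimal sketch, assuming the same single input string as in the data section below; the names `pattern` and `last5` are just illustrative.

```r
str1 <- "hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"

# Four "token + whitespace" groups followed by a final token at the end.
pattern <- "(\\S+\\s+){4}\\S+$"

# regexpr locates the match; regmatches pulls out the matched text.
last5 <- regmatches(str1, regexpr(pattern, str1, perl = TRUE))
last5
```

One thing to keep in mind: regmatches drops elements that have no match, so on a vector input the result can be shorter than the input, whereas str_extract would return NA for those elements.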

data

str1 <- "hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
akrun

We can use word from stringr (here x is a single input string):

library(stringr)
paste(word(x, -5:-1), collapse = ' ')
#[1] "TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
Sotos

You can count the whitespace-separated tokens with stri_count and pass the positions of the last five to word:

library(stringr)
library(stringi)
word(V1,stri_count(V1,regex="\\S+")-4,stri_count(V1,regex="\\S+"))

Data

V1<-"hsa-let-7f-5p TGAGGTAGTAGATTGTATAAA 0 I-AA 0 gtt"
user2100721