sample1 = read.csv("pirate.csv")
sample1[,7] 
[1] >>xyz>>hello>>mate 1
[2] >>xyz>>hello>>mate 2
[3] >>xyz>>mate 3
[4] >>xyz>>mate 4
[5] >>xyz>>hello>>mate 5
[6] >>xyz>>hello>>mate 6

I need to create a vector containing everything that comes after the last `>>` in each string.

How can I do this?

Also, how can I extract (a) o qwerty, (b) mate1 and (c) pirate1 into separate variables from the following string?

p= '>>xyz- o qwerty>>hello>>mate1>>sole pirate1'

Thanks

Sotos
Looper
  • Why is there an `r` tag? Do you need this in `r` as well as Excel? – Sotos May 05 '16 at 13:02
  • Yes, I have to extract a column from an Excel file into R as a vector – Looper May 05 '16 at 13:06
  • See the `header` and `nrows` arguments of `read.csv`. This should get you started with reading. There are a lot of questions on SO about this. – lmo May 05 '16 at 13:09
  • Do you want to always extract the last word or just the word after 'ahoy'? – cdeterman May 05 '16 at 13:13
  • It's not the last word or the word after ahoy. My goal is to extract specific words from a string, for example "stuff data" from the string ">>hello1>>hola1>>ahoy xyz stuff data mate1" – Looper May 05 '16 at 13:19
  • That is why it is best to use a reproducible example. Have a look at `read.csv` (after you save your Excel file as .csv) and use `dput` in `r` to produce an example. Also have a look at [this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos May 05 '16 at 13:23

2 Answers

x <- c('>>xyz>>hello>>mate 1', '>>xyz>>hello>>mate 2', '>>xyz>>mate 3', ' >>xyz>>mate 4' ,'>>xyz>>hello>>mate 5')
sub('.*>>', '', x)
#[1] "mate 1" "mate 2" "mate 3" "mate 4" "mate 5"
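
The pattern works because `.*` is greedy: it consumes as much as it can, so `.*>>` matches everything up to and including the last `>>`, and `sub` replaces that with an empty string. A minimal sketch of the same call applied to a whole data frame column, assuming a hypothetical data frame `df` with a column `path` standing in for `sample1[,7]`; the `trimws` call is an addition here, used only to drop stray surrounding spaces:

```r
# Hypothetical stand-in for the column read from the CSV file
df <- data.frame(
  path = c(">>xyz>>hello>>mate 1", ">>xyz>>mate 3", " >>xyz>>mate 4"),
  stringsAsFactors = FALSE
)

# Greedy `.*` runs to the last ">>", so only the trailing text is kept
last_part <- trimws(sub(".*>>", "", df$path))
last_part
#[1] "mate 1" "mate 3" "mate 4"
```

`sub` is vectorised, so the number of rows does not matter, and a string that contains no `>>` is returned unchanged.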
Sotos
  • You are typing every row of column 7 into the `x` vector by hand. 1) How can I pass the whole column as the vector (these are only 6 rows, but I am dealing with more than 100 rows)? 2) I also want to extract text between two specific symbols, e.g. from `>>xyz-qwerty>>hello>>mate1>>pirate1` I want to extract qwerty and hello into two separate variables. Please help me with that. Thanks – Looper May 06 '16 at 06:15
  • Well, to select the individual column, replace `x` with `sample1$...` or `sample1[,7]`. For extracting other terms you will need to update your question and give some more details about it. – Sotos May 06 '16 at 06:20
  • How about something like: `gsub('.* ', "", unlist(strsplit(p, '>>')))` ? – Sotos May 06 '16 at 07:14
  • It gives me a result like this: ["" "" "" "pirate1"] – Looper May 06 '16 at 07:22
  • It should give you this: `[1] "" "qwerty" "hello" "mate1" "pirate1"` – Sotos May 06 '16 at 07:28
  • You can also use `str_replace_all(x, ".*>>","")` from stringr – Nick Jul 01 '21 at 23:41
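
The `strsplit` idea from the comments can be sketched for the string `p` in the question as follows. Which segment each piece should come from is an assumption based on items (a), (b) and (c) in the question, and the variable names `a`, `b` and `word_c` are only illustrative:

```r
p <- ">>xyz- o qwerty>>hello>>mate1>>sole pirate1"

# Split on the literal delimiter; the leading ">>" yields an empty
# first element, which is dropped
parts <- strsplit(p, ">>", fixed = TRUE)[[1]]
parts <- parts[nzchar(parts)]
# parts is now: "xyz- o qwerty" "hello" "mate1" "sole pirate1"

a      <- trimws(sub("^[^-]*-", "", parts[1]))  # text after the first "-"
b      <- parts[3]                              # third segment as-is
word_c <- sub(".* ", "", parts[4])              # last word of the fourth segment

a       # "o qwerty"
b       # "mate1"
word_c  # "pirate1"
```

Splitting first and then picking segments by position keeps each extraction rule small, but it relies on every string having the same number of `>>` segments.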

Assuming you have already read the data into an R data frame, you can use the stringr package as follows:

library(stringr)
str_extract(df$mystring, '\\S+$')

For example, if you have a string like this:

s <- '>>hello1>>hola1>>ahoy mate1'

You get:

str_extract(s, '\\S+$')
[1] "mate1"
Gopala
  • or simply `sub('.* ', '', s)` but I think his problem has to do with also importing data in R... – Sotos May 05 '16 at 13:11
  • Yeah. Agree. I lead people to packages that are versatile so they can do more with them as needed for different problems. – Gopala May 05 '16 at 13:13
  • `all the words after last >>` in your example should be `ahoy mate1` – rawr May 05 '16 at 17:55
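
The difference raised in the last comment can be seen by running both patterns on the same string. This is a small sketch in base R; the `regmatches`/`regexpr` call is used here as an approximation of `str_extract(s, '\\S+$')` for strings that match, not as the answer's own code:

```r
s <- ">>hello1>>hola1>>ahoy mate1"

# Last whitespace-delimited token
last_token <- regmatches(s, regexpr("\\S+$", s, perl = TRUE))

# Everything after the last ">>"
after_last <- sub(".*>>", "", s)

last_token  # "mate1"
after_last  # "ahoy mate1"
```

The two results agree only when the final segment contains no spaces.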