Extract text after a symbol in R

Question

sample1 = read.csv("pirate.csv")
sample1[,7] 
[1] >>xyz>>hello>>mate 1
[2] >>xyz>>hello>>mate 2
[3] >>xyz>>mate 3
[4] >>xyz>>mate 4
[5] >>xyz>>hello>>mate 5
[6] >>xyz>>hello>>mate 6

I need to extract all the words after the last `>>` in each row and store them in a vector (array).

How can I do this?

Also, how can I extract (a) `o qwerty`, (b) `mate1` and (c) `pirate1` into separate variables from the following string?

p= '>>xyz- o qwerty>>hello>>mate1>>sole pirate1'

Thanks

Why is there an `r` tag? Do you need this in R as well as in Excel? – Sotos, May 05 '16 at 13:02
Yes, I have to extract a column from an Excel file into R as a vector. – Looper, May 05 '16 at 13:06
See the `header` and `nrows` arguments of `read.csv`. This should get you started with reading the file. There are many questions on SO about this. – lmo, May 05 '16 at 13:09
Do you want to always extract the last word, or just the word after 'ahoy'? – cdeterman, May 05 '16 at 13:13
It's not the last word or the word after 'ahoy'. My goal is to extract specific words from a string, for example extracting "stuff data" from the string ">>hello1>>hola1>>ahoy xyz stuff data mate1". – Looper, May 05 '16 at 13:19
That is why it is best to use a reproducible example. Have a look at `read.csv` (after saving your Excel file as .csv) and use `dput` in R to produce an example. Also have a look at [this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos, May 05 '16 at 13:23
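A minimal sketch of the reading step, assuming the Excel sheet has already been saved as CSV. The real pirate.csv is not available here, so the sketch writes a hypothetical two-row stand-in to a temporary file; in that stand-in the text sits in column 2 rather than column 7.

```r
# Hypothetical stand-in for pirate.csv, written to a temp file so the sketch runs on its own.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,text",
             "1,>>xyz>>hello>>mate 1",
             "2,>>xyz>>mate 3"), tmp)

# header = TRUE treats the first line as column names;
# stringsAsFactors = FALSE keeps the text column as character rather than factor.
sample1 <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)

# Selecting one column with [, n] returns a plain character vector,
# so it can be passed straight to sub() regardless of how many rows there are.
col <- sample1[, 2]
last_part <- sub(".*>>", "", col)
print(last_part)  # "mate 1" "mate 3"
```

With the real file, `sample1[, 7]` would take the place of `sample1[, 2]`.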

score 31 · Accepted Answer · answered May 05 '16 at 13:59

x <- c('>>xyz>>hello>>mate 1', '>>xyz>>hello>>mate 2', '>>xyz>>mate 3', '>>xyz>>mate 4', '>>xyz>>hello>>mate 5')
# '.*>>' is greedy, so it matches everything up to the LAST '>>'; sub() replaces that match with ''
sub('.*>>', '', x)
#[1] "mate 1" "mate 2" "mate 3" "mate 4" "mate 5"
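To see why `.*>>` stops at the last `>>`, it can be compared with a lazy pattern. This is a small sketch on two of the rows above; the lazy `*?` quantifier is used here with `perl = TRUE`.

```r
x <- c(">>xyz>>hello>>mate 1", ">>xyz>>mate 3")

# Greedy: ".*" consumes as much as possible, so the match ends at the LAST ">>".
greedy <- sub(".*>>", "", x)

# Lazy: ".*?" consumes as little as possible, so the match ends at the FIRST ">>"
# and only the leading ">>" is removed.
lazy <- sub(".*?>>", "", x, perl = TRUE)

print(greedy)  # "mate 1" "mate 3"
print(lazy)    # "xyz>>hello>>mate 1" "xyz>>mate 3"
```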

Sotos

You are writing out every value of column 7 by hand in the `x` vector. 1) How can I pass the whole column as a vector (these are only 6 rows, but I am dealing with more than 100 rows)? 2) I also want to extract text between two specific symbols, e.g. from >>xyz-qwerty>>hello>>mate1>>pirate1 I want to extract qwerty and hello into two separate variables. Please help me with that. Thanks – Looper May 06 '16 at 06:15
Well, to select the individual column, replace `x` with `sample1$...` or `sample1[,7]`. For extracting other terms you will need to update your question and give some more details about it. – Sotos May 06 '16 at 06:20
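For point 2), a minimal sketch using `strsplit()` with `fixed = TRUE`, assuming the first wanted word always follows a `-` in the second `>>` segment and the second wanted word is the whole third segment. The names `word1` and `word2` are only illustrative.

```r
s <- ">>xyz-qwerty>>hello>>mate1>>pirate1"

# Split on the literal ">>" (fixed = TRUE, no regex); the leading ">>" yields an empty first element.
parts <- strsplit(s, ">>", fixed = TRUE)[[1]]
# parts: "" "xyz-qwerty" "hello" "mate1" "pirate1"

# Assumption: the first wanted word is whatever follows the last "-" in segment 2.
word1 <- sub(".*-", "", parts[2])
# Assumption: the second wanted word is the whole of segment 3.
word2 <- parts[3]

print(c(word1, word2))  # "qwerty" "hello"
```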
How about something like: `gsub('.* ', "", unlist(strsplit(p, '>>')))` ? – Sotos May 06 '16 at 07:14
It is giving me a result like this: ["" "" "" "pirate1"] – Looper May 06 '16 at 07:22
It should give you this: `[1] "" "qwerty" "hello" "mate1" "pirate1"` – Sotos May 06 '16 at 07:28
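Building on that comment, a sketch that stores the pieces of `p` in separate variables (`var_a`, `var_b`, `var_c` are hypothetical names). Because the question asks for `o qwerty` rather than `qwerty`, it drops only the first space-delimited token of that segment, assuming the prefix (`xyz-`) never contains a space.

```r
p <- '>>xyz- o qwerty>>hello>>mate1>>sole pirate1'

# Split on the literal ">>"; the leading ">>" yields an empty first element.
parts <- strsplit(p, ">>", fixed = TRUE)[[1]]
# parts: "" "xyz- o qwerty" "hello" "mate1" "sole pirate1"

# (a) drop the first token ("xyz-") and the space after it
var_a <- sub("^[^ ]* ", "", parts[2])
# (b) the fourth segment is already the wanted word
var_b <- parts[4]
# (c) keep only what follows the last space
var_c <- sub(".* ", "", parts[5])

print(c(var_a, var_b, var_c))  # "o qwerty" "mate1" "pirate1"
```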
You can also use `str_replace_all(x, ".*>>", "")` from the stringr package. – Nick Jul 01 '21 at 23:41

score 8 · Answer 2 · answered May 05 '16 at 13:10

Assuming you have already read the data into an R data frame, you can use the stringr package as follows:

library(stringr)
str_extract(df$mystring, '\\S+$')

For example, if you have a string like this:

s <- '>>hello1>>hola1>>ahoy mate1'

You get:

str_extract(s, '\\S+$')
[1] "mate1"
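If you would rather avoid an extra package, a base-R sketch of the same extraction uses `regexpr()` with `regmatches()`. One difference worth knowing: `regmatches()` on a `regexpr()` result silently drops elements with no match, whereas `str_extract()` returns `NA` for them.

```r
s <- '>>hello1>>hola1>>ahoy mate1'

# regexpr() locates the match of "\\S+$" (the final run of non-whitespace
# characters); regmatches() extracts the matched text.
last_word <- regmatches(s, regexpr("\\S+$", s))
print(last_word)  # "mate1"
```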

Gopala

Or simply `sub('.* ', '', s)`, but I think the real problem also involves importing the data into R. – Sotos May 05 '16 at 13:11
Yeah, agreed. I point people to packages that are versatile, so they can do more with them as needed for different problems. – Gopala May 05 '16 at 13:13
`all the words after last >>` in your example would be `ahoy mate1`, not just `mate1`. – rawr May 05 '16 at 17:55
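The difference rawr points out can be checked directly; this sketch applies both patterns to Gopala's example string.

```r
s <- '>>hello1>>hola1>>ahoy mate1'

after_last_sep <- sub(".*>>", "", s)  # everything after the last ">>"
last_word      <- sub(".* ", "", s)   # everything after the last space

print(after_last_sep)  # "ahoy mate1"
print(last_word)       # "mate1"
```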