Select all characters before a specific word (R)

Question

My example is below:

df <- data.frame(x = c("Santiria laevigata Blume f. laevigata", 
                 "Santiria laevigata", 
                 "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))

                                                        x
1                   Santiria laevigata Blume f. laevigata
2                                      Santiria laevigata
3 Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam

I would like to keep only "Santiria laevigata", i.e. keep every character before "Blume"; in other words, I want to remove all characters starting from "Blume". Any suggestions?

Desired output

                   x
1 Santiria laevigata
2 Santiria laevigata
3 Santiria laevigata

Suggested duplicates: [remove/replace specific words or phrases from character strings - R](https://stackoverflow.com/questions/41883436/remove-replace-specific-words-or-phrases-from-character-strings-r) and [Replace specific characters within strings](https://stackoverflow.com/questions/11936339/replace-specific-characters-within-strings) — Mata, Oct 06 '21 at 07:52
Does this answer your question? [remove/replace specific words or phrases from character strings - R](https://stackoverflow.com/questions/41883436/remove-replace-specific-words-or-phrases-from-character-strings-r) — Mata, Oct 06 '21 at 07:53

score 1 · Accepted Answer · answered Oct 06 '21 at 07:19


You can use `sub` with the pattern `Blume.*` to remove "Blume" and everything after it, then `trimws` to drop the leftover trailing space.

df$y <- trimws(sub('Blume.*', '', df$x))
df$y
#[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
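A possible variant, sketched under the assumption that "Blume" is always preceded by whitespace and never occurs inside the genus or epithet itself: folding the whitespace into the pattern with `\\s*` makes the separate `trimws` call unnecessary.

```r
# Example data from the question
df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))
# "\\s*" also consumes the whitespace before "Blume", so no trimws() is needed;
# strings without "Blume" do not match and are returned unchanged
df$y <- sub("\\s*Blume.*", "", df$x)
df$y
#[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
```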

answered Oct 06 '21 at 07:19

Ronak Shah


Does `.*` mean everything after Blume? – Anh Oct 06 '21 at 07:21
Thank you, it worked well when I applied it with the `str_remove` function. – Anh Oct 06 '21 at 07:22
Since we are using `'Blume.*'`, it removes "Blume" and everything after it: `.` matches any character and `*` repeats it zero or more times. You may also use the same pattern in `str_remove`: `str_remove(df$x, 'Blume.*')` – Ronak Shah Oct 06 '21 at 07:23
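To see exactly which part of each string `Blume.*` matches (and therefore what `sub` or `str_remove` deletes), a small base-R sketch using `regexpr` and `regmatches` can help; it only inspects the match and does not modify anything.

```r
x <- c("Santiria laevigata Blume f. laevigata",
       "Santiria laevigata",
       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam")
# regexpr() gives the start position of the first match per string (-1 if none)
m <- regexpr("Blume.*", x)
# regmatches() extracts the matched substrings; the non-matching 2nd string is dropped
regmatches(x, m)
# returns c("Blume f. laevigata", "Blume f. glabrifolia (Engl.) H.J.Lam")
```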

score 1 · Answer 2 · answered Oct 06 '21 at 07:21


Simply use `gsub`:

df$x <- gsub("Blume.+", "", df$x)

                    x
1 Santiria laevigata 
2  Santiria laevigata
3 Santiria laevigata
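Two details of this pattern may matter on real data: the result keeps the space before "Blume" (visible in rows 1 and 3 above), and `.+` requires at least one character after "Blume". A short sketch with a hypothetical second string that ends exactly at "Blume":

```r
# Hypothetical input: the 2nd string ends exactly at "Blume"
x <- c("Santiria laevigata Blume f. laevigata", "Santiria laevigata Blume")
# ".+" needs at least one character after "Blume", so the 2nd string is unchanged
gsub("Blume.+", "", x)
# returns c("Santiria laevigata ", "Santiria laevigata Blume")
# ".*" also matches a bare "Blume"; trimws() drops the leftover trailing space
trimws(gsub("Blume.*", "", x))
# returns c("Santiria laevigata", "Santiria laevigata")
```

Since the greedy `.*` consumes the rest of the string in a single match, `sub` and `gsub` give the same result here.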

answered Oct 06 '21 at 07:21

Park


Thank you! It worked well. – Anh Oct 06 '21 at 21:07

score 1 · Answer 3 · edited Oct 06 '21 at 07:51


You could try changing `df` to a plain character vector:

df <- c("Santiria laevigata Blume f. laevigata",
        "Santiria laevigata",
        "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam")

and then keeping the first 18 characters of each string:

new_df <- substr(df,1,18)
new_df

[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"

I don't know how to make it work with a data frame such as

data.frame(x = c("abc"))
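Since `substr` is vectorised, one way to apply this to the data frame from the question is to pass the column directly. Note that the width 18 only works because "Santiria laevigata" happens to be 18 characters long; the second call below uses a hypothetical name to show how a fixed width overshoots when the genus and epithet are shorter.

```r
df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))
# substr() works element-wise on the column, so no conversion to a vector is needed
substr(df$x, 1, 18)
# returns rep("Santiria laevigata", 3)
# Hypothetical name with a shorter genus + epithet: 18 characters cut into the author
substr("Canarium album Raeusch.", 1, 18)
# returns "Canarium album Rae"
```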

edited Oct 06 '21 at 07:51

Dharman


answered Oct 06 '21 at 07:45

Severi Suttinen


score 1 · Answer 4 · answered Oct 06 '21 at 16:34


We may use `word` from stringr:

library(stringr)
word(df$x, 1, 2)
[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"

answered Oct 06 '21 at 16:34

akrun


I was using `word`, but I changed my mind for one reason: my data has many species with poor resolution (e.g. `genus HR1` or `genus G H 1`), and some names consist of only a single word. So I don't think `word` is a good choice here. – Anh Oct 06 '21 at 21:06
@Anh Yes, you are right. I was thinking that you had only similar cases as in the example – akrun Oct 06 '21 at 21:11
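To illustrate the trade-off raised in these comments, here is a base-R sketch (so it runs without stringr). The regex `^(\\S+\\s+\\S+).*$` is only a stand-in for "keep the first two words", not how `stringr::word` is implemented, and the last two inputs are hypothetical names modelled on the examples in the comment.

```r
# First three strings come from the question; the last two are hypothetical
x <- c("Santiria laevigata Blume f. laevigata",
       "Santiria laevigata",
       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam",
       "genus G H 1",
       "Santiria")
# Word-count approach: keep the first two whitespace-separated tokens;
# strings with fewer than two tokens do not match and stay unchanged
first_two <- sub("^(\\S+\\s+\\S+).*$", "\\1", x)
# Pattern approach from the accepted answer: cut from "Blume" onward
before_blume <- trimws(sub("Blume.*", "", x))
first_two     # "genus G H 1" is truncated to "genus G"
before_blume  # names without "Blume" are kept whole
```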
