My example is below:

df <- data.frame(x = c("Santiria laevigata Blume f. laevigata", 
                 "Santiria laevigata", 
                 "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))

                                                        x
1                   Santiria laevigata Blume f. laevigata
2                                      Santiria laevigata
3 Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam

I would like to get only Santiria laevigata by keeping every character before Blume, or in other words, by removing everything from Blume onward. Any suggestions?

Desired output

                                     x                  
1                   Santiria laevigata  
2                   Santiria laevigata
3                   Santiria laevigata 
Anh
  • Suggested duplicates: [remove/replace specific words or phrases from character strings - R](https://stackoverflow.com/questions/41883436/remove-replace-specific-words-or-phrases-from-character-strings-r) and [Replace specific characters within strings](https://stackoverflow.com/questions/11936339/replace-specific-characters-within-strings) – Mata Oct 06 '21 at 07:52
  • Does this answer your question? [remove/replace specific words or phrases from character strings - R](https://stackoverflow.com/questions/41883436/remove-replace-specific-words-or-phrases-from-character-strings-r) – Mata Oct 06 '21 at 07:53
  • Not really; that is not specifically what I am asking. – Anh Oct 06 '21 at 08:51
  • @Anh Can you check the simple solution I posted? Thanks. – akrun Oct 06 '21 at 18:55

4 Answers

You can use `sub` with the pattern `Blume.*` to remove everything from Blume onward; `trimws` then drops the trailing space that is left behind.

df$y <- trimws(sub('Blume.*', '', df$x))
df$y
#[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
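
As a variation on the call above, the whitespace in front of Blume can be consumed by the pattern itself, which makes `trimws` unnecessary. This is only a sketch on the sample data from the question; the `\\s*` prefix is an addition, not part of the original answer.

```r
# Sample data from the question
df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))

# "\\s*" matches any whitespace directly before "Blume",
# so the result has no trailing space to trim
df$y <- sub("\\s*Blume.*", "", df$x)
df$y
#[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
```

Rows without Blume do not match the pattern and are returned unchanged.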
Ronak Shah

Simply use gsub (note that the space before Blume is kept, so the matching rows end with a trailing space):

df$x <- gsub("Blume.+", "", df$x)

                    x
1 Santiria laevigata 
2  Santiria laevigata
3 Santiria laevigata 
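
One detail worth checking with the pattern above: `.+` requires at least one character after Blume, whereas `.*` also matches when nothing follows. The sketch below compares the two; the second input string, which ends in Blume, is a hypothetical case that does not appear in the question.

```r
x <- c("Santiria laevigata Blume f. laevigata",  # text follows "Blume"
       "Santiria laevigata Blume")               # hypothetical: ends with "Blume"

# ".+" needs at least one character after "Blume", so the second string
# is left untouched; the first keeps a trailing space
plus_result <- gsub("Blume.+", "", x)

# ".*" also matches an empty remainder; trimws() removes the trailing space
star_result <- trimws(gsub("Blume.*", "", x))

plus_result
star_result
```

If author names can end a string in the real data, the `.*` form with `trimws` is probably the safer of the two.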
Park

You could try changing df to a character vector:

df <- c("Santiria laevigata Blume f. laevigata",
        "Santiria laevigata",
        "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam")

and then running the following, which keeps the first 18 characters of each string:

new_df <- substr(df,1,18)
new_df

[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"

I don't know how to make it work with

data.frame(x = c("abc"))
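
For what it is worth, `substr` is vectorised, so it can be applied to the column of the original data frame directly. This is a sketch on the question's sample data; it relies on the assumption that every name to keep is exactly as long as "Santiria laevigata" (18 characters), which will not hold for a dataset with other species.

```r
# Sample data from the question
df <- data.frame(x = c("Santiria laevigata Blume f. laevigata",
                       "Santiria laevigata",
                       "Santiria laevigata Blume f. glabrifolia (Engl.) H.J.Lam"))

# Assumption: the part to keep has the same fixed width in every row
keep_width <- nchar("Santiria laevigata")  # 18

# substr() works element-wise on the column
df$y <- substr(df$x, 1, keep_width)
df$y
#[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
```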
Dharman

We may use `word` from stringr to extract the first two words of each string:

library(stringr)
word(df$x, 1, 2)
[1] "Santiria laevigata" "Santiria laevigata" "Santiria laevigata"
akrun
  • I was using `word`. However, I changed my mind about one thing: my data has a lot of species with poor resolution (e.g. `genus HR1` or `genus G H 1`). Besides, some species have only one character. So I don't think `word` is a good choice here – Anh Oct 06 '21 at 21:06
  • @Anh Yes, you are right. I was thinking that you had only similar cases as in the example – akrun Oct 06 '21 at 21:11
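
For data like that described in the comment above, a pattern-based removal may be more robust than counting words, because names without Blume are left alone. The sketch below is illustrative only: the input vector mixes the examples from the comment with "Blumea balsamifera", added here as an assumed edge case to show why word boundaries (`\\b`) can matter.

```r
# Illustrative names: two from the comment, one from the question,
# and one assumed edge case where "Blume" is only part of a longer word
x <- c("genus HR1",
       "genus G H 1",
       "Santiria laevigata Blume f. laevigata",
       "Blumea balsamifera")

# "\\b" restricts the match to "Blume" as a whole word;
# "\\s*" drops the whitespace in front of it
cleaned <- sub("\\s*\\bBlume\\b.*", "", x, perl = TRUE)
cleaned
```

Without the word boundaries, the plain pattern `Blume.*` would also match inside "Blumea" and remove that whole name.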