
I have a character vector called make in my data frame Output:

make <- c("AUDI", "HUSQVARNA","SYM","LEXMOTO","LDV","APOLLO","AUDI R8 SPYDER QUATTRO V10", "MITSUBISHI FUSO","JEEP GRAND CHEROKEE LIMITED CRD A")

I want to create another column in my data frame, Output$model, that contains the characters after "AUDI", e.g.:

make  model

AUDI  R8 SPYDER QUATTRO V10

I know that I can separate the strings like this:

Output$model <- gsub(".* ", '', Output$make)

But how can I do this only for the strings that contain "AUDI"? Thanks!

Ronak Shah

3 Answers


We can use tidyr::extract to split the data into two columns. The first column holds "AUDI" (if present) or otherwise the entire string, and the second column holds everything after "AUDI" (if present) and is blank otherwise.

tidyr::extract(df, make, c('make', 'model'), '(AUDI|.*)\\s?(.*)')

#                               make                 model
#1                              AUDI                      
#2                         HUSQVARNA                      
#3                               SYM                      
#4                           LEXMOTO                      
#5                               LDV                      
#6                            APOLLO                      
#7                              AUDI R8 SPYDER QUATTRO V10
#8                   MITSUBISHI FUSO                      
#9 JEEP GRAND CHEROKEE LIMITED CRD A                      

data

df <- data.frame(make = make)
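If tidyr is not available, the same regex can be applied with base R's regexec() and regmatches(). The sketch below is an illustrative equivalent, not part of the original answer; the names m and df_split are introduced here for illustration. It assumes R >= 3.3.0, where regexec() accepts perl = TRUE, so that the alternation tries "AUDI" before falling back to ".*".

```r
make <- c("AUDI", "HUSQVARNA", "SYM", "LEXMOTO", "LDV", "APOLLO",
          "AUDI R8 SPYDER QUATTRO V10", "MITSUBISHI FUSO",
          "JEEP GRAND CHEROKEE LIMITED CRD A")

# Same pattern as the tidyr::extract() call. With perl = TRUE the
# alternatives are tried left to right, so "AUDI" wins when it is present;
# the default POSIX engine is not guaranteed to make the same choice.
m <- regmatches(make, regexec("(AUDI|.*)\\s?(.*)", make, perl = TRUE))

# Each element of m is c(whole match, capture group 1, capture group 2)
df_split <- data.frame(
  make  = vapply(m, `[`, character(1), 2),
  model = vapply(m, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
df_split
```

The pattern always matches (".*" can match the empty string), so every element of m has three entries and the vapply() calls never hit a missing index.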
Ronak Shah
  • Awesome, thanks so much! How would you approach this problem if you wanted to extract two different character strings, e.g. in this example also extracting "GRAND CHEROKEE LIMITED CRD A" from the entry with JEEP? – Dylan Johnson Oct 15 '20 at 13:12
  • @DylanJohnson Please include all the patterns that you want to include once in the question instead of extending them in the comments. – Ronak Shah Oct 15 '20 at 15:14

Consider this:

Output$model <- stringr::str_extract(Output$make, "(?<=AUDI ).*")

Output

                               make                 model
1                              AUDI                  <NA>
2                         HUSQVARNA                  <NA>
3                               SYM                  <NA>
4                           LEXMOTO                  <NA>
5                               LDV                  <NA>
6                            APOLLO                  <NA>
7        AUDI R8 SPYDER QUATTRO V10 R8 SPYDER QUATTRO V10
8                   MITSUBISHI FUSO                  <NA>
9 JEEP GRAND CHEROKEE LIMITED CRD A                  <NA>
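A base R version of the same lookbehind is sketched below for comparison; it is not from the answer itself. It assumes PCRE (perl = TRUE), which supports fixed-length lookbehinds such as "(?<=AUDI )". regexpr() returns -1 for non-matching strings and regmatches() returns only the successful matches, so the result is written back into a vector pre-filled with NA, mirroring str_extract().

```r
make <- c("AUDI", "HUSQVARNA", "SYM", "LEXMOTO", "LDV", "APOLLO",
          "AUDI R8 SPYDER QUATTRO V10", "MITSUBISHI FUSO",
          "JEEP GRAND CHEROKEE LIMITED CRD A")

# Position of the first match per string, or -1 where there is none
hit <- regexpr("(?<=AUDI ).*", make, perl = TRUE)

# Start with NA everywhere, then fill in only the strings that matched
model <- rep(NA_character_, length(make))
model[hit != -1] <- regmatches(make, hit)
model
```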
ekoam
  • Awesome, thanks so much! How would you approach this problem if you wanted to extract two different character strings, e.g. in this example also extracting "GRAND CHEROKEE LIMITED CRD A" from the entry with JEEP? – Dylan Johnson Oct 15 '20 at 13:20
  • Use this pattern `"(?<=(AUDI|JEEP) ).*"` @DylanJohnson – ekoam Oct 16 '20 at 01:36
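To handle several makes without a lookbehind, one option is to build an anchored alternation from a list of makes and keep the second capture group. This is a sketch, not ekoam's code: the vector makes is a hypothetical list, and its entries are assumed to contain no regex metacharacters. Unlike the lookbehind, "^" only strips a make that starts the string.

```r
make <- c("AUDI", "HUSQVARNA", "SYM", "LEXMOTO", "LDV", "APOLLO",
          "AUDI R8 SPYDER QUATTRO V10", "MITSUBISHI FUSO",
          "JEEP GRAND CHEROKEE LIMITED CRD A")

makes <- c("AUDI", "JEEP")  # hypothetical list of makes to split off

# Builds "^(AUDI|JEEP) (.*)$": a leading make, one space, then the model
pattern <- paste0("^(", paste(makes, collapse = "|"), ") (.*)$")

# Keep the text after the make where the pattern matches, NA elsewhere
model <- ifelse(grepl(pattern, make), sub(pattern, "\\2", make), NA_character_)
model
```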

Using base R:

> df <- data.frame(make = c("AUDI", "HUSQVARNA","SYM","LEXMOTO","LDV","APOLLO","AUDI R8 SPYDER QUATTRO V10", "MITSUBISHI FUSO","JEEP GRAND CHEROKEE LIMITED CRD A"))
> df$model <- ''
> df$model[grep('AUDI ',df$make)] <- gsub('AUDI\\s(.*)','\\1', df$make[grep('AUDI ',df$make)])
> df
                               make                 model
1                              AUDI                      
2                         HUSQVARNA                      
3                               SYM                      
4                           LEXMOTO                      
5                               LDV                      
6                            APOLLO                      
7        AUDI R8 SPYDER QUATTRO V10 R8 SPYDER QUATTRO V10
8                   MITSUBISHI FUSO                      
9 JEEP GRAND CHEROKEE LIMITED CRD A                      
> 
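A slightly tidier variant of the same base R idea computes the matching rows once with grepl() and uses sub(), since only one leading "AUDI " has to be removed per string. This is a sketch rather than the answer's code; the name is_audi is introduced here for illustration, and the pattern is anchored with "^", which is an added assumption that the make always comes first.

```r
make <- c("AUDI", "HUSQVARNA", "SYM", "LEXMOTO", "LDV", "APOLLO",
          "AUDI R8 SPYDER QUATTRO V10", "MITSUBISHI FUSO",
          "JEEP GRAND CHEROKEE LIMITED CRD A")
df <- data.frame(make = make, stringsAsFactors = FALSE)

df$model <- ""
# Logical index of rows that start with "AUDI ", computed once
is_audi <- grepl("^AUDI ", df$make)
# Drop the leading "AUDI " only in those rows
df$model[is_audi] <- sub("^AUDI ", "", df$make[is_audi])
df
```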
Karthik S
  • Awesome, thanks so much! How would you approach this problem if you wanted to extract two different character strings, e.g. in this example also extracting "GRAND CHEROKEE LIMITED CRD A" from the entry with JEEP? – Dylan Johnson Oct 15 '20 at 13:20