4

Maybe what I want to do it's impossible, so I made this question to know if there is a way or not. After reading this question here in stackoverflow I saw that there is a way to split a column in different columns, but it's not what I wanted, I have an app in shiny where I can have tables with values like:

Phones                             price
Nokia 1234D - J298 6732 - LM 2       103$
Samsung 3342L - J2YY 4372 - YU 3     130$
Samsung 3042X - IKAA 3221 - GN 4    102$

So the user comes and says I want to divide those values in column Phones as I want, so the idea that came to my mind was to make the user write something like (" ", " - ", " ", " - ") because I mean separate nokia, 1234D, J298, 6732, LM 2 in 5 columns given the mentioned separators.

Here is the example code:

library(stringr)
c=c(" "," - "," "," - ")
mytable <-data.table(Phones=c("Nokia 1234D - J298 6732 - LM 2",      
                                      "Samsung 3342L - J2YY 4372 - YU 3",
                                      "Samsung 3042X - IKAA 3221 - GN 4"),price= c("103$", "130$", "102$") )
 aux = str_split_fixed(mytable$Phones, c, 5)
  mytable<-data.table( aux, mytable$price)

But I get the following result which is not what I want it separates as it wants , duplicates the first row.:

              V1        V2   V3   V4          V5   V2
1:         Nokia     1234D    - J298 6732 - LM 2 103$
2: Samsung 3342L J2YY 4372 YU 3                  130$
3:       Samsung     3042X    - IKAA 3221 - GN 4 102$
4:   Nokia 1234D J298 6732 LM 2                  103$     

If you have a better solution it would be really helpful.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Programmer Man
  • 1,314
  • 1
  • 9
  • 29

1 Answers1

3

We could separate the 'Phones' column into 5 columns with extra= "merge" to keep the last column with strings "LM 2", "YU 3" etc

library(tidyr)
library(dplyr)
mytable %>% 
  separate(Phones, into = paste0("V", 1:5), remove = FALSE, extra = "merge")
#                             Phones      V1    V2   V3   V4   V5 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$

If we need a custom split, then use extract

mytable %>%
   extract(Phones, into = paste0("V", 1:4), remove = FALSE, 
     "^(\\w+\\s+\\w+)\\s*-\\s*(\\w+)\\s+(\\w+)\\s*-\\s*(\\w+\\s+\\w+)")
#                             Phones            V1   V2   V3   V4 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$

The ^ implies the start of the string followed by word (\\w+) followed by one or more spaces (\\s+) and the next word (\\w+) which we capture as a group ((...)) similarly the characters are matched based on the word, space characters

NOTE: The first approach gives the expected output as described in the post and the second one from the one requested in the comments


If we need to split by the custom splits in "c"

library(stringr)
c <- c(" "," - "," "," - ") #it is better to avoid function names for object names
fsplit <- function(str1, splt) {
       lst <- str_split(str1, splt, n = 2)
       v1 <- sapply(lst, `[`, 1)
       v2 <- sapply(lst, `[`, 2)
       list(v1, v2)
    }  

mytable[, V5 := Phones]
nm1 <- paste0("V", seq_along(c))
for(i in seq_along(c)){
  tmp <- fsplit(mytable$V5, c[i])
  mytable[, (nm1[i]) := tmp[[1]]]
  mytable[, V5 := tmp[[2]]][]
}
setcolorder(mytable,  c("Phones", nm1, "V5", "price"))
mytable
#                             Phones      V1    V2   V3   V4   V5 price
#1:   Nokia 1234D - J298 6732 - LM 2   Nokia 1234D J298 6732 LM 2  103$
#2: Samsung 3342L - J2YY 4372 - YU 3 Samsung 3342L J2YY 4372 YU 3  130$
#3: Samsung 3042X - IKAA 3221 - GN 4 Samsung 3042X IKAA 3221 GN 4  102$
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 2
    This works but what if the user want to keep `Nokia 1234D` together ? – Programmer Man Aug 23 '17 at 09:37
  • 2
    @David Arenburg thanks for comment that is exactly what I wanted to say!! – Programmer Man Aug 23 '17 at 09:38
  • @ProgrammerMan Then you may need `extract` i.e. `mytable %>% extract(Phones, into = paste0("V", 1:4), remove = FALSE, "^(\\w+\\s+\\w+)\\s*-\\s*(\\w+)\\s+(\\w+)\\s*-\\s*(\\w+)")` – akrun Aug 23 '17 at 09:39
  • 1
    Thanks @akrun but can you explain what is exactly doing this code `^(\\w+\\s+\\w+)\\s*-\\s*(\\w+)\\s+(\\w+)\\s*-\\s*(\\w+)` and in column V4 it should be LM 2 ... it deleted the numbers. – Programmer Man Aug 23 '17 at 09:44
  • @ProgrammerMan If you want the numbers, I can keep it. I thought you want only up til LM – akrun Aug 23 '17 at 09:51
  • 1
    @ProgrammerMan I updated with the custom splits. Sorry, I thought to go for the easy way instead of custom splits – akrun Aug 23 '17 at 10:15
  • 2
    Thank you!! that works for different given inputs! @akrun – Programmer Man Aug 23 '17 at 10:22
  • @DavidArenburg, zx8754, sorry guys, I didn't read the custom split part. corrected – akrun Aug 23 '17 at 11:47