0

I have a data set containing two columns:

Quantity    SKU
1,1         2494008,2493953
1,1,1       2167550,1336380,2365409
3,2,1,6,1   1428608,1137956,2401393,2679310,2579183

End state is a data set that looks like this:

Quantity    SKU
1           2494008
1           2493953
1           2167550
1           1336380
1           2365409
3           1428608
2           1137956
1           2401393
6           2679310
1           2579183

cplit and strsplit work if you want to split on a single variable as shown here, but I need to split on two vars (Quantity and SKU from above).

camille
  • 16,432
  • 18
  • 38
  • 60
Matt
  • 21
  • 7
  • There are lots of dupes for this. Please check `?cSplit` from `splitstackshape` or `separate_rows` from `tidyr` – akrun Jun 28 '17 at 16:56
  • There was a very similar question to this only a few hours ago. – shea Jun 28 '17 at 16:57
  • Nearly a duplicate of https://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows and https://stackoverflow.com/questions/25401505/splitting-a-string-into-new-rows-in-r – lmo Jun 28 '17 at 17:35
  • 1
    appreciate everyone's feedback. discovered a straightforward solution after diving deeper into the cSplit function. script looks like this: `cSplit(product_report, c('Quantity','SKU'),',',direction = 'long')` – Matt Jun 28 '17 at 21:30

3 Answers3

1

With separate_rows from tidyr:

df <- read.table(text="Quantity    SKU
1,1         2494008,2493953
1,1,1       2167550,1336380,2365409
3,2,1,6,1   1428608,1137956,2401393,2679310,2579183",header=TRUE,stringsAsFactors=FALSE)

library(tidyr)
df %>% separate_rows(Quantity,SKU)

   Quantity     SKU
      <chr>   <chr>
1         1 2494008
2         1 2493953
3         1 2167550
4         1 1336380
5         1 2365409
6         3 1428608
7         2 1137956
8         1 2401393
9         6 2679310
10        1 2579183
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
0
dat <- read.table(text="Quantity    SKU
1,1         2494008,2493953
1,1,1       2167550,1336380,2365409
3,2,1,6,1   1428608,1137956,2401393,2679310,2579183", header=TRUE, stringsAsFactors=FALSE)
dat2<-data.frame(Quantity = unlist(strsplit(dat$Quantity, split=",")), 
             SKU=unlist(strsplit(dat$SKU, split=",")), row.names = NULL)

dat3 <- as.data.frame(do.call(cbind, lapply(dat, function(x) unlist(strsplit(x, ",")))))

# Quantity     SKU
# 1         1 2494008
# 2         1 2493953
# 3         1 2167550
# 4         1 1336380
# 5         1 2365409
# 6         3 1428608
# 7         2 1137956
# 8         1 2401393
# 9         6 2679310
# 10        1 2579183
emilliman5
  • 5,816
  • 3
  • 27
  • 37
  • easier dat3 is `dat3 <- data.frame(lapply(dat, function(x) unlist(strsplit(x, ","))))`. – lmo Jun 28 '17 at 17:32
0

Unsurprisingly, the data.table solution is quite similar to the base R solution proposed by lmo:

library(data.table)
data.table(dat)[, lapply(.SD, function(x) unlist(strsplit(x, ",")))]
    Quantity     SKU
 1:        1 2494008
 2:        1 2493953
 3:        1 2167550
 4:        1 1336380
 5:        1 2365409
 6:        3 1428608
 7:        2 1137956
 8:        1 2401393
 9:        6 2679310
10:        1 2579183

The row number can be kept if required:

data.table(dat)[, rn := .I][, lapply(.SD, function(x) unlist(strsplit(x, ","))), rn]
    rn Quantity     SKU
 1:  1        1 2494008
 2:  1        1 2493953
 3:  2        1 2167550
 4:  2        1 1336380
 5:  2        1 2365409
 6:  3        3 1428608
 7:  3        2 1137956
 8:  3        1 2401393
 9:  3        6 2679310
10:  3        1 2579183

Data

dat <- structure(list(Quantity = c("1,1", "1,1,1", "3,2,1,6,1"), SKU = c("2494008,2493953", 
"2167550,1336380,2365409", "1428608,1137956,2401393,2679310,2579183"
)), .Names = c("Quantity", "SKU"), class = "data.frame", row.names = c(NA, -3L))
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134