1
Type  Network  Show  Placement  Cost     Dates (chr)
VVVV  AJSS     XGAF  BHGHF      103.00   3/21,3/23

I have data that looks like the above data frame snippet. I need to look at the Dates column and depending on the number of dates designated in the observation (always in the format that is separated by comma like so: 1/20,1/23,1/30) I need to replicate the row for unique dates separated by a comma. At the end of it all the above snippet should be transformed into the following.

Type  Network  Show  Placement  Cost     Dates (chr)
VVVV  AJSS     XGAF  BHGHF      103.00   3/21
VVVV  AJSS     XGAF  BHGHF      103.00   3/23

There are a number of solutions I already have to generate a count of "," which I could then use to replicate row count, but having trouble getting unique dates into the date column. I would like to stick with base R if possible, but am open to any and all packages if it is whats needed to accomplish the goal.

FOR CLARITY: For the duplicate flag. I know how to replicate rows, but how do I apply unique dates that are split by a comma to each newly created row?

Thank you in advance and please call out any additional info that is necessary and I will provide.

YungBoy
  • 235
  • 4
  • 14
  • ?rep function, e.g `rep(c(1,2), times =3)` – Bulat May 10 '16 at 00:06
  • Possible duplicate of [Repeat rows of a data.frame](http://stackoverflow.com/questions/11121385/repeat-rows-of-a-data-frame) – Bulat May 10 '16 at 00:07
  • 1
    Thanks for the speedy response Bulat. This is quite clear to me, and as I have stated I am squared away with how to replicate a row or data frame given the count of occurrences of a character in a given observation or what have you. I should have been clearer. I am just trying to keep unique date split by comma, while replicating. Is there an implementation of rep that can do this. so far seems to me like this isnt relevant to my case because it doesnt deal with splitting dates by comma and applying unique dates to the newly created rows. correct me if I am wrong... – YungBoy May 10 '16 at 00:22

4 Answers4

1

unnest() is probably what you are looking for.

library(dplyr); library(tidyr)
df %>% transform(Dates = strsplit(as.character(Dates), ",")) %>% unnest(Dates)
Source: local data frame [2 x 6]

    Type Network   Show Placement  Cost Dates
  (fctr)  (fctr) (fctr)    (fctr) (dbl) (chr)
1   VVVV    AJSS   XGAF     BHGHF   103  3/21
2   VVVV    AJSS   XGAF     BHGHF   103  3/23
Psidom
  • 209,562
  • 33
  • 339
  • 356
1

Here's a base R solution:

## generate data
set.seed(1L); N <- 4L; df <- data.frame(Type=replicate(N,paste(collapse='',rep(sample(LETTERS,1L),4L))),Network=replicate(N,paste(collapse='',sample(LETTERS,4L))),Placement=replicate(N,paste(collapse='',sample(LETTERS,5L))),Cost=round(runif(N,50,150),2L),`Dates (chr)`=replicate(N,paste(collapse=',',gsub('\\b0','',format(format='%m/%d',sample(seq(as.Date('2016-01-01'),as.Date('2016-12-31'),1L),sample(1:4,1L)))))),stringsAsFactors=F,check.names=F);
df;
##   Type Network Placement   Cost    Dates (chr)
## 1 GGGG    FWYP     YFPCZ 132.09 10/15,1/9,6/22
## 2 JJJJ    QBEX     KAJUH 114.71 9/10,6/23,11/9
## 3 OOOO    RJSL     MOLES 128.29      3/30,1/26
## 4 XXXX    SYJR     RTCQJ 105.30           4/25

## solution #1
ds <- strsplit(df$`Dates (chr)`,',');
data.frame(c(lapply(df[!names(df)%in%'Dates (chr)'],function(col) rep(col,sapply(ds,length))),list(`Dates (chr)`=unlist(ds))),check.names=F,stringsAsFactors=F);
##   Type Network Placement   Cost Dates (chr)
## 1 GGGG    FWYP     YFPCZ 132.09       10/15
## 2 GGGG    FWYP     YFPCZ 132.09         1/9
## 3 GGGG    FWYP     YFPCZ 132.09        6/22
## 4 JJJJ    QBEX     KAJUH 114.71        9/10
## 5 JJJJ    QBEX     KAJUH 114.71        6/23
## 6 JJJJ    QBEX     KAJUH 114.71        11/9
## 7 OOOO    RJSL     MOLES 128.29        3/30
## 8 OOOO    RJSL     MOLES 128.29        1/26
## 9 XXXX    SYJR     RTCQJ 105.30        4/25

Another possibility:

## solution #2
ds <- strsplit(df$`Dates (chr)`,',');
df2 <- df[rep(seq_len(nrow(df)),sapply(ds,length)),];
df2$`Dates (chr)` <- unlist(ds);
df2;
##     Type Network Placement   Cost Dates (chr)
## 1   GGGG    FWYP     YFPCZ 132.09       10/15
## 1.1 GGGG    FWYP     YFPCZ 132.09         1/9
## 1.2 GGGG    FWYP     YFPCZ 132.09        6/22
## 2   JJJJ    QBEX     KAJUH 114.71        9/10
## 2.1 JJJJ    QBEX     KAJUH 114.71        6/23
## 2.2 JJJJ    QBEX     KAJUH 114.71        11/9
## 3   OOOO    RJSL     MOLES 128.29        3/30
## 3.1 OOOO    RJSL     MOLES 128.29        1/26
## 4   XXXX    SYJR     RTCQJ 105.30        4/25
bgoldst
  • 34,190
  • 6
  • 38
  • 64
0

Here is a data.table solution . used the data.frame of bgoldst's answer

library(data.table);
set.seed(1L); N <- 4L; df <- data.frame(Type=replicate(N,paste(collapse='',rep(sample(LETTERS,1L),4L))),Network=replicate(N,paste(collapse='',sample(LETTERS,4L))),Placement=replicate(N,paste(collapse='',sample(LETTERS,5L))),Cost=round(runif(N,50,150),2L),`Dates (chr)`=replicate(N,paste(collapse=',',gsub('\\b0','',format(format='%m/%d',sample(seq(as.Date('2016-01-01'),as.Date('2016-12-31'),1L),sample(1:4,1L)))))),stringsAsFactors=F,check.names=F);


dd  <- as.data.table(df)
merge(dd[,.(Type,Network,Placement,Cost)],dd[,(strsplit(`Dates (chr)`,split = "[,]")),Type],by = "Type");


   Type Network Placement   Cost    V1
1: GGGG    FWYP     YFPCZ 132.09 10/15
2: GGGG    FWYP     YFPCZ 132.09   1/9
3: GGGG    FWYP     YFPCZ 132.09  6/22
4: JJJJ    QBEX     KAJUH 114.71  9/10
5: JJJJ    QBEX     KAJUH 114.71  6/23
6: JJJJ    QBEX     KAJUH 114.71  11/9
7: OOOO    RJSL     MOLES 128.29  3/30
8: OOOO    RJSL     MOLES 128.29  1/26
9: XXXX    SYJR     RTCQJ 105.30  4/25
Bg1850
  • 3,032
  • 2
  • 16
  • 30
0

Another solution using data.table that also keeps in mind the update:

test <- function(x){
 return (unique(unlist(strsplit(x[['Dates (chr)']],','))))
}

library(data.table);
## creating sample dataset
set.seed(1L); N <- 4L; df <- data.frame(Type=replicate(N,paste(collapse='',rep(sample(LETTERS,1L),4L))),Network=replicate(N,paste(collapse='',sample(LETTERS,4L))),Placement=replicate(N,paste(collapse='',sample(LETTERS,5L))),Cost=round(runif(N,50,150),2L),`Dates (chr)`=replicate(N,paste(collapse=',',gsub('\\b0','',format(format='%m/%d',sample(seq(as.Date('2016-01-01'),as.Date('2016-12-31'),1L),sample(1:4,1L)))))),stringsAsFactors=F,check.names=F);

setDT(df)

df[,test(.SD),by=c('Type','Network','Placement','Cost')]
##   Type Network Placement   Cost    V1
##1: GGGG    FWYP     YFPCZ 132.09 10/15
##2: GGGG    FWYP     YFPCZ 132.09   1/9
##3: GGGG    FWYP     YFPCZ 132.09  6/22
##4: JJJJ    QBEX     KAJUH 114.71  9/10
##5: JJJJ    QBEX     KAJUH 114.71  6/23
##6: JJJJ    QBEX     KAJUH 114.71  11/9
##7: OOOO    RJSL     MOLES 128.29  3/30
##8: OOOO    RJSL     MOLES 128.29  1/26
##9: XXXX    SYJR     RTCQJ 105.30  4/25
Kunal Puri
  • 3,419
  • 1
  • 10
  • 22