
I have the following sample data:

"","Class","Sex","Age","Survived","Freq"
"1","1st","Male","Child","No",0
"2","2nd","Male","Child","No",0
"3","3rd","Male","Child","No",2
"4","Crew","Male","Child","No",0

I have read it into a data frame in R using the following:

dat = read.csv("File.csv", header = TRUE)
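A side note on reading the file, offered as a sketch: because the first header field is empty, read.csv() will by default keep that column as an ordinary data column named X. Passing row.names = 1 uses it as the row names instead, so dat ends up with only Class, Sex, Age, Survived and Freq. The snippet assumes the file contents shown above and uses text = in place of File.csv so it runs on its own; csv_text is an illustrative name.

```r
# Same contents as File.csv above, inlined so the example is standalone
csv_text <- '"","Class","Sex","Age","Survived","Freq"
"1","1st","Male","Child","No",0
"2","2nd","Male","Child","No",0
"3","3rd","Male","Child","No",2
"4","Crew","Male","Child","No",0'

# row.names = 1 turns the unnamed first column into row names
# instead of a data column called "X"
dat <- read.csv(text = csv_text, header = TRUE, row.names = 1,
                stringsAsFactors = FALSE)
str(dat)
```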

Now I would like to copy this data into a new data frame that drops the "Freq" column but repeats each row according to its "Freq" value (see the required data below). A Freq of 0 should have no effect, i.e. the row still appears once:

"","Class","Sex","Age","Survived"
"1","1st","Male","Child","No"
"2","2nd","Male","Child","No"
"3","3rd","Male","Child","No"
"3","3rd","Male","Child","No"
"4","Crew","Male","Child","No"

The 3rd row of the original data appears twice in the new data because its Freq is 2, while the rows with Freq = 0 still appear once in the output. Any help would be much appreciated.

Joshua Ulrich
Siddharth Sharma
This could help: http://stackoverflow.com/questions/2894775/replicate-each-row-of-data-frame-and-specify-the-number-of-replications-for-each – abhiieor Sep 04 '16 at 11:50

1 Answer


We can use rep to replicate the row indices of the dataset according to the 'Freq' column. Because 'Freq' contains 0 values, and rep would drop any row repeated 0 times, we first replace the zeros with 1 and pass that vector as the times argument of rep. Indexing the rows of 'dat' with the output of rep expands the data, while setdiff selects every column except 'Freq'.

dat[rep(1:nrow(dat), replace(dat$Freq, dat$Freq==0, 1)), setdiff(names(dat), "Freq")]
#    Class  Sex   Age Survived
#1     1st Male Child       No
#2     2nd Male Child       No
#3     3rd Male Child       No
#3.1   3rd Male Child       No
#4    Crew Male Child       No
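To make the one-liner easier to follow, here is a sketch that splits it into its intermediate steps on the same example data (times, idx and res are just illustrative names). Note that an R data frame cannot hold duplicate row names, so the second copy of row 3 is labelled 3.1 rather than 3 as in the desired output from the question.

```r
dat <- data.frame(
  Class    = c("1st", "2nd", "3rd", "Crew"),
  Sex      = rep("Male", 4),
  Age      = rep("Child", 4),
  Survived = rep("No", 4),
  Freq     = c(0L, 0L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Step 1: number of copies of each row (a Freq of 0 becomes 1 copy)
times <- replace(dat$Freq, dat$Freq == 0, 1)
times
# [1] 1 1 2 1

# Step 2: repeat each row index that many times
idx <- rep(seq_len(nrow(dat)), times)
idx
# [1] 1 2 3 3 4

# Step 3: select rows by idx and keep every column except Freq
res <- dat[idx, setdiff(names(dat), "Freq")]
res
```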

Data:

dat <- data.frame(
  Class    = c("1st", "2nd", "3rd", "Crew"),
  Sex      = c("Male", "Male", "Male", "Male"),
  Age      = c("Child", "Child", "Child", "Child"),
  Survived = c("No", "No", "No", "No"),
  Freq     = c(0L, 0L, 2L, 0L),
  stringsAsFactors = FALSE
)
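If the original row labels are not needed, a variant of the same approach (a sketch, not part of the original answer) uses pmax() to clamp the counts at 1 and then resets the row names to 1..n. For non-negative counts, pmax(Freq, 1) gives the same times vector as the replace() call; unlike the replace() version, it would also turn a negative count into 1 instead of letting rep fail with an "invalid 'times' argument" error.

```r
dat <- data.frame(
  Class    = c("1st", "2nd", "3rd", "Crew"),
  Sex      = rep("Male", 4),
  Age      = rep("Child", 4),
  Survived = rep("No", 4),
  Freq     = c(0L, 0L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Clamp counts so every row appears at least once
times <- pmax(dat$Freq, 1L)

# Expand rows and drop the Freq column
res <- dat[rep(seq_len(nrow(dat)), times), setdiff(names(dat), "Freq")]

# Replace the "3.1"-style labels with plain 1..n row names
rownames(res) <- NULL
res
```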
akrun