
I have a data set like this:

    df <- data.frame(gender = c(rep("male", 3), rep("female", 3)),
                     Age = c(rep("old", 3), rep("young", 3)),
                     VAR = c(1:3, 1:3),
                     FEN1 = c(21, 26, 29, 30, 6, 11),
                     FEN2 = c(14, 55, 12, 33, 9, 21),
                     FEN3 = c(88, 23, 55, 23, 14, 66))

Here FEN1, FEN2 and FEN3 each hold the number of individuals in that FEN group who share the row's values of gender, Age and VAR.

I need to convert it to a data frame in which each row represents one person (536 rows in total), described by gender, Age, VAR and the FEN group.

The expected output would contain:

  • 21 rows with information: male, old, 1, FEN1
  • 14 rows with information: male, old, 1, FEN2
  • 88 rows with information: male, old, 1, FEN3
  • 26 rows with information: male, old, 2, FEN1
  • 55 rows with information: male, old, 2, FEN2
  • 23 rows with information: male, old, 2, FEN3
  • and so on...

I tried to do this by hand with code like:

    df2 <- as.data.frame(1:536)
    FEN <- c(rep("FEN1", 123), rep("FEN2", 144), rep("FEN3", 269))
    df2$FEN <- FEN
    Gender <- c(rep("male", ...)...

But that is obviously not efficient at all.

  • You could reshape from wide to long (http://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) and then use the counts to replicate the rows. Alternatively, for each row, take the counts from FEN1, FEN2 and FEN3 and build new data frames to combine later with rbind(). – jogo Mar 24 '17 at 19:20
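
The reshape-then-replicate idea from this comment can be sketched in base R roughly as follows. This is only an illustration, not the commenter's code: the names `fenCols`, `long`, `n` and `people` are made up here, and the wide-to-long step is done by hand with `rep()` rather than with `reshape()`.

```r
# Example data from the question
df <- data.frame(gender = c(rep("male", 3), rep("female", 3)),
                 Age = c(rep("old", 3), rep("young", 3)),
                 VAR = c(1:3, 1:3),
                 FEN1 = c(21, 26, 29, 30, 6, 11),
                 FEN2 = c(14, 55, 12, 33, 9, 21),
                 FEN3 = c(88, 23, 55, 23, 14, 66))

fenCols <- c("FEN1", "FEN2", "FEN3")  # hypothetical name for the count columns

# Wide to long: one row per (gender, Age, VAR, FEN) combination,
# with the count of individuals stored in column n
long <- data.frame(
  df[rep(seq_len(nrow(df)), times = length(fenCols)), c("gender", "Age", "VAR")],
  FEN = rep(fenCols, each = nrow(df)),
  n = unlist(df[fenCols], use.names = FALSE)
)

# Replicate each long row n times and drop the count column
people <- long[rep(seq_len(nrow(long)), times = long$n),
               c("gender", "Age", "VAR", "FEN")]
rownames(people) <- NULL

nrow(people)
```

With the example data, `people` has one row per individual; the rows are ordered by FEN group first and then by the original row order, so sort afterwards if a different order is needed.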

1 Answer


Here is one approach that uses only base R.

    # get the vector names that are used to repeat
    fenCats <- tail(names(df), 3)
    # construct a list of data.frames where the rows have been repeated
    # one data.frame for each of the FEN variables
    temp <- Map(function(x) df[rep(seq_len(nrow(df)), x), 1:3], df[fenCats])
    # combine list of data.frames and add column with FEN categories
    dfNew <- cbind(do.call(rbind, temp),
                   "fenCats"=rep(fenCats, colSums(df[fenCats])))

We can verify that the total row count is correct with

    nrow(dfNew) == sum(colSums(df[fenCats])) &
    nrow(dfNew) == sum(rowSums(df[fenCats]))
    [1] TRUE

As an additional check, we can pull the last row of each group using subsetting and cumsum. The trailing number in each row name is one less than the group's count, because the first copy of a row carries no suffix:

    dfNew[cumsum(unlist(df[,fenCats])),]
              gender   Age VAR fenCats
    FEN1.1.20   male   old   1    FEN1
    FEN1.2.25   male   old   2    FEN1
    FEN1.3.28   male   old   3    FEN1
    FEN1.4.29 female young   1    FEN1
    FEN1.5.5  female young   2    FEN1
    FEN1.6.10 female young   3    FEN1
    FEN2.1.13   male   old   1    FEN2
    FEN2.2.54   male   old   2    FEN2
    FEN2.3.11   male   old   3    FEN2
    FEN2.4.32 female young   1    FEN2
    FEN2.5.8  female young   2    FEN2
    FEN2.6.20 female young   3    FEN2
    FEN3.1.87   male   old   1    FEN3
    FEN3.2.22   male   old   2    FEN3
    FEN3.3.54   male   old   3    FEN3
    FEN3.4.22 female young   1    FEN3
    FEN3.5.13 female young   2    FEN3
    FEN3.6.65 female young   3    FEN3
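
A stricter check than looking at row names is to tabulate the expanded data back into counts and compare them with the original columns. The sketch below repeats the expansion so that it runs on its own; `key`, `tab` and `matches` are names introduced here for illustration only.

```r
# Example data and expansion, repeated so this block is self-contained
df <- data.frame(gender = c(rep("male", 3), rep("female", 3)),
                 Age = c(rep("old", 3), rep("young", 3)),
                 VAR = c(1:3, 1:3),
                 FEN1 = c(21, 26, 29, 30, 6, 11),
                 FEN2 = c(14, 55, 12, 33, 9, 21),
                 FEN3 = c(88, 23, 55, 23, 14, 66))
fenCats <- tail(names(df), 3)
temp <- Map(function(x) df[rep(seq_len(nrow(df)), x), 1:3], df[fenCats])
dfNew <- cbind(do.call(rbind, temp),
               "fenCats" = rep(fenCats, colSums(df[fenCats])))

# Count expanded rows per (gender, Age, VAR) group and FEN category
tab <- table(interaction(dfNew$gender, dfNew$Age, dfNew$VAR, drop = TRUE),
             dfNew$fenCats)

# Group labels for the original rows, built the same way
key <- interaction(df$gender, df$Age, df$VAR, drop = TRUE)

# Compare the tabulated counts with the original count columns, cell by cell
matches <- all(tab[as.character(key), fenCats] == as.matrix(df[fenCats]))
matches
```

If `matches` is `TRUE`, every group and FEN combination has exactly as many rows in `dfNew` as the original table says it should.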
lmo