I have the following dummy dataset of 1000 observations:

obs <- 1000

df <- data.frame(
  a=c(1,0,0,0,0,1,0,0,0,0),
  b=c(0,1,0,0,0,0,1,0,0,0),
  c=c(0,0,1,0,0,0,0,1,0,0),
  d=c(0,0,0,1,0,0,0,0,1,0),
  e=c(0,0,0,0,1,0,0,0,0,1),
  f=c(10,2,4,5,2,2,1,2,1,4),
  g=sample(c("yes", "no"), obs, replace = TRUE),
  h=sample(letters[1:15], obs, replace = TRUE),
  i=sample(c("VF","FD", "VD"), obs, replace = TRUE),
  j=sample(1:10, obs, replace = TRUE)
)

One key feature of this dataset is that the variables a to e are mutually exclusive indicators: in every row, exactly one of these five columns is 1 and the rest are 0.
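
The one-hot property described above can be checked before subsetting. This is only a minimal sketch: it uses a small toy stand-in for `df`, and `onehot_cols`, `is_onehot` and `group_sizes` are names introduced here for illustration, not part of the original code.

```r
# Toy stand-in for the question's `df`: five indicator columns plus one payload column.
df <- data.frame(
  a = c(1, 0, 0, 0, 0),
  b = c(0, 1, 0, 0, 0),
  c = c(0, 0, 1, 0, 0),
  d = c(0, 0, 0, 1, 0),
  e = c(0, 0, 0, 0, 1),
  f = c(10, 2, 4, 5, 2)
)

# Hypothetical helper name for the indicator columns.
onehot_cols <- c("a", "b", "c", "d", "e")

# TRUE only if all indicator values are 0 or 1 and every row sums to exactly 1.
is_onehot <- all(as.matrix(df[, onehot_cols]) %in% c(0, 1)) &&
  all(rowSums(df[, onehot_cols]) == 1)

# Number of rows per indicator, i.e. how many rows each per-column subset would get.
group_sizes <- colSums(df[, onehot_cols])
```

If `is_onehot` is `FALSE`, the per-column subsets would overlap or miss rows, so merging them afterwards could duplicate or drop observations.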

I found a way to extract the rows in which a given column equals 1 and assign them to separate variables:

df.a <- df[df[,"a"] == 1,,drop=FALSE]
df.b <- df[df[,"b"] == 1,,drop=FALSE]
df.c <- df[df[,"c"] == 1,,drop=FALSE]
df.d <- df[df[,"d"] == 1,,drop=FALSE]
df.e <- df[df[,"e"] == 1,,drop=FALSE]

My problem now is how to limit the number of rows saved in each of df.a to df.e and then merge the results.

2 Answers

  1. To keep only the first n rows of each subset, a simple data[1:n, ] does the job (here n = 10):

    df.a.sub <- df.a[1:10,]
    df.b.sub <- df.b[1:10,]
    df.c.sub <- df.c[1:10,]
    df.d.sub <- df.d[1:10,]
    df.e.sub <- df.e[1:10,]
    
  2. Finally, merge them with rbind.fill(df1, df2, ..., dfn) from the plyr package. Finding a straightforward way to merge multiple data frames took me the most time; the solution comes from this question and answer:

    library(plyr)  # provides rbind.fill()
    df.merged <- rbind.fill(df.a.sub, df.b.sub, df.c.sub, df.d.sub, df.e.sub)
    
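One caveat with the `data[1:n, ]` idiom from step 1: when a data frame has fewer than n rows, the extra indices produce rows filled with `NA`, whereas `head(data, n)` simply returns what is available. The sketch below illustrates this on a small made-up data frame; `small`, `padded` and `trimmed` are hypothetical names.

```r
# A made-up data frame with only 3 rows.
small <- data.frame(x = 1:3, y = c("p", "q", "r"))

# Indexing past the last row pads the result with all-NA rows.
padded <- small[1:5, ]

# head() stops at the last available row instead.
trimmed <- head(small, 5)

nrow(padded)   # number of rows after indexing with 1:5
nrow(trimmed)  # number of rows after head(small, 5)
```

With the question's data each subset has about 200 rows, so `1:10` is safe there, but `head()` is the more defensive choice if group sizes can vary.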

Here's a shorter way to create df.merged:

# variables of 'df'
vars <- c("a", "b", "c", "d", "e")

# number of rows to extract
n <- 100

df.merged <- do.call(rbind, lapply(vars, function(x) {
  head(df[as.logical(df[[x]]), ], n)
}))

Here, rbind is sufficient because every subset has the same columns as df. The function rbind.fill is only necessary if your data frames do not all have the same columns.
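
To illustrate the difference, here is a minimal base-R sketch with made-up one-row data frames (`same1`, `same2` and `extra` are hypothetical names). It only shows when plain `rbind` works and when it stops with an error; it does not call `rbind.fill` itself, so it runs without plyr.

```r
# Two data frames with identical columns: plain rbind() stacks them.
same1 <- data.frame(a = 1, f = 10)
same2 <- data.frame(a = 0, f = 2)
stacked <- rbind(same1, same2)

# A data frame with an additional column `k`.
extra <- data.frame(a = 1, f = 5, k = "new")

# rbind() on data frames with different numbers of columns raises an error;
# tryCatch() turns that error into a marker string for this demonstration.
result <- tryCatch(
  rbind(same1, extra),
  error = function(e) "rbind failed"
)
```

In the second case, `plyr::rbind.fill(same1, extra)` would be the tool to reach for, since it fills the missing column with `NA` instead of failing.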

Sven Hohenstein