Group data by factor level, then transform to data frame with colname being levels?

Question

Here is a problem I can't solve:

Data:

df <- data.frame(f1=c("a", "a", "b", "b", "c", "c", "c"), 
                 v1=c(10, 11, 4, 5, 0, 1, 2))

The resulting data.frame (f1 is a factor):
  f1 v1
  a   10
  a   11
  b   4
  b   5
  c   0
  c   1   
  c   2
What I want (for example, keep only the levels that have exactly 2 elements, then turn them into a data.frame with one column per level):
  a   b
 10   4
 11   5

Thanks in advance!

This is called **reshape from long- to wide-form**, there are many duplicates: [How to reshape data from long to wide format?](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — smci, Dec 14 '18 at 03:24
@smci - it also adds the complication of wanting to select certain `n` length groups. It's a variation, but I don't think it's a duplicate. — thelatemail, Dec 14 '18 at 03:31
@thelatemail: Sorry, I don't understand you. Please edit the question to make it clear why this is not a duplicate. Does *"fetch data with the number of element of some level == 2"* mean include all data from groups with length 2, rather than randomly sampling data from groups with length 2? — smci, Dec 14 '18 at 06:15

score 2 · Answer 1 · answered Dec 14 '18 at 02:52

I might be missing something simple here, but the approach below using dplyr and tidyr works.

library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
nlevels = 2

df1 <- df %>%
        add_count(f1) %>%
        filter(n == nlevels) %>%
        select(-n) %>%
        mutate(rn = row_number()) %>%
        spread(f1, v1) %>%
        select(-rn)

This gives

#      a     b
#   <int> <int>
#1    10    NA
#2    11    NA
#3    NA     4
#4    NA     5

Now, to remove the NAs, we can do:

do.call("cbind.data.frame", lapply(df1, function(x) x[!is.na(x)]))

#   a b
#1 10 4
#2 11 5

Because we kept only the levels that have exactly nlevels observations, every column in the final data frame has the same number of rows.
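The NA step can be avoided if the row index is numbered within each level rather than across the whole data frame. Below is a minimal sketch of that idea, not the answerer's code: it assumes tidyr 1.0.0 or later, where pivot_wider() is available as the successor to spread(), and the name group_size is made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(f1 = c("a", "a", "b", "b", "c", "c", "c"),
                 v1 = c(10, 11, 4, 5, 0, 1, 2))

group_size <- 2  # hypothetical name: how many rows a level must have

wide <- df %>%
  group_by(f1) %>%
  filter(n() == group_size) %>%     # keep levels with exactly group_size rows
  mutate(rn = row_number()) %>%     # index restarts at 1 inside each level
  ungroup() %>%
  pivot_wider(names_from = f1, values_from = v1) %>%
  select(-rn)                       # the helper index is no longer needed

wide
```

Since rn takes the values 1 and 2 in both remaining levels, the a and b values land on the same rows and no NA padding should appear.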

thelatemail · Answer 2 · 2018-12-14T03:21:33.587

split might be useful here to split df$v1 into parts corresponding to df$f1. Since you are always extracting equal length chunks, it can then simply be combined back to a data.frame:

spl <- split(df$v1, df$f1)
data.frame(spl[lengths(spl)==2])

#   a b
#1 10 4
#2 11 5

Or do it all in one call by combining this with Filter:

data.frame(Filter(function(x) length(x)==2, split(df$v1, df$f1)))
#   a b
#1 10 4
#2 11 5

moodymudskipper · Answer 3 · 2018-12-14T10:51:25.507

Here is a solution using unstack:

unstack(
  droplevels(df[ave(df$v1, df$f1, FUN = function(x) length(x) == 2)==1,]),
  v1 ~ f1)
#    a b
# 1 10 4
# 2 11 5

A variant, similar to @thelatemail's solution:

data.frame(Filter(function(x) length(x) == 2, unstack(df,v1 ~ f1)))

My tidyverse solution would be:

library(tidyverse)
df                  %>%
  group_by(f1)      %>%
  filter(n() == 2)  %>%
  mutate(i = row_number()) %>%
  spread(f1, v1)   %>%
  select(-i)
# # A tibble: 2 x 2
#       a     b
# * <dbl> <dbl>
# 1    10     4
# 2    11     5

Or, mixing approaches:

as_tibble(keep(unstack(df,v1 ~ f1), ~length(.x) == 2))

score 0 · Answer 4 · edited Dec 14 '18 at 08:15

This is how I would code it; maybe it helps you:

library(reshape2)

library(dplyr)

aa = data.frame(v1=c('a','a','b','b','c','c','c'),f1=c(10,11,4,5,0,1,2))

cc = aa %>% group_by(v1) %>% summarise(id = length((v1))) 

dd = merge(aa, cc) # attach the per-level count (id) to every row

ee = dd[dd$id==2,] # keep only the levels that occur exactly 2 times

ee$id = rep(c(1,2),nrow(ee)/2) # reset index like (1,2,1,2)

dcast(ee, id~v1,value.var = 'f1')

all done!

score 0 · Answer 5 · answered Dec 14 '18 at 03:01

Using all base functions (but you should use tidyverse)

# Work on a copy of the question's data frame
x <- df

# Add count of instances
x$len <- ave(x$v1, x$f1, FUN = length)

# Filter, drop the count
x <- x[x$len==2, c('f1','v1')]

# Hacky pivot: one column per remaining level
result <- data.frame(
  lapply(unique(x$f1), FUN = function(y) x$v1[x$f1==y])
)
colnames(result) <- as.character(unique(x$f1))

> result
   a b
1 10 4
2 11 5
