Problems to subset and split data frame in R (made using cbind.fill and rbind.fill)

Question

##### Code to generate the sample DF

cbind.fill <- function(...) {
    nm <- list(...)
    nm <- lapply(nm, as.matrix)
    n <- max(sapply(nm, nrow))
    # pad each matrix with NA rows up to n rows, then bind column-wise
    do.call(cbind, lapply(nm, function(x)
        rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}

a <- data.frame(c("Pen","Pen","Pen","Ryu","Ryu","Ken"))
b <- data.frame(c("banana", "apple", 23, "Carrot", "grape"))
c <- data.frame(c("ryu",45,"ynwa"))
final <- data.frame(cbind.fill(a,b,c))
colnames(final) <- c("A","B","C")

    A      B    C           #This is my sample data set
1 Pen banana  ryu
2 Pen  apple   45
3 Pen     23 ynwa
4 Ryu Carrot <NA>
5 Ryu  grape <NA>
6 Ken   <NA> <NA>

################## Expected output

Required output: I need to split the data frame above into 3 data frames, as shown below:

    A      B    C           #This is my 1st data frame
1 Pen banana  ryu
2 Pen  apple   45
3 Pen     23 ynwa

    A      B    C           #This is my 2nd data frame
4 Ryu Carrot <NA>
5 Ryu  grape <NA>

    A      B    C           #This is my 3rd data frame
6 Ken   <NA> <NA>

####### What I have tried so far

> final[final=="Pen",]

        #When I subset "Pen", I then have to remove the NA rows

        A      B    C
1     Pen banana  ryu
2     Pen  apple   45
3     Pen     23 ynwa
NA   <NA>   <NA> <NA>
NA.1 <NA>   <NA> <NA>
NA.2 <NA>   <NA> <NA>
NA.3 <NA>   <NA> <NA>

> final_pen <- final[complete.cases(final=="Pen"),]

    #I use complete.cases to remove the NA rows, and this looks exactly how I want, so I move on to "Ryu"

    A      B    C
1 Pen banana  ryu
2 Pen  apple   45
3 Pen     23 ynwa



> final_ryu <- final[final=="Ryu",] 

    #I subset Ryu

        A      B    C
4     Ryu Carrot <NA>
5     Ryu  grape <NA>
NA   <NA>   <NA> <NA>
NA.1 <NA>   <NA> <NA>
NA.2 <NA>   <NA> <NA>
NA.3 <NA>   <NA> <NA>

Now when I use complete.cases here, the whole data frame vanishes, because every row has an NA in at least one column. The output I expect is as below:

        A      B
4     Ryu Carrot 
5     Ryu  grape

I don't want to hardcode the subsets, since I will be doing this on a lot of data and splitting the big data frame into multiple data frames using loops. Please help. This is my second post, and I'm still learning to get the hang of it here, so please don't downvote if you think this is a stupid question.
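
A note on the extra NA rows, offered as a likely explanation rather than a definitive diagnosis: `final == "Pen"` compares every cell, so it returns a 6 x 3 logical matrix, and the four NA cells in columns B and C become four NA entries in the row index. Comparing only column A avoids this. The sketch below rebuilds `final` directly with `data.frame()` (an assumption made to keep it self-contained, instead of going through `cbind.fill`).

```r
# Rebuild the sample data directly (assumption: same contents as `final` above)
final <- data.frame(
  A = c("Pen", "Pen", "Pen", "Ryu", "Ryu", "Ken"),
  B = c("banana", "apple", "23", "Carrot", "grape", NA),
  C = c("ryu", "45", "ynwa", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Comparing the whole data frame gives a logical matrix with NA cells
mask <- final == "Pen"
dim(mask)
sum(is.na(mask))

# Comparing only column A gives one TRUE/FALSE per row;
# which() also drops any NA that column A might contain
pen <- final[which(final$A == "Pen"), ]
ryu <- final[which(final$A == "Ryu"), ]
```

Here `ryu` still contains column C, which is all NA for that group; removing such columns is a separate step.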

[How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) — zx8754, Feb 23 '16 at 08:52
Thanks @zx8754, I'll work on giving an example. This is my first question here, so I have no idea how to format tables. Thanks for the edit though. — Pranavanshu V, Feb 23 '16 at 08:56
@akrun It's linked to "Existing data frame" in the second line. I do not have any upvotes, so I am unable to share screenshots. — Pranavanshu V, Feb 23 '16 at 09:01
@MaxPD Can somebody tell me how to format tables like the ones above? When I post tables, they don't display properly! — Pranavanshu V, Feb 23 '16 at 10:03

myloginid · Accepted Answer · 2016-02-23T09:42:34.717

From the example, it seems that you want to split the data frame into several data frames and then remove, from each child frame, the columns that are entirely NA. Try something like this.

You will have to use lists to maintain the new data frames that get created.

# Sample Data Frame
> df = data.frame(
+     Column1 = paste0('a', c(rep(1,5), rep(2,5), rep(3,5))),
+     Column2 = c(rep(1:2, 5), rep(NA, 5)),
+     Column3 = c(rep(NA, 5), rep(1:2, 5))
+ )
> df
   Column1 Column2 Column3
1       a1       1      NA
2       a1       2      NA
3       a1       1      NA
4       a1       2      NA
5       a1       1      NA
6       a2       2       1
7       a2       1       2
8       a2       2       1
9       a2       1       2
10      a2       2       1
11      a3      NA       2
12      a3      NA       1
13      a3      NA       2
14      a3      NA       1
15      a3      NA       2

#First - Let's split on the 1st column.
> dflist = list()
> uniquevals = unique(df$Column1) 
> for (i in 1:length(uniquevals)) {
+     dflist[[i]] = df[df$Column1 == uniquevals[i],]
+ }
> dflist
[[1]]
  Column1 Column2 Column3
1      a1       1      NA
2      a1       2      NA
3      a1       1      NA
4      a1       2      NA
5      a1       1      NA

[[2]]
   Column1 Column2 Column3
6       a2       2       1
7       a2       1       2
8       a2       2       1
9       a2       1       2
10      a2       2       1

[[3]]
   Column1 Column2 Column3
11      a3      NA       2
12      a3      NA       1
13      a3      NA       2
14      a3      NA       1
15      a3      NA       2
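
The loop above could also be written with base R's `split()`, which returns a list with one data frame per distinct value of the grouping column. This is a minimal sketch of that alternative, not part of the original answer; it rebuilds the same sample `df` so that it runs on its own.

```r
# Same sample data as in the answer
df <- data.frame(
  Column1 = paste0("a", rep(1:3, each = 5)),
  Column2 = c(rep(1:2, 5), rep(NA, 5)),
  Column3 = c(rep(NA, 5), rep(1:2, 5))
)

# One list element per distinct value of Column1, named after that value
dflist <- split(df, df$Column1)

names(dflist)
dflist[["a2"]]
```

A side benefit is that the list elements are named after the group values, so a piece can be looked up by name instead of by position.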

#Next - Let's remove all columns where all values are NA
> newlist = lapply(X = dflist, FUN = function(df) {
+     keep = apply(X = df, MARGIN = 2, FUN = function(x) { !all(is.na(x)) })
+     # drop = FALSE keeps a data frame even when only one column survives
+     df[, keep, drop = FALSE]
+ })
> newlist
[[1]]
  Column1 Column2
1      a1       1
2      a1       2
3      a1       1
4      a1       2
5      a1       1

[[2]]
   Column1 Column2 Column3
6       a2       2       1
7       a2       1       2
8       a2       2       1
9       a2       1       2
10      a2       2       1

[[3]]
   Column1 Column3
11      a3       2
12      a3       1
13      a3       2
14      a3       1
15      a3       2

Done!!
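
For reference, the same two steps could be applied to the question's `final` data frame roughly as follows. This is a sketch under assumptions: `final` is rebuilt directly with `data.frame()`, and `drop_all_na_cols` is a hypothetical helper name introduced here. The "Ken" group keeps only column A, so `drop = FALSE` is what stops that single column from being returned as a plain vector.

```r
# Sample data from the question, rebuilt directly (assumption)
final <- data.frame(
  A = c("Pen", "Pen", "Pen", "Ryu", "Ryu", "Ken"),
  B = c("banana", "apple", "23", "Carrot", "grape", NA),
  C = c("ryu", "45", "ynwa", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the columns that have at least one non-NA value
drop_all_na_cols <- function(d) {
  keep <- vapply(d, function(col) !all(is.na(col)), logical(1))
  d[, keep, drop = FALSE]
}

# Split on column A, keeping the groups in order of first appearance
groups <- factor(final$A, levels = unique(final$A))
pieces <- lapply(split(final, groups), drop_all_na_cols)

pieces$Ryu
```

Because nothing is hardcoded, the same call works for any number of distinct values in column A.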

THANK YOU SO MUCH! For understanding my ill-formed question and taking the effort! Kudos! That's exactly what I wanted to do! — Pranavanshu V, Feb 23 '16 at 09:59
It's taking the first column's values in the list as factors. Can you tell me where to add stringsAsFactors = FALSE? — Pranavanshu V, Feb 23 '16 at 11:09
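
On the factor question in the last comment: `stringsAsFactors = FALSE` is an argument of `data.frame()`, so it would go in the call that creates the data frame. The sketch below shows that, plus one way to convert factor columns afterwards; `fac_df` is a hypothetical name used only for illustration. From R 4.0.0 onwards the default is already `FALSE`.

```r
# Pass stringsAsFactors = FALSE when the data frame is created
df <- data.frame(
  Column1 = paste0("a", rep(1:3, each = 5)),
  Column2 = c(rep(1:2, 5), rep(NA, 5)),
  stringsAsFactors = FALSE
)
class(df$Column1)

# If a data frame already has factor columns, convert them back to character
fac_df <- data.frame(x = c("a", "b"), y = 1:2, stringsAsFactors = TRUE)
fac_df[] <- lapply(fac_df, function(col) {
  if (is.factor(col)) as.character(col) else col
})
class(fac_df$x)
```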
