Segregating a dataset and naming each new dataset as per unique column names

Question

I have a dataset (nm) as shown below:

nm

2_V2O   10_Kutti    14_DD   15_TT   16_DD   19_V2O  20_Kutti
  0        1          1       0       0       1        0
  1        1          1       1       1       0        0
  0        1          0       1       0       0        1
  0        1          1       0       1       0        0

Now I want to split this into multiple new datasets, one for each unique column-name suffix (the part after the underscore). Each new dataset must also be named after that suffix, as shown below:

Kutti   
10_Kutti    20_Kutti
   1          0
   1          0
   1          1
   1          0

V2O 
2_V2O   19_V2O
  0       1
  1       0
  0       0
  0       0

DD  
14_DD   16_DD
  1       0
  1       1
  0       0
  1       1

TT  
15_TT   
0   
1   
1   
0

I know this can be done with the "select" function in dplyr, but I need one dynamic programme that builds these datasets automatically for any dataset.

Possible duplicate of [In R, how to split/subset a data frame by factors in one column?](http://stackoverflow.com/questions/19327020/in-r-how-to-split-subset-a-data-frame-by-factors-in-one-column) — 989, Sep 28 '16 at 11:05
@m0h3n this question is about how to split by columns- not by rows. — David Arenburg, Sep 28 '16 at 12:49
@DavidArenburg Yep, if I were the OP, I could handle my question with that post, however. — 989, Sep 28 '16 at 13:33

akrun · Accepted Answer · 2016-09-28T11:16:20.153

6

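We can split by a substring of the column names of 'nm'. Use sub to remove the prefix of each column name up to and including the _ (the pattern ".*_" is greedy, so it strips everything through the last underscore), then use the result as the grouping variable to split 'nm'.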

lst <- split.default(nm, sub(".*_", "", names(nm)))
lst
#$DD
#  14_DD 16_DD
#1     1     0
#2     1     1
#3     0     0
#4     1     1

#$Kutti
#  10_Kutti 20_Kutti
#1        1        0
#2        1        0
#3        1        1
#4        1        0

#$TT
#  15_TT
#1     0
#2     1
#3     1
#4     0

#$V2O
#  2_V2O 19_V2O
#1     0      1
#2     1      0
#3     0      0
#4     0      0
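To make the grouping step easier to follow, here is a minimal sketch that prints the intermediate key before splitting. The name grp is only illustrative, and nm is rebuilt here with data.frame(check.names = FALSE) so the block runs on its own; it is meant to hold the same values as the nm object in the data section.

```r
# Rebuild the example data; check.names = FALSE keeps names such as "2_V2O" as-is
nm <- data.frame(
  `2_V2O`    = c(0L, 1L, 0L, 0L),
  `10_Kutti` = c(1L, 1L, 1L, 1L),
  `14_DD`    = c(1L, 1L, 0L, 1L),
  `15_TT`    = c(0L, 1L, 1L, 0L),
  `16_DD`    = c(0L, 1L, 0L, 1L),
  `19_V2O`   = c(1L, 0L, 0L, 0L),
  `20_Kutti` = c(0L, 0L, 1L, 0L),
  check.names = FALSE
)

# Grouping key: strip everything up to and including the last underscore
grp <- sub(".*_", "", names(nm))
grp
# "V2O" "Kutti" "DD" "TT" "DD" "V2O" "Kutti"

# split.default treats the data.frame as a list of columns,
# so columns sharing a key end up in the same piece
lst <- split.default(nm, grp)
names(lst)
# "DD" "Kutti" "TT" "V2O"
```

The pieces come back ordered by the sorted key values rather than by first appearance in nm, which is why DD is listed first.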

It is better to keep the data.frames in a list. If they really must be individual data.frame objects in the global environment (not recommended), use list2env:

list2env(lst, envir = .GlobalEnv)

Now, just call

DD
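As a rough illustration of why keeping the pieces in a list is convenient, the sketch below applies the same operation to every piece and writes each one to a CSV file. The data is rebuilt so the block is self-contained; the names out_dir and out_files and the choice of tempdir() as the target folder are assumptions made for the example only.

```r
# Rebuild the example data and the list of per-suffix data.frames
nm <- data.frame(
  `2_V2O`    = c(0L, 1L, 0L, 0L),
  `10_Kutti` = c(1L, 1L, 1L, 1L),
  `14_DD`    = c(1L, 1L, 0L, 1L),
  `15_TT`    = c(0L, 1L, 1L, 0L),
  `16_DD`    = c(0L, 1L, 0L, 1L),
  `19_V2O`   = c(1L, 0L, 0L, 0L),
  `20_Kutti` = c(0L, 0L, 1L, 0L),
  check.names = FALSE
)
lst <- split.default(nm, sub(".*_", "", names(nm)))

# Number of columns in each piece, computed in one call
n_cols <- vapply(lst, ncol, integer(1))

# Row totals within each piece (for DD the totals are 1, 2, 0, 2)
row_totals <- lapply(lst, rowSums)

# Write every piece to its own file, named after the list element
out_dir <- tempdir()  # hypothetical output folder for this sketch
out_files <- file.path(out_dir, paste0(names(lst), ".csv"))
for (i in seq_along(lst)) {
  write.csv(lst[[i]], out_files[i], row.names = FALSE)
}
```

With separate objects in the global environment, each of these steps would need the object names to be collected again first, for example with mget.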

data

nm <- structure(list(`2_V2O` = c(0L, 1L, 0L, 0L), `10_Kutti` = c(1L, 
1L, 1L, 1L), `14_DD` = c(1L, 1L, 0L, 1L), `15_TT` = c(0L, 1L, 
1L, 0L), `16_DD` = c(0L, 1L, 0L, 1L), `19_V2O` = c(1L, 0L, 0L, 
0L), `20_Kutti` = c(0L, 0L, 1L, 0L)), .Names = c("2_V2O", "10_Kutti", 
"14_DD", "15_TT", "16_DD", "19_V2O", "20_Kutti"), class = "data.frame",
row.names = c(NA, -4L))

edited Sep 28 '16 at 11:16

answered Sep 28 '16 at 10:58

akrun



Interesting. You are not using `split` (which will call `split.data.frame`); rather, you are explicitly calling `split.default`, which treats the data.frame as a list (?). I don't see this documented anywhere. Where did you get this from? – David Arenburg Sep 28 '16 at 11:03
@DavidArenburg I think even if we use `split`, it still calls `split.default`, based on the warning `Warning message: In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : data length is not a multiple of split variable`. Also, `split.data.frame` actually splits the rows, but here we want to split the columns – akrun Sep 28 '16 at 11:07
@akrun But I want different dataframes not list – ROY Sep 28 '16 at 11:14
@akrun, I'm curious, why would you advise against using `list2env()`? – Gin_Salmon Sep 28 '16 at 11:40

@Gin_Salmon One reason is that most operations can be done within the `list`. With separate data.frames, we would need to loop over them again to do those operations. Writing the files also needs looping. So why not keep them in the `list`? – akrun Sep 28 '16 at 11:47

Because the creator of the [`list2env` himself advises against it](http://stackoverflow.com/questions/25761656/assign-data-frame-name-to-list-elements-using-name-vector/25761850#comment40331852_25761850) – David Arenburg Sep 28 '16 at 12:47
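
The difference between `split` and `split.default` discussed in the comments above can be checked with a small sketch. It assumes the same example data, rebuilt here so the block runs on its own; the names by_rows and by_cols and the two-level row grouping are made up for the illustration.

```r
nm <- data.frame(
  `2_V2O`    = c(0L, 1L, 0L, 0L),
  `10_Kutti` = c(1L, 1L, 1L, 1L),
  `14_DD`    = c(1L, 1L, 0L, 1L),
  `15_TT`    = c(0L, 1L, 1L, 0L),
  `16_DD`    = c(0L, 1L, 0L, 1L),
  `19_V2O`   = c(1L, 0L, 0L, 0L),
  `20_Kutti` = c(0L, 0L, 1L, 0L),
  check.names = FALSE
)

# split() on a data.frame dispatches to split.data.frame:
# the grouping vector is matched against the ROWS
by_rows <- split(nm, c("a", "a", "b", "b"))
vapply(by_rows, nrow, integer(1))  # 2 rows per piece
vapply(by_rows, ncol, integer(1))  # all 7 columns kept in each piece

# split.default() is called directly on the list of columns:
# the grouping vector is matched against the COLUMNS
by_cols <- split.default(nm, sub(".*_", "", names(nm)))
vapply(by_cols, nrow, integer(1))  # all 4 rows kept in each piece
```

Passing the seven-element column key to plain `split(nm, ...)` would recycle it against four rows, which is where the "data length is not a multiple of split variable" warning comes from.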
