I have multiple .csv files, each containing a large 0/1 matrix (~300 rows and 2000 columns). For each row, I want to build a new table containing that row and every row below it that also has a 1 in at least one of the columns where that row has a 1. I would like to keep the row and column names, and write each table to a file named after the row, in a directory.

This is an example of the data:

         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-25        1              0          1            0                 1              1                1
WHIRR-28        1              0          1            0                 0              1                0
WHIRR-55        0              0          1            0                 0              0                0
WHIRR-61        0              0          0            0                 0              1                0
WHIRR-76        0              0          1            0                 0              0                0
WHIRR-87        1              1          1            0                 0              1                1
WHIRR-92        1              0          0            1                 0              1                1

From this data set, the output should look like this:

Whirr-25
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-25        1              0          1            0                 1              1                1
WHIRR-28        1              0          1            0                 0              1                0
WHIRR-55        0              0          1            0                 0              0                0
WHIRR-61        0              0          0            0                 0              1                0
WHIRR-76        0              0          1            0                 0              0                0
WHIRR-87        1              1          1            0                 0              1                1
WHIRR-92        1              0          0            1                 0              1                1

Whirr-28
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-28        1              0          1            0                 0              1                0
WHIRR-55        0              0          1            0                 0              0                0
WHIRR-61        0              0          0            0                 0              1                0
WHIRR-76        0              0          1            0                 0              0                0
WHIRR-87        1              1          1            0                 0              1                1
WHIRR-92        1              0          0            1                 0              1                1

Whirr-55
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-55        0              0          1            0                 0              0                0
WHIRR-76        0              0          1            0                 0              0                0
WHIRR-87        1              1          1            0                 0              1                1

Whirr-61
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-61        0              0          0            0                 0              1                0
WHIRR-87        1              1          1            0                 0              1                1
WHIRR-92        1              0          0            1                 0              1                1

Whirr-76
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-76        0              0          1            0                 0              0                0
WHIRR-87        1              1          1            0                 0              1                1

Whirr-87
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-87        1              1          1            0                 0              1                1
WHIRR-92        1              0          0            1                 0              1                1

Whirr-92
         pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-92        1              0          0            1                 0              1                1

I tried this script, but it only creates new tables based on columns, not rows:

    dat <- read.table(file = "Task_vs_Files_Proj.csv", header = TRUE, sep = ",", row.names = 1)
    dat

    apply(sapply(dat, function(x) as.logical(x)), 2, function(x) dat[x, ])

    $pom.xml.
             pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
    WHIRR-25        1              0          1            0                 1              1                1
    WHIRR-28        1              0          1            0                 0              1                0
    WHIRR-87        1              1          1            0                 0              1                1
    WHIRR-92        1              0          0            1                 0              1                1

    $ZooKeeper.java
             pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
    WHIRR-87        1              1          1            0                 0              1                1

    $HBase.java
             pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
    WHIRR-25        1              0          1            0                 1              1                1
    WHIRR-28        1              0          1            0                 0              1                0
    WHIRR-55        0              0          1            0                 0              0                0
    WHIRR-76        0              0          1            0                 0              0                0
    WHIRR-87        1              1          1            0                 0              1                1

I would appreciate help from the experts here. Thank you.

user1676484
  • possible duplicate of [Creating new table from a big .csv table](http://stackoverflow.com/questions/12453483/creating-new-table-from-a-big-csv-table) – mnel Sep 20 '12 at 00:34
  • This looks like a well-edited version of a question that you have marked as answered (http://stackoverflow.com/questions/12453483). Why not edit that question (and not mark it accepted if the answer did not work)? – mnel Sep 20 '12 at 00:35
  • @mnel Yep, I just wanted to make it clearer – user1676484 Sep 20 '12 at 00:36
  • I think he changed his mind about what was needed. I, for one, would rather not have to go back and redo an old answer to fit a new question. And if you understand what he is asking for here, _please_ answer it. – IRTFM Sep 20 '12 at 00:38
  • @Dwin -- whilst this new question is clearer, it just looks to be that the OP wants subsets `dat[n:nrow(dat),]` for `n in 1:nrow(dat)` - nothing to do with the columns – mnel Sep 20 '12 at 00:53
  • Please post it. I hope it's what he wants. – IRTFM Sep 20 '12 at 00:55
  • @mnel please. appreciate your help. – user1676484 Sep 20 '12 at 01:17
  • @DWin please, I would appreciate your help too. I know you are an expert. I've been using R for 3 weeks and I need to analyze thousands of lines of data in 3-4 weeks. – user1676484 Sep 20 '12 at 01:38

1 Answer

As far as I can see, you want

# .data is the 0/1 data frame from the question, read with its row names
.data <- read.table("Task_vs_Files_Proj.csv", header = TRUE, sep = ",", row.names = 1)

# cycle through all rows
for (which_row in seq_len(nrow(.data))) {
  # take this row and every row below it
  subset_data <- .data[which_row:nrow(.data), ]
  # for each column, the positions (within the subset) whose value is 1
  which_one <- lapply(subset_data, function(x) which(as.logical(x)))
  # drop the columns that contain no 1s
  which_one <- Filter(function(x) length(x) > 0, which_one)
  # keep the columns where the first row of the subset (the current row) is 1,
  # then take the union of their row positions, sorted into the original order
  which_rows <- sort(Reduce(union, Filter(function(x) 1 %in% x, which_one)))
  # the file name, built from the current row's name
  file_name <- sprintf('file_%s.csv', row.names(.data)[which_row])
  # save, keeping the row names
  write.csv(subset_data[which_rows, ], file_name, row.names = TRUE)
  # print the data set to the console for checking
  print(subset_data[which_rows, ])
  # message to show which file was created
  message(sprintf('Saving %s', file_name))
}
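
To try the same selection without the original CSV, here is a minimal sketch that rebuilds the seven-row example from the question by hand. The helper name `share_one_from()` and the output directory under `tempdir()` are assumptions made for illustration; they are not part of the code above. It uses a logical matrix and `rowSums()` instead of `lapply`/`Reduce`, which is one possible alternative rather than the only way to do it.

```r
# Rebuild the example (values copied from the table in the question).
dat <- data.frame(
  pom.xml.          = c(1, 1, 0, 0, 0, 1, 1),
  ZooKeeper.java    = c(0, 0, 0, 0, 0, 1, 0),
  HBase.java        = c(1, 1, 1, 0, 1, 1, 0),
  Hadoop.java.      = c(0, 0, 0, 0, 0, 0, 1),
  BasicServer.java. = c(1, 0, 0, 0, 0, 0, 0),
  Abstract.java.    = c(1, 1, 0, 1, 0, 1, 1),
  HBaseRegion.java  = c(1, 0, 0, 0, 0, 1, 1),
  row.names = c("WHIRR-25", "WHIRR-28", "WHIRR-55", "WHIRR-61",
                "WHIRR-76", "WHIRR-87", "WHIRR-92")
)

# Hypothetical helper: row i plus every row below it that has a 1
# in at least one column where row i has a 1.
share_one_from <- function(d, i) {
  m    <- as.matrix(d) == 1        # logical matrix, TRUE where the value is 1
  cols <- m[i, ]                   # columns in which row i is 1
  rest <- i:nrow(d)                # row i and everything below it
  hit  <- rowSums(m[rest, cols, drop = FALSE]) > 0
  d[rest[hit], , drop = FALSE]
}

# Write one file per row into an assumed scratch directory.
out_dir <- file.path(tempdir(), "by_row")
dir.create(out_dir, showWarnings = FALSE)
for (i in seq_len(nrow(dat))) {
  out <- share_one_from(dat, i)
  write.csv(out, file.path(out_dir, sprintf("file_%s.csv", rownames(dat)[i])),
            row.names = TRUE)
}
```

With this example data, `share_one_from(dat, 3)` returns WHIRR-55, WHIRR-76 and WHIRR-87, which matches the table the question expects for Whirr-55.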
mnel
  • Thank you. Take Whirr-55 as an example: it is saved with only two other rows (76 and 87) because they also equal 1 in column HBase.java. Likewise, Whirr-87 is saved with Whirr-92 because both equal 1 in pom.xml, Abstract.java and HBaseRegion.java. – user1676484 Sep 20 '12 at 01:43
  • I combined the script with read.table (to read the .csv data), but it gives an error: `Error in write.table(subset_data, file_name, header = T, include.rownames = T, : unused argument(s) (header = T, include.rownames = T)` – user1676484 Sep 20 '12 at 01:45
  • What about `Whirr-25` or any of the others? It is not at all clear what you want. – mnel Sep 20 '12 at 01:45
  • Edited. A basic mistake on my part: there is no header argument here (that is for reading only). – mnel Sep 20 '12 at 01:46
  • Take whirr-28 as an example. It is grouped with other rows because it equals 1 in pom.xml together with whirr-87 and whirr-92. It also equals 1 in HBase.java together with whirr-55, whirr-76 and whirr-87, and in Abstract.java together with whirr-61, whirr-87 and whirr-92. – user1676484 Sep 20 '12 at 01:50
  • The script works. I removed `include.rownames = T` because it is an unused argument. But the script still does not show the output I want. – user1676484 Sep 20 '12 at 01:54
  • Works for me. Try `file.exists('file_WHIRR-25.csv')`; perhaps you are looking in the wrong place for the files. – mnel Sep 20 '12 at 02:19
  • I've set the directory, and the file I want to load is located in that directory. The console ends with `+` – user1676484 Sep 20 '12 at 02:26
  • If you copy and run the entire code, it will work. If the console ends with a `+` then it is expecting more input. Most likely you have not included the last line, which is `}`. Press `esc` to clear and re-run. – mnel Sep 20 '12 at 02:29
  • It works now. I combined the newly proposed line with your previous code. But it seems it is still not filtering appropriately. For instance, the new file for whirr-61 should only have whirr-87 and whirr-92, but it appears that it still has whirr-76. – user1676484 Sep 20 '12 at 02:39
  • What *new proposed line?* Copy the code as it stands in the answer and it creates a file for WHIRR-61 with lines WHIRR-87 and WHIRR-92. No WHIRR-76 – mnel Sep 20 '12 at 02:42
  • it works now! thank you so much for your help! really appreciate your kindness. thank you again. – user1676484 Sep 20 '12 at 02:43
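
The file checks discussed in the comments can be sketched as follows. This is a minimal, self-contained illustration: the scratch directory and the two-row `demo` data frame are made up for the example, so substitute the directory the loop actually wrote to.

```r
# Hypothetical scratch directory and a tiny stand-in for one output table.
out_dir <- tempfile("whirr_")
dir.create(out_dir)
demo <- data.frame(HBase.java = c(1, 1),
                   row.names = c("WHIRR-55", "WHIRR-76"))
path <- file.path(out_dir, "file_WHIRR-55.csv")
write.csv(demo, path, row.names = TRUE)

# Is the file where we expect it?
exists_ok <- file.exists(path)

# Which output files are in that directory?
written <- list.files(out_dir, pattern = "^file_.*\\.csv$")

# Read one back; the first column holds the row names.
back <- read.csv(path, row.names = 1)
print(back)
```

If `file.exists()` returns `FALSE` for a file the loop reported saving, comparing `getwd()` with the directory being inspected is a reasonable first check.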