I have multiple .csv files with a large matrix, ~300 rows and 2000 cols. I want to develop a new matrix table for each row by selecting the entire columns that have an equal value that is 1 . I would like to keep the row and column names and would like to create file with row names in a directory.
This the example of the datasets:
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-25 1 0 1 0 1 1 1
WHIRR-28 1 0 1 0 0 1 0
WHIRR-55 0 0 1 0 0 0 0
WHIRR-61 0 0 0 0 0 1 0
WHIRR-76 0 0 1 0 0 0 0
WHIRR-87 1 1 1 0 0 1 1
WHIRR-92 1 0 0 1 0 1 1
So this data sets will develop an output like as below:
Whirr-25
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-25 1 0 1 0 1 1 1
WHIRR-28 1 0 1 0 0 1 0
WHIRR-55 0 0 1 0 0 0 0
WHIRR-61 0 0 0 0 0 1 0
WHIRR-76 0 0 1 0 0 0 0
WHIRR-87 1 1 1 0 0 1 1
WHIRR-92 1 0 0 1 0 1 1
Whirr-28
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-28 1 0 1 0 0 1 0
WHIRR-55 0 0 1 0 0 0 0
WHIRR-61 0 0 0 0 0 1 0
WHIRR-76 0 0 1 0 0 0 0
WHIRR-87 1 1 1 0 0 1 1
WHIRR-92 1 0 0 1 0 1 1
Whirr-55
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-55 0 0 1 0 0 0 0
WHIRR-76 0 0 1 0 0 0 0
WHIRR-87 1 1 1 0 0 1 1
Whirr-61
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-61 0 0 0 0 0 1 0
WHIRR-87 1 1 1 0 0 1 1
WHIRR-92 1 0 0 1 0 1 1
Whirr-76
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-76 0 0 1 0 0 0 0
WHIRR-87 1 1 1 0 0 1 1
Whirr-87
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-87 1 1 1 0 0 1 1
WHIRR-92 1 0 0 1 0 1 1
Whirr-92
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-92 1 0 0 1 0 1 1
I applied this script, but the script only create new table based on columns, not rows:
dat <- read.table(file="Task_vs_Files_Proj.csv", header=T, sep=",", row.names=1)
dat
apply( sapply(dat , function(x) return( as.logical(x) ) ), 2, function(x) dat[x, ])
$pom.xml.
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-25 1 0 1 0 1 1 1
WHIRR-28 1 0 1 0 0 1 0
WHIRR-87 1 1 1 0 0 1 1
WHIRR-92 1 0 0 1 0 1 1
$ZooKeeper.java
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-87 1 1 1 0 0 1 1
$HBase.java
pom.xml. ZooKeeper.java HBase.java Hadoop.java. BasicServer.java. Abstract.java. HBaseRegion.java
WHIRR-25 1 0 1 0 1 1 1
WHIRR-28 1 0 1 0 0 1 0
WHIRR-55 0 0 1 0 0 0 0
WHIRR-76 0 0 1 0 0 0 0
WHIRR-87 1 1 1 0 0 1 1
Appreciate help from the expert here...Thank you