I don't want to fill up the matrix with zeros or anything else. I am wondering how one can deal with the following questions.

data<- structure(c(79L, 106L, 156L, 194L, 248L, 248L, 248L, 266L, 272L, 
            79L, 106L, 125L, 156L, 156L, 156L, 156L, 156L, 194L, 79L, 156L, 
            156L, 156L, 156L, 156L, 156L, 156L, 156L, 79L, 248L, 393L, 674L, 
            2447L, NA, NA, NA, NA, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 
            21L, NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(9L, 6L), .Dimnames = list(
              NULL, c("a", "b", "c", "d", 
                      "e", "f"))) 

1- highlight the values that are repeated more than once within a column

Expected output

      a        b       c    d     e     f
[1,]248(3)  156(3)   156(8)      21(9)
[2,]        248(2)            
[3,]                          
[4,]                          
[5,]                          
[6,]                          
[7,]                          
[8,]                         
[9,]                         

2- highlight the values that are repeated more than once within a row

Expected output

[1,]  79(4)  
[2,] 106(2) 
[3,] 156(2) 
[4,] 156(2)
[5,] 156(2)
[6,] 156(2)
[7,] 156(2)
[8,] 156(2)
[9,] 

3- how to keep only the unique elements in each column without changing the dimensions?

Expected output

     a   b   c    d  e  f
[1,]  79  79  79   79 21 
[2,] 106 106 156  248    
[3,] 156 125      393    
[4,] 194 156      674    
[5,] 248 194     2447    
[6,] 266               
[7,] 272               
[8,]                   
[9,]        

4- how to find which numbers appear repeatedly in the entire matrix and rank them by count

Expected output

21(9) 156(8) 248(3) 156(3) 248(2) 
  • Please ask one question per post. – mtoto Feb 27 '16 at 15:58
  • 1
    That's a very unusual data structure you're looking for. Can you explain what you want to do with those results? – talat Feb 27 '16 at 16:01
  • 3
    Please (1) split this up into separate questions, (2) do some googling yourself, and (3) show us what you have tried and where you got stuck. Meanwhile, this question should be closed as being __too broad__ – RHA Feb 27 '16 at 16:13
  • @RHA First, I did google it and could not find a similar question; if you are aware of any, please post it here. I have tried many things, but since the data structure is not common, it is difficult to solve. – koskesh kiramtodahanet Feb 27 '16 at 16:35
  • @docendodiscimus This data is part of a bigger data set. I agree it is very unusual, because many values are repeated. I want to get more insight into this data, which is why I am trying to do this. – koskesh kiramtodahanet Feb 27 '16 at 16:37

2 Answers

2

With respect to highlighting how often each number occurs in the matrix, wouldn't:

table(data)

be enough? To keep only the values that occur more than once, you could do:

table(data)[table(data) > 1]
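
As a minimal sketch of the same idea, the snippet below computes the frequency table once, keeps the repeated values and ranks them by count. The names `tt`, `repeated`, `ranked` and `labels` are illustrative only, and the matrix is rebuilt with `cbind` (as numeric rather than integer) so the snippet runs on its own. Note that this counts over the whole matrix, so a value such as 156 is summed across all columns, which is not the same as the per-column counts shown in the question's expected output.

```r
# Rebuild the example matrix from the question (numeric instead of integer)
data <- cbind(
  a = c(79, 106, 156, 194, 248, 248, 248, 266, 272),
  b = c(79, 106, 125, 156, 156, 156, 156, 156, 194),
  c = c(79, rep(156, 8)),
  d = c(79, 248, 393, 674, 2447, rep(NA, 4)),
  e = rep(21, 9),
  f = rep(NA, 9)
)

tt <- table(data)                            # frequency table; NA is dropped by default
repeated <- tt[tt > 1]                       # values that occur more than once
ranked <- sort(repeated, decreasing = TRUE)  # most frequent first

# "value(count)" labels in the style used in the question
labels <- paste0(names(ranked), "(", as.vector(ranked), ")")
print(labels)
```

Storing the table in a variable avoids calling `table()` twice, and `as.vector()` strips the table attributes before pasting.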

To do the same per row (or, analogously, per column), you could do:

lstRes <- list()
for (i in 1:dim(data)[1]) {
    lstRes[[i]] <-table(data[i,])[table(data[i,]) > 1]
}

To combine the results into a single matrix (which can then be converted to a data.frame):

lstRes <- list()
for (i in 1:dim(data)[1]) {
    lstRes[[i]] <- as.matrix(table(data[i,])[table(data[i,]) > 1])
}

Reduce(rbind, lstRes)
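
One possible way to get closer to the `value(count)` layout asked for in the question is sketched below. The helper names `repeats` and `pad` are hypothetical, not part of base R, and the matrix is rebuilt with `cbind` so the snippet is self-contained. With this data, column `b` yields `156(5)`, because 156 appears five times there.

```r
# Rebuild the example matrix from the question (numeric instead of integer)
data <- cbind(
  a = c(79, 106, 156, 194, 248, 248, 248, 266, 272),
  b = c(79, 106, 125, 156, 156, 156, 156, 156, 194),
  c = c(79, rep(156, 8)),
  d = c(79, 248, 393, 674, 2447, rep(NA, 4)),
  e = rep(21, 9),
  f = rep(NA, 9)
)

# Hypothetical helper: "value(count)" labels for values seen more than once in x
repeats <- function(x) {
  tt <- table(x)                 # computed once; NA is ignored
  tt <- tt[tt > 1]
  if (length(tt) == 0) return(character(0))
  paste0(names(tt), "(", as.vector(tt), ")")
}

# Repeated values per column and per row, each as a list of label vectors
by_col <- lapply(seq_len(ncol(data)), function(j) repeats(data[, j]))
names(by_col) <- colnames(data)
by_row <- lapply(seq_len(nrow(data)), function(i) repeats(data[i, ]))

# Hypothetical helper: pad with "" so every column keeps nrow(data) entries
pad <- function(v, n) c(v, rep("", n - length(v)))
col_mat <- sapply(by_col, pad, n = nrow(data))
print(col_mat, quote = FALSE)
```

`lapply` is used instead of `apply` here because `apply` would simplify the result to a vector or matrix whenever all pieces happen to have the same length; padding afterwards keeps the original number of rows without filling in zeros.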
Konrad
  • Thanks, your answer is logical, but it gives a list as output, which makes it difficult to read and manipulate further. Also, when we apply this to bigger data, it will be very slow because of the `for` loop you used. – koskesh kiramtodahanet Feb 27 '16 at 16:38
  • @koskeshkiramtodahanet Could be faster or slower, depending on the other solution. You could make it run faster by [allocating storage](http://stackoverflow.com/a/7144801/1655567) if you know the dimensions of the new data set. Loops may be [slow or not](http://stackoverflow.com/a/7142982/1655567), but I agree that growing an object inside a loop can usually be replaced by something faster. What are the dimensions of your original matrix? – Konrad Feb 27 '16 at 16:50
  • 1
    @Konrad, in all parts except the first you compute `table(data[i,])` twice which is computationally inefficient. It's better to compute it once, store the result in a variable and then use that for subsetting, i.e. `tt <- table(data); tt[tt > 1]`. And in the `for` loop, as you mentioned in the comment, it's very inefficient to grow that list without preallocation. If you don't know the dimension before computing, try to make it larger than possibly required and afterwards remove the unused elements. – talat Feb 27 '16 at 17:15
  • @docendodiscimus Thanks very much for the useful comments, taken on board with respect to the `table`. – Konrad Feb 27 '16 at 17:17
  • @Konrad Can you please explain what each function does? When I use `tt[tt > 1]` I get a warning: `In Ops.factor(left, right) : ‘>’ not meaningful for factors` – koskesh kiramtodahanet Feb 27 '16 at 21:34
  • @koskeshkiramtodahanet `table` gives you a frequency table. The output is converted to a matrix, and the loop subsets the data to build the frequency tables by row, as per the original request. These are standard functions; if you look up `table` in the help you will get a more detailed explanation and reproducible examples. – Konrad Mar 01 '16 at 17:20
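
The two suggestions from the comments above (compute each table once, and preallocate the list) could look roughly like the following. This is only a sketch under the assumption that one list element per row is wanted; the matrix is rebuilt with `cbind` so the snippet runs on its own.

```r
# Rebuild the example matrix from the question (numeric instead of integer)
data <- cbind(
  a = c(79, 106, 156, 194, 248, 248, 248, 266, 272),
  b = c(79, 106, 125, 156, 156, 156, 156, 156, 194),
  c = c(79, rep(156, 8)),
  d = c(79, 248, 393, 674, 2447, rep(NA, 4)),
  e = rep(21, 9),
  f = rep(NA, 9)
)

# Preallocate the result list instead of growing it inside the loop
lstRes <- vector("list", nrow(data))
for (i in seq_len(nrow(data))) {
  tt <- table(data[i, ])      # frequency table for row i, computed once
  lstRes[[i]] <- tt[tt > 1]   # keep only values seen more than once
}
```

`seq_len(nrow(data))` is used rather than `1:dim(data)[1]` so the loop is skipped cleanly if the matrix has zero rows.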
2
# summary statistics (min, quartiles, median, mean, max, NA count) for each column
summary(data)
# rows that exactly duplicate an earlier row (duplicated() on a matrix works row-wise)
data[duplicated(data), ]
# how many times each element appears in the data, sorted by count
as.data.frame(sort(table(data)))
# count the unique values in each column and each row, respectively (NA counts as a value)
apply(data, 2, function(x) length(unique(x)))
apply(data, 1, function(x) length(unique(x)))
# logical matrix flagging, within each column, elements that repeat an earlier one
apply(data, 2, duplicated)
# flags rows that duplicate an earlier row (all elements are taken into account)
duplicated(data)
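
For the third part of the question (keeping the unique elements of each column without changing the dimensions), one approach is to pad each column's unique values with `NA`. The helper name `keep_unique` is hypothetical, and padding with `NA` rather than blanks is an assumption made here so the matrix stays numeric; the matrix is rebuilt with `cbind` so the snippet is self-contained.

```r
# Rebuild the example matrix from the question (numeric instead of integer)
data <- cbind(
  a = c(79, 106, 156, 194, 248, 248, 248, 266, 272),
  b = c(79, 106, 125, 156, 156, 156, 156, 156, 194),
  c = c(79, rep(156, 8)),
  d = c(79, 248, 393, 674, 2447, rep(NA, 4)),
  e = rep(21, 9),
  f = rep(NA, 9)
)

# Hypothetical helper: unique non-NA values first, then NA up to the original length
keep_unique <- function(x) {
  u <- unique(x[!is.na(x)])
  c(u, rep(NA, length(x) - length(u)))
}

# Every call returns a vector of length nrow(data), so apply() gives a 9 x 6 matrix
uniq <- apply(data, 2, keep_unique)
print(uniq, na.print = "")
```

Printing with `na.print = ""` shows the padded cells as blanks, which is close to the layout in the question while the underlying values remain `NA`.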