Compare matrices to find the differences

Question

I have two matrices and want to compare them row by row (matching on row names) to find the differences.

> head(N1)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN     0.00000  0.0003124024
AGO1                     4    0.1666667    37.00000  0.0003133814
APEX1                    4    0.6666667     4.00000  0.0003144654
ATR                      4    0.1666667    19.50000  0.0003128911
CASP3                   24    0.0000000   806.00000  0.0002980626
CCND2                    4    0.3333333    97.33333  0.0003132832

> head(N2)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN         0.0  2.279982e-04
ADI1                     1          NaN         0.0  1.728877e-05
AGO1                     3    0.0000000        40.0  2.284670e-04
AIRN                     1          NaN         0.0  1.721733e-05
APEX1                    3    0.6666667         2.0  2.288330e-04
ATR                      3    0.3333333        19.5  2.281542e-04

Many of the row names in N1 also exist in N2. I want to compare those rows and write the differences to a new matrix. Rows that are unique to N1 or N2 should be labelled as belonging to N1 or N2.

I am not sure what the best criterion for calculating the difference is. What I can think of is simply summing all values of a row in N1 and subtracting from it the sum of the corresponding row in N2.

For example, the output should be:

> head(Compared)
                       Comparison Unique 
    2410016O06RIK        0.0002     Common
    AGO1                 -1.83      Common
    APEX1                 2.24      Common
    ATR                  0.0034     Common
    CASP3               830.00029   N1
    ADI1                1.0007288   N2

Here, for row name 2410016O06RIK, all values of the row were summed in N1 and in N2, and N1 - N2 was written in the Comparison column. Because this row is present in both matrices, "Common" was written in the Unique column.

Could you provide code to reproduce the data? For uniqueness I would go with inner and left/right joins by row.name, then simply rbind the three resulting data.frames, and for the difference maybe `all.equal()` would do the job? You need to specify what you mean by a "difference". – m-dz, Apr 05 '16 at 08:40
@M.D I'm reading the data from text files here, so I don't know how I can provide it. By difference I mean: "let's consider that the row names are different persons and we have their income from different sources in different columns. We calculate their overall income (adding all values of a row) at 2 time points (N1 and N2) and we want to find the persons whose income has changed drastically between these time points (by subtracting N1 - N2)". – user3253470, Apr 05 '16 at 08:47
Here is a draft of a solution using the data.table package; if you are not familiar with it I will try to rewrite it without it: (here was the code, but I will post it as a solution for readability). – m-dz, Apr 05 '16 at 09:02

Cath · Accepted Answer · 2016-04-05T09:27:52.143

4

A way to go in base R, with rowSums and merge:

If N1 and N2 are data.frames:

# compute the row sums and merge N1 and N2
N1$rs <- rowSums(N1, na.rm=TRUE)
N2$rs <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1[, "rs", drop=FALSE], N2[, "rs", drop=FALSE], by="row.names", all=TRUE)

# then flag where each row comes from ("Unique") and compare the row sums
comp$Unique <- with(comp, c("N1", "N2", "common")[(!is.na(rs.x)) + 2*(!is.na(rs.y))])
comp$Comparison <- with(comp, rs.x-rs.y)

# keep only the variables you need (selected by name rather than by position):
comp <- comp[, c("Row.names", "Comparison", "Unique")]

If N1 and N2 are matrices:

# compute the row sums and merge N1 and N2
rs1 <- rowSums(N1, na.rm=TRUE)
rs2 <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1, N2, by="row.names", all=TRUE)

# then flag where each row comes from ("Unique") and compare the row sums
comp$Unique <- with(comp, c("N1", "N2", "common")[as.numeric(!is.na(Total_Degree.x)) + 2*as.numeric(!is.na(Total_Degree.y))])
comp$Comparison <- with(merge(as.data.frame(rs1), as.data.frame(rs2), all=TRUE, by="row.names"), rs1-rs2)

# keep only the variables you need:
comp <- comp[, c("Row.names", "Comparison", "Unique")]

Output of both methods:

comp
#      Row.names    Comparison Unique
#1 2410016O06RIK  0.0000844042 common
#2          ADI1            NA     N2
#3          AGO1 -1.8332483856 common
#4          AIRN            NA     N2
#5         APEX1  3.0000856324 common
#6           ATR  0.8334181369 common
#7         CASP3            NA     N1
#8         CCND2            NA     N1
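
In the output above, rows found in only one input get `NA` in Comparison, whereas the example in the question shows the row's own sum there (e.g. CASP3). A minimal sketch of that variant, assuming this is what is wanted: the matrices are rebuilt by hand from the `head()` output in the question, and the names `ids`, `in1` and `in2` are only illustrative.

```r
# Rebuild the six rows shown by head(N1) and head(N2) as matrices
cols <- c("Total_Degree", "Transitivity", "Betweenness", "Closeness_All")
N1 <- matrix(c( 1, NaN,         0.00000, 0.0003124024,
                4, 0.1666667,  37.00000, 0.0003133814,
                4, 0.6666667,   4.00000, 0.0003144654,
                4, 0.1666667,  19.50000, 0.0003128911,
               24, 0.0000000, 806.00000, 0.0002980626,
                4, 0.3333333,  97.33333, 0.0003132832),
             ncol = 4, byrow = TRUE,
             dimnames = list(c("2410016O06RIK", "AGO1", "APEX1",
                               "ATR", "CASP3", "CCND2"), cols))
N2 <- matrix(c(1, NaN,        0.0, 2.279982e-04,
               1, NaN,        0.0, 1.728877e-05,
               3, 0.0000000, 40.0, 2.284670e-04,
               1, NaN,        0.0, 1.721733e-05,
               3, 0.6666667,  2.0, 2.288330e-04,
               3, 0.3333333, 19.5, 2.281542e-04),
             ncol = 4, byrow = TRUE,
             dimnames = list(c("2410016O06RIK", "ADI1", "AGO1",
                               "AIRN", "APEX1", "ATR"), cols))

# Row sums, ignoring the NaN entries
rs1 <- rowSums(N1, na.rm = TRUE)
rs2 <- rowSums(N2, na.rm = TRUE)

# All row names, and where each one occurs
ids <- sort(union(names(rs1), names(rs2)))
in1 <- ids %in% names(rs1)
in2 <- ids %in% names(rs2)

comp <- data.frame(
  Row.names  = ids,
  # common rows: N1 - N2; unique rows: the row's own sum (assumption, see above)
  Comparison = ifelse(in1 & in2, rs1[ids] - rs2[ids],
                      ifelse(in1, rs1[ids], rs2[ids])),
  Unique     = ifelse(in1 & in2, "common", ifelse(in1, "N1", "N2")),
  row.names  = NULL, stringsAsFactors = FALSE
)
comp
```

Note that Comparison then mixes two kinds of numbers (differences and plain sums), so the Unique column is needed to interpret it.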

edited Apr 05 '16 at 09:27

answered Apr 05 '16 at 08:56


`comp <- merge(N1[, "rs", drop=FALSE], N2[, "rs", drop=FALSE], by="row.names", all=TRUE)` This command results in: **Error in N1[, "rs", drop = FALSE] : incorrect number of dimensions**. Can you tell me how I can solve it. Thanks for the answer. – user3253470 Apr 05 '16 at 09:03
Thanks for the answer. Do you think this is the right way to compare the matrices/data.frames to get the differences or there are other methods out there? Secondly, can you guide me how can I provide the data for future questions? Thanks indeed. – user3253470 Apr 05 '16 at 09:17
@user3253470, easiest question first: you can either make a small reproducible example with dummy data or use `dput` on part of your data to give us its structure. As for your other question, I guess it depends on what kind of information you are after. Does the difference of rowSums make sense? Do you need a variable-wise difference? Something else? I'd say it is up to what you actually need... – Cath Apr 05 '16 at 09:22
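
As a rough sketch of the `dput` route mentioned in the comment above: the two-row matrix `N1_small` is made up by hand from the first rows of `head(N1)`, purely as a stand-in for something like `N1[1:2, 1:2]`.

```r
# A small slice of the data, typed in by hand for illustration
N1_small <- matrix(c(1, 4, NaN, 0.1666667), ncol = 2,
                   dimnames = list(c("2410016O06RIK", "AGO1"),
                                   c("Total_Degree", "Transitivity")))

# dput() prints R code that recreates the object;
# that printed text is what gets pasted into a question
dput(N1_small)

# Anyone can then rebuild the object from the pasted text
pasted  <- paste(capture.output(dput(N1_small)), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
```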
Regarding the comparison, your columns seem to be from completely different ranges/distributions, so you definitely cannot compare them as you want to. Maybe some normalisation and then sums/differences? – m-dz Apr 05 '16 at 09:23
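
One possible reading of "normalisation and then sums/differences", sketched on toy data: z-score each column using a mean and standard deviation pooled over both matrices, then take the difference of the row sums. The choice of z-scores and all values below are assumptions for illustration, not something the comment specifies.

```r
# Toy matrices with columns on very different scales (hypothetical values)
A <- matrix(c(1, 4, 24,   0, 37, 806), ncol = 2,
            dimnames = list(c("g1", "g2", "g3"), c("Degree", "Betweenness")))
B <- matrix(c(1, 3, 20,   0, 40, 700), ncol = 2,
            dimnames = list(c("g1", "g2", "g3"), c("Degree", "Betweenness")))

# Pool both matrices so that they are centred and scaled identically
pooled <- rbind(A, B)
mu     <- colMeans(pooled, na.rm = TRUE)
sdev   <- apply(pooled, 2, sd, na.rm = TRUE)

zA <- scale(A, center = mu, scale = sdev)
zB <- scale(B, center = mu, scale = sdev)

# Each column now contributes on a comparable scale to the row sums
diff_z <- rowSums(zA, na.rm = TRUE) - rowSums(zB, na.rm = TRUE)
diff_z
```

Scaling each matrix separately would hide shifts between the two inputs, which is why the parameters are pooled here.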
@Cath Can you help me with UPDATE section of this question: [http://stackoverflow.com/questions/35484595/data-frame-merge-and-selection-of-values-which-are-common-in-2-data-frames] – user3253470 Apr 07 '16 at 08:19
@user3253470 I've commented under PierreLafortune's post, sorry I didn't realise it wasn't the OP that pinged me – Cath Apr 07 '16 at 14:50

m-dz · Answer 2 · 2016-04-05T09:17:52.890

2

This is part of the solution; `res` gives you a data.table to work with for the difference part:

library(data.table)
library(dplyr)  # only used for the %>% pipe

set.seed(2016)
dt1 <- data.table(V1 = c("a", "b", "c", "d"), V2 = rnorm(4))
dt2 <- data.table(V1 = c("c", "d", "e", "f"), V2 = rnorm(4))

# First draft, kept for reference (%nin% is not base R; it comes from e.g. Hmisc):
# common <- merge(dt1, dt2, by = "V1")[, Unique := "Common"]
# unique1 <- dt1[V1 %nin% dt2[, V1], ][, Unique := "N1"]
# unique2 <- dt2[V1 %nin% dt1[, V1], ][, Unique := "N2"]
# res <- rbind(common, unique1, unique2, fill = TRUE)

Small update after @Cath's answer, just for clarity:

# Unique: 0 = present in both, 1 = V2.x missing (only in dt2), 2 = V2.y missing (only in dt1)
allMerged <- merge(dt1, dt2, by = "V1", all = TRUE) %>%
  .[, RowSum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("V2", names(.))] %>%
  .[, Unique := is.na(V2.x) + 2*is.na(V2.y)]

print(allMerged)
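
The numeric Unique code can then be turned into labels. A minimal sketch of that recoding, written with plain data.frames so that it runs without extra packages; the fixed values replace `rnorm()` and the names `d1`, `d2`, `m` and `code` are only illustrative.

```r
# Toy inputs mirroring dt1/dt2 above, with fixed values instead of rnorm()
d1 <- data.frame(V1 = c("a", "b", "c", "d"), V2 = c(1, 2, 3, 4))
d2 <- data.frame(V1 = c("c", "d", "e", "f"), V2 = c(10, 20, 30, 40))

m <- merge(d1, d2, by = "V1", all = TRUE)   # columns: V1, V2.x, V2.y

# Same coding as above: 0 = in both, 1 = V2.x missing, 2 = V2.y missing
code <- is.na(m$V2.x) + 2 * is.na(m$V2.y)

# Translate codes to labels; +1 because R vectors are 1-indexed
m$Unique <- c("Common", "N2", "N1")[code + 1]

# Difference of the two value columns; NA for rows present in only one input
m$Comparison <- m$V2.x - m$V2.y
m
```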

edited Apr 05 '16 at 09:17

answered Apr 05 '16 at 09:04


ah-ah, I didn't know why (oh why) I had to put `as.numeric` on my `is.na` test when I really didn't want to, but it was just a matter of being careful that the first `!` didn't negate the whole "sentence" :-) – Cath Apr 05 '16 at 09:29
I would be careful there, as you need to wrap every `(!is.na(V2.x))` in parentheses; if not, you will end up with a logical vector of TRUE/FALSE values. Without `!` that is not the case, which is easier I think. – m-dz Apr 05 '16 at 09:42
though without `!`, you don't really get the "correct" information (putting `2*is.na(V2.x) + is.na(V2.y)` would get you half there but you'll still have `0` for common values...) – Cath Apr 05 '16 at 11:13
@Cath, and this 0 occurs only for the common values, which looks okay to me - you can then recode it however you want. – m-dz Apr 05 '16 at 14:14
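
To make the precedence point from the comments above concrete, here is a small sketch with made-up vectors `x` and `y`: in R, `!` binds more loosely than `+` and `*`, so an unparenthesised leading `!` negates the whole sum.

```r
x <- c(1, NA, 3)   # present, missing, present
y <- c(NA, 2, 3)   # missing, present, present

# Without parentheses this parses as !(is.na(x) + 2 * (!is.na(y))),
# so the result is a logical vector rather than a numeric code
unparenthesised <- !is.na(x) + 2 * !is.na(y)

# With parentheses each test is negated on its own:
# 1 = only x present, 2 = only y present, 3 = both present
parenthesised <- (!is.na(x)) + 2 * (!is.na(y))

# The numeric code can be used directly as an index into the labels
labels <- c("N1", "N2", "common")[parenthesised]
```

The `as.numeric()` wrappers in the matrix version of the accepted answer sidestep the same problem, because the call's own parentheses delimit each negation.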
@M.D Can you help me with UPDATE section of this question: [http://stackoverflow.com/questions/35484595/data-frame-merge-and-selection-of-values-which-are-common-in-2-data-frames] – user3253470 Apr 07 '16 at 08:46
@user3253470, why not ask in the original question? See the `stats::aggregate`, `dplyr::summarise` or the `data.table` `by` syntax. – m-dz Apr 07 '16 at 09:55
