
I have a data frame recording which areas certain species occur in, where 1 = present and 0 = absent. I would like to create a pairwise matrix summing the number of species with shared areas. This is an example of my data:

structure(c(0.5, 0.3, 0.25, 0.5, 0.3, 0.25, 0, 0.3, 0.25, 0, 0, 0.25), .Dim = 3:4, 
          .Dimnames = list(c("Species1", "Species2", "Species3"), 
                           c("AreaA", "AreaB", "AreaC", "AreaD")))

         AreaA AreaB AreaC AreaD
Species1  0.5   0.5   0     0
Species2  0.3   0.3   0.3   0
Species3  0.25  0.25  0.25  0.25

And I would like something like this in the end:

      AreaA AreaB AreaC AreaD
AreaA     0   2.1   1.1   0.5
AreaB         0     1.1   0.5
AreaC               0     0.5
AreaD                     0

I have a list of over 50,000 species I need to summarise shared areas for.

LizzyJ
  • Please do not post code as images but as markup. See [here](https://stackoverflow.com/help/how-to-ask) for more details – gehbiszumeis Feb 22 '19 at 07:07
  • Help us help you. Post your problem as a code snippet that we can copy/paste into our session and start coding right away. [This is the best resources](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on the internet that shows you just how to do that. – Roman Luštrik Feb 22 '19 at 08:18
  • Hello, thank you for the feedback - this was my first post so I was unsure of how to copy the text into my question and make it "grey"/easy to use. I looked at the links suggested, but I still don't understand how I get my data to be grey (I don't know what markup is!). Do you have some easy instructions for me? I'm sorry that this is such a basic question but I'm trying to learn! – LizzyJ Feb 25 '19 at 03:56
  • There's a button that looks like a curly bracket "{". If you select code/output and click it, it will indent your text with 4 spaces, which makes it look like code. Alternatively, you can simply start your sentences with 4 spaces. – Frans Rodenburg Feb 25 '19 at 04:28
  • Please see the edit in my answer. You have only partially changed your question, leaving the original wording, which no longer applies to the data you present. It also appears to me that these new numbers are proportions... If you simply sum proportions in case of non-zero overlap, I don't believe you will end up with a meaningful number (an overlap of 2.1, means what exactly?). How about you edit your question and start with an introductory paragraph explaining what you want to show? – Frans Rodenburg Feb 26 '19 at 08:17

1 Answer


Edit:

It now seems you are trying to sum the values of each pair of columns, counting only the species for which both values are greater than zero. This is confusing, because the first part of your question no longer applies (0 = absent, 1 = present). Hence, I have left your original data and the solution to the original problem down below. If you clarify exactly what you want, I can clean up the answer as well.

The matrix you now have in your question could be obtained as follows:

M <- structure(c(0.5, 0.3, 0.25, 0.5, 0.3, 0.25, 0, 0.3, 0.25, 0, 0, 0.25), .Dim = 3:4, 
          .Dimnames = list(c("Species1", "Species2", "Species3"), 
                           c("AreaA", "AreaB", "AreaC", "AreaD")))

Shared <- matrix(0, nrow = ncol(M), ncol = ncol(M))
rownames(Shared) <- colnames(M)
colnames(Shared) <- colnames(M)
for(i in seq_len(ncol(M))){
    # For every other area x: add up both values for the species present in area i and in x
    Shared[i, -i] <- apply(M[, -i], 2, function(x){sum((M[, i] + x)[M[, i] > 0 & x > 0])})
}

> print(Shared)
      AreaA AreaB AreaC AreaD
AreaA   0.0   2.1   1.1   0.5
AreaB   2.1   0.0   1.1   0.5
AreaC   1.1   1.1   0.0   0.5
AreaD   0.5   0.5   0.5   0.0
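
Since the question mentions over 50,000 species, the same matrix can also be computed without a loop. The following is only a sketch, assuming M is a numeric matrix without missing values; the names P and SharedFast are made up for this illustration. It relies on the fact that multiplying by a 0/1 presence indicator keeps a value only where the other area is also occupied:

```r
# Example data from the question: rows are species, columns are areas.
M <- matrix(c(0.5,  0.5,  0,    0,
              0.3,  0.3,  0.3,  0,
              0.25, 0.25, 0.25, 0.25),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Species", 1:3),
                            paste0("Area", LETTERS[1:4])))

# P is a 0/1 indicator of presence (hypothetical helper name).
P <- (M > 0) * 1

# crossprod(M, P)[i, j] sums M[, i] over species present in area j,
# crossprod(P, M)[i, j] sums M[, j] over species present in area i.
# Their sum adds both values for every species present in both areas.
SharedFast <- crossprod(M, P) + crossprod(P, M)
diag(SharedFast) <- 0   # an area is not compared with itself

print(SharedFast)
```

On the example data this should reproduce the matrix printed above; on real data it is worth checking against the loop on a small subset first.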

Old Answer

This solution counts, for each pair of areas, the species that are present in both:

M <- matrix(c(1,1,0,0,
              1,1,0,0,
              0,0,1,1), nrow = 3, byrow = TRUE)

colnames(M) <- paste0("Area", LETTERS[1:4])
rownames(M) <- paste0("Species", 1:3)

Shared <- matrix(0, nrow = ncol(M), ncol = ncol(M))
rownames(Shared) <- colnames(M)
colnames(Shared) <- colnames(M)
for(i in 1:ncol(M)){
  Shared[i, -i] <- apply(M[, -i], 2, function(x){sum(M[, i] == 1 & x == 1)})
}

If you only want to display the upper triangle, simply do this:

Shared[lower.tri(Shared)] <- '' # this turns the matrix into character; use NA if you want the numbers to stay numbers

> print(Shared)
      AreaA AreaB AreaC AreaD
AreaA "0"   "2"   "0"   "0"  
AreaB ""    "0"   "0"   "0"  
AreaC ""    ""    "0"   "1"  
AreaD ""    ""    ""    "0"  
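
For a strict 0/1 matrix the loop can probably be replaced by a single matrix product, which may matter with many species. This is a sketch under the assumption that M contains only 0 and 1; SharedCount is a name chosen here for illustration:

```r
# Presence/absence data: rows are species, columns are areas.
M <- matrix(c(1, 1, 0, 0,
              1, 1, 0, 0,
              0, 0, 1, 1), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Species", 1:3),
                            paste0("Area", LETTERS[1:4])))

# t(M) %*% M: entry (i, j) counts the species with a 1 in both area i and area j.
SharedCount <- crossprod(M)
diag(SharedCount) <- 0                      # ignore an area's overlap with itself
SharedCount[lower.tri(SharedCount)] <- NA   # keep it numeric, blank the lower triangle

# na.print = "" hides the NA entries when printing.
print(SharedCount, na.print = "")
```

Using NA together with na.print keeps the matrix numeric, so it can still be used in later calculations.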

If you're just trying to find areas with larger overlap, you can also simply use a distance function instead (e.g. dist(t(M), method = "manhattan")).
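
To make the distance suggestion concrete, here is a small sketch on the presence/absence example (D is just an illustrative name). For 0/1 data the Manhattan distance between two areas equals the number of species found in exactly one of them, so a smaller distance means the two areas are more alike. Note that two areas that both lack a species also count as alike, which is not the same as sharing it:

```r
# Presence/absence data: rows are species, columns are areas.
M <- matrix(c(1, 1, 0, 0,
              1, 1, 0, 0,
              0, 0, 1, 1), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Species", 1:3),
                            paste0("Area", LETTERS[1:4])))

# dist() compares rows, so transpose to compare areas instead of species.
D <- dist(t(M), method = "manhattan")

# as.matrix() gives the full symmetric matrix, which is easier to index by name.
print(as.matrix(D))
```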

Frans Rodenburg
  • Hi Frans, Thank you so much for your help and time, this is terrific. Many thanks!! – LizzyJ Feb 25 '19 at 04:03
  • You're welcome! If this solves your issue, you can accept my answer under the up/downvote. – Frans Rodenburg Feb 25 '19 at 04:26
  • Thanks Frans. Is it possible to modify this code so that instead of recording the number of times a species is shared (i.e. if it is found in area A [=1] and B [=1] in the matrix it gets a value of 1) to simply adding the values (i.e. it is found in area A [=1] and B [=1] so in the matrix it gets a value of 2? – LizzyJ Feb 26 '19 at 00:57
  • Hi @LizzyJ, could you edit your question to include an example of what you want the end result to look like? – Frans Rodenburg Feb 26 '19 at 03:23
  • Thanks Frans, please see edited example above. Thanks for your time and help! – LizzyJ Feb 26 '19 at 04:06