
My question relates to this article by Davis and Chen (2006), which shows a way to visualise Kendall's tau, a non-parametric measure of correlation between two variables.

Given a number of datapoints in a scatterplot, each point is connected to every other point by a line segment. Each segment is coloured according to these criteria:

  1. the line segment is black if its slope is positive;
  2. the line segment is red if its slope is negative;
  3. the line segment is blue if its slope is 0 (horizontal line);
  4. the line segment is black, as in 1., if its slope is undefined (vertical line).
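The four criteria can be sketched as a small vectorised helper. The name `segment_colour` is hypothetical (it does not come from the article), and the criteria do not say what to do with two coincident points; this sketch assumes they are treated like a vertical segment and coloured black.

```r
# Hypothetical helper: colour of the segment joining (x1, y1) and (x2, y2).
segment_colour <- function(x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  # Sign of the slope without dividing: +1 rising, -1 falling, 0 flat or vertical
  s <- sign(dx) * sign(dy)
  # Vertical (dx == 0) or rising -> black; falling -> red; otherwise flat -> blue.
  # Assumption: coincident points (dx == 0 and dy == 0) also come out black.
  ifelse(dx == 0 | s > 0, "black", ifelse(s < 0, "red", "blue"))
}

segment_colour(1, 7, 5, 7)  # horizontal segment: "blue"
```

Working with signs rather than the slope itself avoids the division by zero for vertical segments.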

Here is an example from the original article:

[image: example plot from the original article]

My problem is that I can generate a scatterplot, but not the line segments that connect all possible pairs of points, changing colour depending on the criteria above.

Here is an example of dataset:

dataset <- dplyr::tibble(alpha = c(1, 5, 7, 8, 9, 10, 11, 12), 
              beta =  c(7, 7, 5, 4, 3, 14, 15, 18))

I can generate this:

ggplot2::ggplot(dataset, ggplot2::aes(x = alpha, y = beta)) + ggplot2::geom_point()

[image: scatterplot of the example dataset]

but not this:

[image: the same scatterplot with coloured line segments between all pairs of points]

NOTE: The solution has to generalise to a dataset with a large number of datapoints (~1000).

  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data and all necessary code. Right now this is too broad for SO – camille Feb 25 '20 at 14:49
  • What have you tried? I imagine that given the set of points, your first task would be to use `expand.grid` (or `tidyr::complete` or `data.table::CJ`) to assign the endpoints and calculate each slope, then using `aes(..., color=slope)` and, if desired, use `scale_color_manual`. – r2evans Feb 25 '20 at 14:49
  • Hi camille and r2evans. Thanks for your suggestions. I have reformulated my question adding example dataset, and code to show what I am currently able to achieve. – Francesco Cabiddu Feb 25 '20 at 19:40

1 Answer

There are many ways, but you need to build your own data.frame of segments, e.g.:

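library(tidyverse)

# For every point (alpha, beta), attach all the other points as (x, y):
# one row per ordered pair of points.
pd <- dataset %>% 
  mutate(d = map(row_number(), function(x) slice(., -x) %>% rename(x = alpha, y = beta))) %>% 
  unnest(d) %>% 
  mutate(
    slope = (y - beta) / (x - alpha),
    # a = positive or vertical (infinite) slope, b = negative slope, c = flat
    cat = case_when(
      is.infinite(slope) | slope > 0 ~ 'a', 
      slope < 0 ~ 'b',
      slope == 0 ~ 'c'
    )
  )

# Draw the segments first so that the points sit on top of them.
ggplot() +
  geom_segment(aes(alpha, xend = x, beta, yend = y, color = cat), pd) +
  geom_point(aes(alpha, beta), dataset) +
  scale_color_manual(values = c(a = 'black', b = 'red', c = 'blue'))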

[image: resulting plot with coloured segments]
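The code above builds one row per ordered pair of points, so every segment appears twice. For the ~1000 points mentioned in the question, a possible variation is to enumerate each unordered pair once with `combn()`, which gives `choose(1000, 2)` = 499,500 segments instead of 999,000. The following is only a sketch under that assumption: the names `idx`, `segs` and `s` are illustrative, and the transparency value is an arbitrary choice to limit overplotting.

```r
library(ggplot2)

dataset <- data.frame(alpha = c(1, 5, 7, 8, 9, 10, 11, 12),
                      beta  = c(7, 7, 5, 4, 3, 14, 15, 18))

# Each column of idx holds the row indices (i < j) of one unordered pair
idx <- combn(nrow(dataset), 2)

segs <- data.frame(
  x    = dataset$alpha[idx[1, ]],
  y    = dataset$beta[idx[1, ]],
  xend = dataset$alpha[idx[2, ]],
  yend = dataset$beta[idx[2, ]]
)

# Sign of the slope without dividing: +1 rising, -1 falling, 0 flat or vertical
s <- sign(segs$xend - segs$x) * sign(segs$yend - segs$y)

# a = positive or vertical, b = negative, c = flat (same categories as above)
segs$cat <- ifelse(segs$xend == segs$x | s > 0, "a",
                   ifelse(s < 0, "b", "c"))

p <- ggplot() +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend, colour = cat),
               data = segs, alpha = 0.3) +
  geom_point(aes(x = alpha, y = beta), data = dataset) +
  scale_colour_manual(values = c(a = "black", b = "red", c = "blue"))

print(p)
```

With many points the plot becomes very dense, so the segment transparency will probably need tuning for the real data.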

Axeman