
I am trying to create scatter plots of every pairwise combination of the columns insulin, sspg and glucose (the diabetes dataset from the mclust package, in R), with class as the colo(u)r. By that I mean insulin vs. sspg, insulin vs. glucose, and sspg vs. glucose.

I would like to do this with the tidyverse, using purrr mapping functions and pipe operations. I can't quite get it to work, since I'm relatively new to R and functional programming.

When I load the data I've got the columns: class, glucose, insulin and sspg. I also used pivot_longer to get the columns: attr and value but I was not able to plot it and don't know how to create the combinations.

I assume there will be an iwalk() or map2() call at the end, and that I might have to use group_by() and nest(), and maybe combn(., m = 2) for the combinations, or something like that. But there is probably a much simpler solution that I cannot see myself.

My attempts have amounted to this:

library(mclust)
library(dplyr)
library(tidyr)   # provides pivot_longer() and nest()
library(tibble)
data(diabetes)

diabTib <- tibble::as_tibble(diabetes)

plot <- diabTib %>%  
  pivot_longer(cols = !c(class), names_to = "attr", values_to = "value") %>% 
  group_by(attr) %>% 
  nest() 

At the end there should be three plots on the screen when I execute plot or during the pipeline as a side effect(?).

Help would be appreciated.

AndrewGB
  • Will these help? https://stackoverflow.com/questions/56242195/automating-multiple-plots-graphs-with-two-ys-from-one-data-set/56246724#56246724 & https://stackoverflow.com/questions/54664092/passing-labels-to-xlab-and-ylab-in-ggplot2/54701949#54701949 & https://stackoverflow.com/questions/50520885/creating-multiple-graphs-based-upon-the-column-names/50522928#50522928 – Tung Jun 15 '21 at 16:56

2 Answers

1

You can actually get this pretty easily with base::plot.

# load data
diabetes <- mclust::diabetes 

# define vector of colors based on class in order of cases in dataset
colors <- c("Red", "Green", "Blue")[diabetes$class]

# make pair-wise scatter plot of desired variables colored based on class
plot(diabetes[,-1], col = colors)

Created on 2021-06-15 by the reprex package (v2.0.0)
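
For context, calling plot() on a data frame with several numeric columns draws a scatter-plot matrix via pairs(), so each combination appears twice (once above and once below the diagonal). A minimal sketch, assuming you only want the three unique combinations: call pairs() directly and switch off the upper panels. Selecting the columns by name and the pch value are my own choices, not part of the code above.

```r
# load data
diabetes <- mclust::diabetes

# one color per row, looked up via the integer codes of the class factor
colors <- c("Red", "Green", "Blue")[diabetes$class]

# columns to compare, selected by name instead of by position
vars <- c("glucose", "insulin", "sspg")

# upper.panel = NULL leaves the upper triangle empty,
# so only the three unique pairs are drawn
pairs(diabetes[, vars], col = colors, pch = 19, upper.panel = NULL)
```

This stays in base graphics, so there is no tidyverse dependency, but you get one combined figure rather than three separate plot objects.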

Dan Adams
0
library(mclust)
#> Package 'mclust' version 5.4.7
#> Type 'citation("mclust")' for citing this R package in publications.
library(tidyverse)
data("diabetes")

There are many ways to do this. I would probably start with the approach below, because it is the easiest to understand and you get exactly as many plots as there are combinations.

tbl <- tibble::as_tibble(diabetes)
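combn(setdiff(names(tbl), "class"), 2, simplify = FALSE) %>% # get combinations as character vectors
  map(~ggplot(tbl, aes(.data[[.x[[1]]]], .data[[.x[[2]]]], color = class)) + # .data[[ ]] replaces the deprecated aes_string()
        geom_point() +
        labs(x = .x[[1]], y = .x[[2]])) # label the axes with the column names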
#> [[1]]
#> 
#> [[2]]
#> 
#> [[3]]
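
The question mentions iwalk() and drawing the plots as a side effect. Here is a minimal sketch of that variant, assuming the same combn() approach as above: imap() builds a named list of plots and walk() prints them. The "_vs_" naming scheme and the variable names pair_list and plots are my own choices for illustration.

```r
library(mclust)     # only needed for the diabetes data
library(tidyverse)

data("diabetes")
tbl <- tibble::as_tibble(diabetes)

# all unordered pairs of the non-class columns
pair_list <- combn(setdiff(names(tbl), "class"), 2, simplify = FALSE)

# name each pair, e.g. "glucose_vs_insulin", so imap() can pass the name along
pair_list <- set_names(pair_list, map_chr(pair_list, ~ paste(.x, collapse = "_vs_")))

# imap() calls the function with each element and its name
plots <- imap(pair_list, function(vars, title) {
  ggplot(tbl, aes(.data[[vars[1]]], .data[[vars[2]]], color = class)) +
    geom_point() +
    labs(x = vars[1], y = vars[2], title = title)
})

# walk() is map() for side effects: it prints each plot and returns its input invisibly
walk(plots, print)
```

Keeping the plots in a named list means you can still pull out a single one later, e.g. plots[["glucose_vs_sspg"]], instead of only seeing them once on screen.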

If you want all combinations in a single figure instead, you will also need tidyr to reshape the data. This is how I would do it:

tbl2 <- tbl %>%
  pivot_longer(cols = -class, names_to = "attr", values_to = "value") %>%
  nest_by(attr) %>% {
    d <- tidyr::expand(., V1 = attr, V2 = attr) # expand combinations
    #rename the nested data to avoid naming conflicts
    d <- left_join(d, mutate(., data = list(rename_with(data, .fn = ~paste0(.x,"_x")))), by = c("V1"="attr"))
    d <- left_join(d, mutate(., data = list(rename_with(data, .fn = ~paste0(.x,"_y")))), by = c("V2"="attr"))
    d
  } %>%
  unnest(c(data.x, data.y))

ggplot(tbl2, aes(x = value_x, y = value_y, color = class_x)) +
  geom_point() +
  facet_grid(rows = vars(V1), cols = vars(V2))

Created on 2021-06-15 by the reprex package (v2.0.0)

Justin Landis