XY Plotting column data grouped on other column

Question

I have a very large data set with two columns which relate as below.

df <- data.frame(
  group = c("123-4", "123-4", "234-5", "234-5", "345-6", "345-6"),
  age = c(38, 41, 65, 67, 78, 23))

group   age
123-4    38
123-4    41
234-5    65
234-5    67
345-6    78
345-6    23

I want to plot the two ages in each group against each other. I can do it by pulling the min and max values out of each group, but I want to maintain the randomness of which value lands on x and which on y, instead of having all the min values on x and all the max values on y. It seems this should be very easy, but I am beating my head against the proverbial wall.

Would you find something like this useful? https://stackoverflow.com/questions/41764818/ggplot2-boxplots-with-points-and-fill-separation — AntoniosK, Nov 26 '18 at 17:09
A scatter plot would also be possible, but I'm not sure it's a good approach. It really depends on the nature of your grouping (`group` variable) and whether it makes sense to apply some kind of ordering. — AntoniosK, Nov 26 '18 at 17:13
I'm unclear on what type of visual you want. Are you trying to show the distribution of ages within groups? Like a beeswarm or jittered scatter plot? — camille, Nov 26 '18 at 17:16
I want to use a scatterplot. Most of these pairs will congregate about a fairly linear center, but I want to make the outliers stand out more by not plotting min() and max(). Ordering is irrelevant in this case; the "group" is just an assigned number and has no order. — Bruce, Nov 26 '18 at 17:48
All the "groups" will have only two members. I want to visually pull out those groups that have a greater age difference than is typical. — Bruce, Nov 26 '18 at 17:49

MrFlick · Answer 1 · 2018-11-26T17:31:56.317

We can write a helper function to extract a value for each group.

group_val <- function(values, groups, index=1) tapply(values, groups, `[`, index)

For example

with(df, group_val(age, group, 1))
# 123-4 234-5 345-6 
#    38    65    78 
with(df, group_val(age, group, 2))
# 123-4 234-5 345-6 
#    41    67    23

Then you could do

plot(group_val(df$age, df$group, 1), group_val(df$age, df$group, 2))
# or plot(group_val(age, group, 2) ~ group_val(age, group, 1), df)
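Since the stated goal in the comments is to make pairs with an unusually large age difference stand out, the base-graphics plot above could be extended roughly as follows. This is only a sketch: `age_gap`, `gap_cutoff` and `is_outlier` are names made up for illustration, and the cutoff of 20 is an arbitrary assumption, not something taken from the question.

```r
# Example data from the question
df <- data.frame(
  group = c("123-4", "123-4", "234-5", "234-5", "345-6", "345-6"),
  age = c(38, 41, 65, 67, 78, 23))

# Same helper as above: take the index-th value within each group
group_val <- function(values, groups, index = 1) tapply(values, groups, `[`, index)

x <- group_val(df$age, df$group, 1)
y <- group_val(df$age, df$group, 2)

# Absolute within-pair age difference, named by group
age_gap <- abs(x - y)

# Hypothetical cutoff (an assumption); pick one that suits the real data
gap_cutoff <- 20
is_outlier <- age_gap > gap_cutoff

# Colour the flagged pairs and label them with their group id
plot(x, y, col = ifelse(is_outlier, "red", "black"), pch = 19,
     xlab = "first age in group", ylab = "second age in group")
abline(0, 1, lty = 2)  # pairs with equal ages would fall on this line
text(x[is_outlier], y[is_outlier], labels = names(x)[is_outlier], pos = 2)
```

Because the x/y assignment follows row order rather than min/max, outliers can appear on either side of the dashed line.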

Though the more usual way to handle this would be to reshape your data from long to wide. There are plenty of other questions on this site about that task. If you want to use ggplot2, you'd have to do it that way. For example

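library(dplyr)   # provides %>%, group_by(), mutate(), n()
library(tidyr)   # provides spread()
library(ggplot2)
df %>% group_by(group) %>% 
  mutate(seq = letters[1:n()]) %>%  # label the members of each group a, b, ...
  spread(seq, age) %>%              # long to wide: one column per label
  ggplot(aes(a,b)) + geom_point()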
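In newer versions of tidyr (1.0.0 and later), `spread()` is superseded by `pivot_wider()`. A roughly equivalent sketch of the same long-to-wide reshape is shown here, assuming such a tidyr version is installed; the intermediate name `wide` is only for illustration.

```r
library(dplyr)
library(tidyr)   # assumes tidyr >= 1.0.0 for pivot_wider()
library(ggplot2)

# Example data from the question
df <- data.frame(
  group = c("123-4", "123-4", "234-5", "234-5", "345-6", "345-6"),
  age = c(38, 41, 65, 67, 78, 23))

wide <- df %>%
  group_by(group) %>%
  mutate(seq = letters[seq_len(n())]) %>%  # label members a, b within each group
  ungroup() %>%
  pivot_wider(names_from = seq, values_from = age)  # one row per group

ggplot(wide, aes(a, b)) + geom_point()
```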

Thank you for the edit, I jumped back on to do that and you beat me to it. — Bruce, Nov 26 '18 at 17:22

score 0 · Answer 2 · answered Nov 26 '18 at 19:02


MrFlick nailed it with the right idea: long to wide. It was an easy fix, as I knew it should be, but I was too new to figure it out myself.

wide <- as.data.frame(t(unstack(df, age ~ group)))
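For completeness, here is a sketch of how that one-liner might be used to get the scatterplot. It relies on every group having the same number of members (two here, as stated in the comments), because `unstack()` only returns a data frame in that case. The column names `first` and `second` are made up for illustration; by default the columns come out as `V1` and `V2`, and the row names are syntactic versions of the group ids (for example `X123.4`).

```r
# Example data from the question
df <- data.frame(
  group = c("123-4", "123-4", "234-5", "234-5", "345-6", "345-6"),
  age = c(38, 41, 65, 67, 78, 23))

# One row per group, one column per position within the group
wide <- as.data.frame(t(unstack(df, age ~ group)))

# Hypothetical, more readable column names (default is V1, V2)
names(wide) <- c("first", "second")

# Scatterplot of the second age against the first for each pair
plot(second ~ first, data = wide)
```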

Bruce
