In my case, there are 100 unique (X, Y) points, each with an ID and belonging to a Type. Of these 100 points, 20 have values for each of three other Types (CT, D, OP).

Here is the data generation process:

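library(dplyr)    # %>%, filter(), count(), arrange()
library(ggplot2)  # ggplot(), geom_point(), geom_text()

df <- data.frame(X=rnorm(100,0,1), Y=rnorm(100,0,1), 
                 ID=paste(rep("ID", 100), 1:100, sep="_"),
                 Type=rep("ID",100),
                 Val=c(rep(c('Type1','Type2'),30),
                       rep(c('Type3','Type4'),20)))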

For each extra Type, 20 randomly selected points (sample(1:100,20)) get values that add extra information to those points. Every one of these 20 points also has a row with Type=="ID".

dat1 <- data.frame(Type=rep('CT',20),
                   Val=paste(rep("CT", 20), 
                             sample(1:6,20,replace=T), sep="_"))
dat1 <- cbind(df[sample(1:100,20),1:3],dat1)

dat2 <- data.frame(Type=rep('D',20),
                   Val=paste(rep("D", 20), 
                             sample(1:6,20,replace=T), sep="_"))
dat2 <- cbind(df[sample(1:100,20),1:3],dat2)

dat3 <- data.frame(Type=rep('OP',20),
                   Val=paste(rep("OP", 20), 
                             sample(1:6,20,replace=T), sep="_"))
dat3 <- cbind(df[sample(1:100,20),1:3],dat3)

df <- rbind(df, dat1, dat2, dat3)

Now I plot the points that have the value D_1 or D_4 for Type=="D":

df %>% filter(Val %in% c('D_1','D_4')) %>% 
  ggplot(aes(X,Y,col=Val)) + geom_point() + geom_text(aes(label=ID))

Note: I have added the IDs with geom_text(aes(label=ID)) only for illustration purposes.

To this existing plot, I need to add the remaining 92 points, which either do not have these two values or have no extra values at all. I tried adding the additional points to the existing plot using the approach mentioned by Hadley here:

p <- df %>% filter(Val %in% c('D_1','D_4')) %>% ggplot(aes(X,Y,col=Val)) + geom_point() 

p + geom_point(data=df[(!df$ID %in% df$ID[df$Val %in% c('D_1','D_4')]) & df$Type=="ID",],
               colour="grey")

Questions:

  1. How can I plot the selected points and the additional points in a single command, or in a more elegant way?

  2. Is there a dplyr approach that can be used in the above command?

Update: df$Type=="ID" is very important because it ensures that each of the remaining points is plotted only once. Without it, points that also have values in CT, D, or OP would be plotted more than once.

df %>% count(X,Y) %>% arrange(desc(n))
# # A tibble: 100 x 3
#             X          Y     n
#         <dbl>      <dbl> <int>
# 1 -0.86266147  2.0368626     4
# 2 -0.61770678  0.4428537     4
# 3  1.32441957 -0.9388095     4
# 4 -1.65650319 -0.1551399     3
# 5 -0.99946809  1.1791395     3
# 6 -0.52881072  0.1742483     3
# 7 -0.25892382  0.1380577     3
# 8 -0.19239410  0.5269329     3
# 9 -0.09709764 -0.4855484     3
# 10 -0.05977874  0.1771422     3
# # ... with 90 more rows

It looks like the first three (X, Y) points each have rows for all four Types (ID, CT, D, OP), but these points need to be plotted only once.

Prradep
  • Name the data parameter, don't rely on positional matching there. – Roland Aug 27 '17 at 17:41
  • @Roland Thanks, edited the question accordingly. – Prradep Aug 27 '17 at 17:49
  • Perhaps I'm missing something here, but couldn't you create a factor with only the desired values in `levels`? Values not specified in `levels` will become `NA`, which in turn are plotted as "grey" by default. `ggplot(mtcars, aes(x = hp, y = mpg, col = factor(carb, levels = 1:2))) + geom_point()` – Henrik Aug 27 '17 at 18:12
  • In brief, there are 100 unique points, but the data frame `df` has 160 rows. The remaining 60 rows are values for the three extra types. – Prradep Aug 27 '17 at 18:39
  • Is there any way to prevent points from being plotted more than once? Some points are plotted again and again, which might affect the colored points. – Prradep Aug 27 '17 at 19:03
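
Henrik's suggestion can be sketched as follows on a small made-up data frame (`pts` and its values are assumptions for illustration, with one row per point). Values that are not listed in `levels` become `NA`, and ggplot2's default discrete colour scale draws `NA` in grey.

```r
library(ggplot2)

# Hypothetical data with exactly one row per point
pts <- data.frame(
  X   = c(0.1, 0.5, 0.9, 1.3),
  Y   = c(1.0, 0.2, 0.7, 0.4),
  Val = c("D_1", "Type1", "D_4", "CT_2")
)

# Only D_1 and D_4 are kept as levels; every other value becomes NA
pts$Group <- factor(pts$Val, levels = c("D_1", "D_4"))

# NA points are drawn with the scale's default na.value (a grey)
p <- ggplot(pts, aes(X, Y, col = Group)) +
  geom_point()
```

With the question's `df`, rows that repeat the same coordinates would still need to be reduced to one row per point first.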

2 Answers

Updated Answer

To address the first comment: Some points are plotted more than once because there are multiple rows in the data with the same X Y coordinates. You can remove duplicate points using the code below. We first order the points based on the ordering of Val so that the duplicates will come from Other points, rather than from D_1 or D_4 points (though if your real data contains cases where a D_1 and a D_4 point have the same X and Y coordinates, only the D_1 point will be plotted).

ggplot(df %>% 
         mutate(Val=fct_other(Val,keep=c("D_1","D_4"))) %>% 
         arrange(Val) %>% 
         filter(!duplicated(.[,c("X","Y")])), 
       aes(X,Y,col=Val, size=Val)) + 
  geom_point() +
  scale_colour_manual(values=c(D_1=hcl(15,100,65),D_4=hcl(195,100,65),Other="grey70")) +
  scale_size_manual(values=c(D_1=3, D_4=3, Other=1)) +
  theme_bw() 
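
To see which row survives the de-duplication, here is a minimal sketch on a made-up five-row data frame (`toy` and its values are illustrative assumptions, not the question's data). Because `fct_other()` puts `Other` last among the levels, `arrange(Val)` moves the D_1 and D_4 rows to the top. `distinct(X, Y, .keep_all = TRUE)` is used here as an alternative spelling of `filter(!duplicated(...))` and keeps the first row for each coordinate pair.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: (1, 1) and (2, 2) each appear twice
toy <- data.frame(
  X   = c(1, 1, 2, 2, 3),
  Y   = c(1, 1, 2, 2, 3),
  Val = c("Type1", "D_1", "CT_2", "D_4", "Type3")
)

dedup <- toy %>%
  mutate(Val = fct_other(Val, keep = c("D_1", "D_4"))) %>%  # levels: D_1, D_4, Other
  arrange(Val) %>%                                          # highlighted rows first
  distinct(X, Y, .keep_all = TRUE)                          # first row per (X, Y) wins

dedup
```

In this toy case the D_1 and D_4 rows are kept for (1, 1) and (2, 2), and only the point at (3, 3) remains as Other.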

If you want to plot all D_1 and D_4 points, even if they have the same X and Y coordinates, you could do this:

df %>% 
   mutate(Val=fct_other(Val,keep=c("D_1","D_4"))) %>% 
   arrange(X, Y, Val) %>% 
   filter((c(1,diff(X)) != 0 | c(1, diff(Y)) != 0) | Val != 'Other')

Then you could use different point marker sizes to ensure that overplotted D_1 and D_4 points are both visible.
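
A minimal sketch of that idea, again on made-up data (the `toy2` data frame and the sizes 5 and 2.5 are assumptions for illustration): a D_1 row and a D_4 row share the coordinates (1, 1), and both are kept. Here a row counts as a new point when either X or Y differs from the previous row.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data: D_1 and D_4 share the coordinates (1, 1)
toy2 <- data.frame(
  X   = c(1, 1, 1, 2, 2),
  Y   = c(1, 1, 1, 2, 2),
  Val = c("Type1", "D_4", "D_1", "CT_3", "Type2")
)

kept <- toy2 %>%
  mutate(Val = fct_other(Val, keep = c("D_1", "D_4"))) %>%
  arrange(X, Y, Val) %>%
  # a row starts a new point if X or Y changed relative to the previous row
  mutate(new_point = c(TRUE, diff(X) != 0 | diff(Y) != 0)) %>%
  filter(new_point | Val != "Other")

# Rows are drawn in data order, so the smaller D_4 marker lands on top of
# the larger D_1 marker and both remain visible
p <- ggplot(kept, aes(X, Y, col = Val, size = Val)) +
  geom_point() +
  scale_size_manual(values = c(D_1 = 5, D_4 = 2.5, Other = 1))
```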

Original Answer

What about collapsing all the other levels of Val like this:

library(tidyverse)
library(forcats)

ggplot(df %>% mutate(Val=fct_other(Val,keep=c("D_1","D_4"))), aes(X,Y,col=Val)) + 
  geom_point() +
  scale_colour_manual(values=c(D_1=hcl(15,100,65),D_4=hcl(195,100,65),Other="grey70")) +
  theme_bw()

You could also use size to make the desired points stand out more. For this particular data set, this approach also ensures that we can see a couple of D_1 and D_4 points that were hidden behind grey points in the previous plot.

ggplot(df %>% mutate(Val=fct_other(Val,keep=c("D_1","D_4"))), aes(X,Y,col=Val, size=Val)) + 
  geom_point() +
  scale_colour_manual(values=c(D_1=hcl(15,100,65),D_4=hcl(195,100,65),Other="grey70")) +
  scale_size_manual(values=c(D_1=3, D_4=3, Other=1)) +
  theme_bw()

eipi10
  • Excellent suggestion. I have a few comments (not sure if I'm correct). Approximately 52 points are plotted more than once (i.e., twice or thrice). This can be seen at the red point in the second plot, where the point is plotted again on top of a red point of bigger size. Do you see that? Do you see other approaches where all 100 points are plotted only once, either colored or not? – Prradep Aug 27 '17 at 18:54
  • Would you be interested in answering the [question](https://stackoverflow.com/questions/46093719/layered-plotting-of-selected-points-in-an-efficient-way)? – Prradep Sep 07 '17 at 10:12

Building a bit on eipi10's post

library(tidyverse)
library(forcats)
theme_set(theme_bw(base_size=12)+ 
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank()))

df <- tibble(X=rnorm(100,0,1), Y=rnorm(100,0,1), 
             ID=paste(rep("ID", 100), 1:100, sep="_"),
             Type=rep("ID",100),
             Val=c(rep(c('Type1','Type2'),30),
                   rep(c('Type3','Type4'),20)))

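f.df <- function(x, type){
  # capture the unquoted argument (e.g. CT) as a string
  type <- deparse(substitute(type))
  # sample 20 rows of x (not the global df) and keep X, Y, ID
  x[sample(1:100, 20), 1:3] %>% 
    mutate(Type = type,
           Val = paste(type, sample(1:6, 20, replace = TRUE), sep = "_"))
}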

dat1 <- f.df(df, CT)
dat2 <- f.df(df, D)
dat3 <- f.df(df, OP)

df2 <- bind_rows(df, dat1, dat2, dat3)

df2 %>% 
   mutate(Group = fct_other(Val,keep=c("D_1","D_4"))) %>% 
   ggplot(aes(X,Y,color=Group)) + geom_point() +
   scale_colour_manual(values=c(D_1=hcl(15,100,65),D_4=hcl(195,100,65), 
                       Other="grey70")) 
B Williams