An example dataset:

A <- c('a','b', 'c','d','e')
types <- factor(A)
B <- c(1,2,3,4,5)
C <- c(6,7,8,9,10)
D <- c(1,2,1,2,3)
ABC <- data.frame(B,C,D,types)

library(ggplot2)

ggplot(ABC, aes(x=B ,y=C ,size=D, colour=as.factor(types),label=types, shape=as.factor(types))) +
geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+ scale_y_continuous(lim=c(0,30000000)) +
scale_shape_manual(values=c(15,18,16,17,19))

Plotting this, you will see that factors a-e have colours and shapes attributed to them.

In my code I use scale_shape_manual to set the shapes, and they are assigned by position: the order of the factor levels is a,b,c,d,e and my values are 15,18,16,17,19, so a=15 (a square), b=18, and so on.
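To illustrate why the positional approach is fragile, here is a minimal base-R sketch with no plotting. The variable names are made up for illustration, and the pairing logic only mimics how unnamed values appear to be handed out to factor levels in level order; it is not ggplot2's actual code.

```r
shape_values <- c(15, 18, 16, 17, 19)    # unnamed, so matched by position

full    <- factor(c("a", "b", "c", "d", "e"))
partial <- factor(c("a", "d", "e"))      # "b" and "c" are absent today

# Positional pairing: the i-th level receives the i-th value
pos_full    <- setNames(shape_values[seq_along(levels(full))], levels(full))
pos_partial <- setNames(shape_values[seq_along(levels(partial))], levels(partial))

pos_full["d"]      # 17
pos_partial["d"]   # 18: "d" has taken over the value "b" used to get
```

Once a level drops out, every level after it moves up one slot, which is the drift described above.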

I would like to set these shapes by factor. My data will be changing each day and the factors will be in different orders but I always want the same factors to have the same shapes.

So obviously this code doesn't work but something like:

scale_shape_manual(values=('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19))

Would be helpful if I could do the same for colour too.

Oli

2 Answers

If I'm understanding you correctly, there will always be (at most) the five categories "a" - "e", and you want the shapes and colors for these to be consistent across datasets. Here is one way (note: gg_color_hue(...) is from here):

# set up shapes
shapes <- c(15,18,16,17,19)
names(shapes) <- letters[1:5]

# set up colors
gg_color_hue <- function(n) { # ggplot default colors
  hues = seq(15, 375, length=n+1)
  hcl(h=hues, l=65, c=100)[1:n]
}
colors <- gg_color_hue(5)
names(colors) <- names(shapes)

# original data
ggplot(ABC, aes(x=B ,y=C ,size=D, colour=types,label=types, shape=types)) +
  geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
  scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
  scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+ 
  scale_y_continuous(lim=c(0,30000000)) +
  scale_shape_manual(values=shapes) + scale_color_manual(values=colors)

#new data
DEF <- data.frame(B,C,D,types=factor(c("a","a","a","d","e")))
ggplot(DEF, aes(x=B ,y=C ,size=D, colour=types,label=types, shape=types)) +
  geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
  scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
  scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+ 
  scale_y_continuous(lim=c(0,30000000)) +
  scale_shape_manual(values=shapes) + scale_color_manual(values=colors)
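The reason the second plot keeps its shapes can be checked without plotting. The sketch below is base R only and assumes that a named `values` vector is looked up by level name rather than by position; `new_types` is a hypothetical stand-in for the `types` column of `DEF`.

```r
# Named shape vector, equivalent to the shapes/names() setup above
shapes <- c(a = 15, b = 18, c = 16, d = 17, e = 19)

# Levels present in the new data: "b" and "c" are missing
new_types <- factor(c("a", "a", "a", "d", "e"))

# Lookup by name: each level gets its own code regardless of what is absent
shapes[levels(new_types)]

# Per-row shape codes, in data order
shapes[as.character(new_types)]
```

The first lookup returns 15, 17 and 19 for "a", "d" and "e", the same codes they had in the full dataset.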

jlhoward
  • Thanks, although no, the number of factors will fluctuate between 17 and 19. This is why assigning by sequence wasn't going to work: when one factor isn't included, the whole sequence is thrown off. – Oli Oct 06 '14 at 15:28
  • No it is not. In the second example, factors "b" and "c" are missing but the sequence is *not* thrown off; "a" is still square, "d" is still triangle, and "e" is still circle. Are you seriously going to use 19 shapes?? – jlhoward Oct 06 '14 at 15:31
  • ^I meant that in my original attempt the sequence was thrown off. No, I plan on using only the shape values 15, 16 and 18. I want to set two of my factors as 16 specifically, then the others can be 15, 16 or 18. Using a few shapes as well as different colours should give enough variation for each factor. – Oli Oct 06 '14 at 15:40
  • So does this answer your question?? – jlhoward Oct 06 '14 at 15:41
  • I think so, let me quickly see if it works. I'm not using a data frame; I'm reading from a csv and the factors are values in a column. I think it might work but let me check first – Oli Oct 06 '14 at 15:45
  • you use names(shapes) <- letters[1:5], instead of letters my actual data have names but when I tried to use the names I get Error: unexpected ',' in "names(shapes) <- name1," – Oli Oct 06 '14 at 16:00
  • It needs to be a character *vector*, something like `c("name1","name2",...)` – jlhoward Oct 06 '14 at 16:02
  • `shapes <- c(15,18,16,16,15,18,15,16,18,16,15,16,18,15,16,18,15,18,16)` `names(shapes) <- c(data$name1, data$name2, ....)` There were no shapes when I did this, blank plot – Oli Oct 06 '14 at 16:10
  • Are there 19 columns, as is `df$name1` through `df$name19`? If so, then `names(shapes) <- paste0("name",1:19)` – jlhoward Oct 06 '14 at 16:15
  • No, there's a column called names with 1236 rows. I tried length(data$name) but obviously that just gave me the length, and the rest of the names were called NA. Of those 1236 rows there are only 19 unique names. Can I use as.factor(data$name)? – Oli Oct 06 '14 at 16:19
  • names(shapes) <- test$name Error in names(shapes) <- data$name : 'names' attribute [1249] must be the same length as the vector [19] – Oli Oct 06 '14 at 16:27
  • `names(shapes) <- sort(unique(data$name))` worked. Thanks a bunch :) – Oli Oct 09 '14 at 09:50
  • On days where one of the names doesn't occur, the names list changes and all the shapes shift by 1. Is there a way to set the shapes at the start (and, on a day the name isn't there, to just ignore that and keep the same order)?? – Oli Oct 09 '14 at 10:27
  • nvm, used data.frame and then write.table to add the name (with 0 for each other value) to the end of each csv thus adding it to the lists where it isn't and making my script work :D – Oli Oct 09 '14 at 15:32
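The shift described in the last few comments comes from deriving the names from each day's data. One way to avoid padding the csv is to fix the full list of names once and name the shapes from that list. The sketch below uses base R only; `all_names`, `today` and the four example names are hypothetical placeholders for the real 17-19 names.

```r
# Hypothetical master list, written once and never derived from the data
all_names <- c("alpha", "beta", "gamma", "delta")
shapes <- setNames(c(16, 16, 15, 18), all_names)

# Today's data happens to lack "beta"
today <- data.frame(name = c("alpha", "gamma", "delta", "gamma"))

# Pinning the levels to the master list keeps their order stable,
# even for names that do not occur today
today$name <- factor(today$name, levels = all_names)
levels(today$name)

# Shapes for the names actually present today, looked up by name
shapes[levels(droplevels(today$name))]
```

Passing `values = shapes` to scale_shape_manual should then give each name the same shape every day, because the vector no longer depends on which names appear in the file.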

I'm certain this is no longer relevant for the OP but the best solution I found to this problem is simpler than what is currently posted and is almost written into the question itself.

The OP's wish to assign manually defined shapes or colours using something like
scale_shape_manual(values=('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19))
only requires the assignments to be passed as a named vector built with c(), as in
scale_shape_manual(values = c('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19))

jlhoward's answer is better if you want autogenerated colours, whereas the script I offer below requires fewer lines of code. User's choice.

A <- c('a','b', 'c','d','e')
types <- factor(A)
B <- c(1,2,3,4,5)
C <- c(6,7,8,9,10)
D <- c(1,2,1,2,3)
ABC <- data.frame(B,C,D,types)

library(ggplot2)

ggplot(ABC, aes(x=B ,y=C ,size=D, colour=as.factor(types),label=types, shape=as.factor(types))) +
geom_point()+geom_text(size=2, hjust=0,colour="black", vjust=0) +
scale_size_area(max_size=20, "D", breaks=c(100,500,1000,3000,5000))  +
scale_x_log10(lim=c(0.05,10),breaks=c(0.1,1,10))+
scale_y_continuous(lim=c(0,30000000)) +
scale_shape_manual(values = c('a'=15, 'b'=18, 'c'=16, 'd'=17, 'e'=19)) +
scale_colour_manual(values = c('a'="tomato", 'b'="yellow4", 'c'="palegreen2", 'd'="deepskyblue1", 'e'="orchid3"))
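The two answers can also be combined: autogenerated colours can be turned into a named vector in one step with setNames(). This is only a sketch. The hue formula is copied from the gg_color_hue() helper in the earlier answer, and `type_names` is an assumed name for the full list of factor levels.

```r
# Same hue formula as gg_color_hue() in the earlier answer
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

type_names <- c("a", "b", "c", "d", "e")

# Named colour vector: one autogenerated colour per level name
colours <- setNames(gg_color_hue(length(type_names)), type_names)

# Named shape vector, as in the script above
shapes <- c(a = 15, b = 18, c = 16, d = 17, e = 19)

# Lookup is by name, so the order requested does not matter
colours[c("e", "a")]
```

Both vectors can then be passed as `values` to scale_colour_manual and scale_shape_manual respectively.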
Dylan S.