I'm having difficulty with a few outliers making the color scale useless.
My data has a Length variable that is based in a range, but will usually have a few much larger values. The below example data has 95 values between 500 and 1500, and 5 values over 50,000. The resulting color legends tend to use 10k, 20k, ... 70k for the color changes when I want to see color changes between 500 and 1500. Really, anything over around 1300 should be the same solid color (probably median +/- mad), but I don't know where to define that.
I'm open to any ggplot solution, but ideally lower values would be red, middle white, and higher blue (low is bad). In my own dataset, date is an actual date with as.POSIXct() in the ggplot aes(), but doesn't seem to affect the example.
#example data
date <- sample(x=1:10,size=100,replace=T)
stateabbr <- sample(x=1:50,size=100,replace=T)
Length <- c(sample(x=500:1500,size=95,replace=T),60000,55000,70000,50000,65000)
x <- data.frame(date=date,stateabbr=stateabbr,Length=Length)
#main plot
(g <- ggplot(data=x,aes(x=date,y=factor(stateabbr))) +
geom_point(aes(color=as.numeric(as.character(Length))),alpha=3/4,size=4) +
#scale_x_datetime(labels=date_format("%m/%d")) +
opts(title="Date and State") + xlab("Date") + ylab("State"))
#problem
g + scale_color_gradient2("Length",midpoint=median(x$Length))
Adding trans="log" or "sqrt" doesn't quite do the trick either.
Thank you for your help!