I am trying to produce a 2-d density plot overlaid on a scatterplot in ggplot2.
I have the following working code:
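plt <- ggplot(data = for_plot, aes(x = X, y = Y)) +
  # filled 2-d density contours, coloured and shaded by density level
  stat_density2d(aes(fill = ..level.., alpha = ..level..), geom = 'polygon', colour = 'black') +
  scale_fill_continuous(low = "green", high = "red") +
  guides(alpha = "none") +
  # shortest_path_list is another object from my session (not part of this example)
  ylim(0.5, max(shortest_path_list$shortest_path)) +
  geom_point()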
When I run the code with this dataset:
> for_plot[sample(nrow(for_plot), 20), ]
Y X
1: 2 110182.549
2: 3 95202.283
3: 2 91557.371
4: 1 6730.598
5: 1 7396.081
6: 1 13939.701
7: 2 9767.561
8: 3 101597.449
9: 2 99368.467
10: 3 102024.722
11: 3 90491.076
12: 3 81337.624
13: 1 5956.710
14: 3 95160.149
15: 3 89981.055
16: 1 8823.615
17: 1 10717.879
18: 2 11463.036
19: 2 3864.292
20: 2 10351.874
It works fine and gives me the expected plot.
Note that my Y is discrete and X is continuous, and the plot looks as intended.
However, when I use this dataset:
> for_plot[sample(nrow(for_plot), 20), ]
Y X
1: 1 9897.476
2: 2 2350.191
3: 1 13911.780
4: 1 98885.336
5: 1 94776.873
6: 1 102804.832
7: 1 99956.988
8: 1 13941.653
9: 1 9246.795
10: 1 13152.775
11: 1 113325.680
12: 1 82263.657
13: 1 91108.347
14: 1 8823.797
15: 1 11057.255
16: 1 99150.825
17: 2 7312.730
18: 2 6476.152
19: 1 113534.588
20: 1 91311.834
I get the following warning, and the plot is drawn without the density contours:
Warning message:
Computation failed in `stat_density2d()`:
bandwidths must be strictly positive
I know that one common cause of this error is zero variance in either the X or the Y direction. In this case, however, there is variation similar to the first case, so I don't understand why the first scenario works but the second fails. Is there a workaround to get the contours in the second scenario?
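For context, the ggplot2 documentation for `stat_density_2d()` states that when the bandwidth `h` is not supplied it is estimated with `MASS::bandwidth.nrd()`. That rule of thumb takes the smaller of the standard deviation and the interquartile range divided by 1.34, so a variable can have non-zero variance and still get a bandwidth of exactly 0. The sketch below re-implements the rule (the helper name `nrd_by_hand` is hypothetical, for illustration only) and checks it on a made-up vector in which most values are 1.

```r
library(MASS)

# Hypothetical helper: the normal-reference rule used by MASS::bandwidth.nrd()
nrd_by_hand <- function(x) {
  iqr_scaled <- diff(quantile(x, c(0.25, 0.75), names = FALSE)) / 1.34
  4 * 1.06 * min(sd(x), iqr_scaled) * length(x)^(-1/5)
}

# Made-up example: 16 ones and 4 twos, so the 25th and 75th percentiles are both 1
y <- c(rep(1, 16), rep(2, 4))

var(y)            # positive: the values do vary
IQR(y)            # 0: more than 75% of the values are identical
nrd_by_hand(y)    # 0, because min(sd, IQR / 1.34) is 0
bandwidth.nrd(y)  # also 0
```

In other words, the relevant condition is not "no variance" but "more than about three quarters of the Y values are the same level".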
Here are the two scenarios as minimal reproducible examples, as suggested by Mr. Flick:
Scenario 1 (the plot works):
set.seed(100)
> for_plot<-dput(for_plot[sample(nrow(for_plot), 20), ])
structure(list(Y = c(2, 2, 3, 1, 2, 3, 3, 3, 2, 1, 3, 2, 2, 3, 1, 3, 2, 3, 2, 1),
    X = c(96649.7975713206, 104758.02495167, 93351.5907987183, 5535.8146932624,
    99480.6016841293, 113103.505637801, 90445.3465777551, 81903.811792781,
    106832.148472597, 6576.45291001145, 99368.9134426028, 111130.390217174,
    9471.82883910966, 102087.415882298, 5657.05900168211, 107688.549964059,
    103669.855375872, 94121.8586312176, 1573.00051813297, 7394.05750749363)),
    .Names = c("Y", "X"), class = c("data.table", "data.frame"),
    row.names = c(NA, -20L))
Scenario 2 (the plot does not produce the desired output):
> for_plot<-dput(for_plot[sample(nrow(for_plot), 20), ])
structure(list(Y = c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
    X = c(96925.0119740431, 98869.1560687514, 99434.7995468473, 9123.65901167288,
    111471.920587976, 109448.280478224, 6678.04323546572, 98309.4525934759,
    91311.834287723, 86616.727265815, 101009.644050382, 7396.08053430818,
    102517.086739334, 11504.3148787722, 9471.82883910966, 15427.4786153589,
    96385.4989659007, 2249.38197350042, 91425.5491534976, 9303.7114788096)),
    .Names = c("Y", "X"), class = c("data.table", "data.frame"),
    row.names = c(NA, -20L))
The error:
Warning message:
Computation failed in `stat_density2d()`:
bandwidths must be strictly positive
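The message can be reproduced outside ggplot2. The ggplot2 documentation says `stat_density2d()` uses `MASS::kde2d()`, and `kde2d()` stops with this message when either bandwidth is not positive. The sketch below calls `kde2d()` directly on the values from the two `dput()` outputs above. In scenario 1 the middle half of Y spans the levels 2 to 3 (interquartile range 1); in scenario 2, 16 of the 20 Y values are 1, so the interquartile range and therefore the default Y bandwidth are 0.

```r
library(MASS)

# Values copied from the two dput() outputs above
y1 <- c(2, 2, 3, 1, 2, 3, 3, 3, 2, 1, 3, 2, 2, 3, 1, 3, 2, 3, 2, 1)
x1 <- c(96649.7975713206, 104758.02495167, 93351.5907987183, 5535.8146932624,
        99480.6016841293, 113103.505637801, 90445.3465777551, 81903.811792781,
        106832.148472597, 6576.45291001145, 99368.9134426028, 111130.390217174,
        9471.82883910966, 102087.415882298, 5657.05900168211, 107688.549964059,
        103669.855375872, 94121.8586312176, 1573.00051813297, 7394.05750749363)
y2 <- c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2)
x2 <- c(96925.0119740431, 98869.1560687514, 99434.7995468473, 9123.65901167288,
        111471.920587976, 109448.280478224, 6678.04323546572, 98309.4525934759,
        91311.834287723, 86616.727265815, 101009.644050382, 7396.08053430818,
        102517.086739334, 11504.3148787722, 9471.82883910966, 15427.4786153589,
        96385.4989659007, 2249.38197350042, 91425.5491534976, 9303.7114788096)

# Default (x, y) bandwidths, as kde2d() computes them when h is missing
bw1 <- c(bandwidth.nrd(x1), bandwidth.nrd(y1))
bw2 <- c(bandwidth.nrd(x2), bandwidth.nrd(y2))
bw1  # both entries positive
bw2  # the Y entry is 0

# Scenario 1 succeeds; scenario 2 fails with the message ggplot2 reports
fit1 <- kde2d(x1, y1)
err2 <- tryCatch(kde2d(x2, y2), error = conditionMessage)
err2
```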
Update
One way of getting the kernels to work is to add some random noise to the Y variable so that the variance is no longer 0.
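# Add a little random noise to Y so the kernel density can be estimated
rand_noise <- runif(nrow(for_plot), -0.1, 0.1)
for_plot$Y_noise <- for_plot$Y + rand_noise
# ...and then map y = Y_noise instead of y = Y in ggplot()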
Although the error goes away and contours are produced, they are not as smooth and uniform as in scenario 1.
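The bandwidth rule also suggests why the jittered contours look so different. With noise drawn from `runif(-0.1, 0.1)`, the 25th and 75th percentiles of `Y_noise` in scenario 2 still both fall inside the cluster of values near 1, so the interquartile range stays below 0.2 and the default Y bandwidth comes out several times smaller than in scenario 1. A sketch of that comparison, assuming the scenario 1 and scenario 2 Y values from the `dput()` outputs above (the seed is arbitrary):

```r
library(MASS)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

y1 <- c(2, 2, 3, 1, 2, 3, 3, 3, 2, 1, 3, 2, 2, 3, 1, 3, 2, 3, 2, 1)  # scenario 1
y2 <- c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2)  # scenario 2

# The jitter from the update above
y2_noise <- y2 + runif(length(y2), -0.1, 0.1)

bandwidth.nrd(y1)        # roughly 1.74
bandwidth.nrd(y2_noise)  # below 0.35 for any seed with this noise range
```

A Y bandwidth that small makes each contour hug a single Y level, which matches the narrow, uneven shapes described above.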
As I mentioned in the comments, what really baffles me is why scenario 1 always works by default and scenario 2 never does. I have tried different subsets of the data to verify this. The nature of the data is the same in both scenario 1 and scenario 2.
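A possible workaround that avoids the jitter is to pass the bandwidths explicitly through the `h` argument of `stat_density_2d()` (documented as a length-two vector of x and y bandwidths), so `bandwidth.nrd()` is never applied to Y. The sketch below uses the scenario 2 data. The Y bandwidth of 1 is an assumption (roughly the spacing between the discrete Y levels) to be tuned by eye, `after_stat(level)` is the spelling of `..level..` in ggplot2 3.3.0 and later, and `ylim()` is left out because `shortest_path_list` is not part of the example.

```r
library(ggplot2)
library(MASS)

# Scenario 2 data from the dput() output above, as a plain data.frame
for_plot <- data.frame(
  Y = c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2),
  X = c(96925.0119740431, 98869.1560687514, 99434.7995468473, 9123.65901167288,
        111471.920587976, 109448.280478224, 6678.04323546572, 98309.4525934759,
        91311.834287723, 86616.727265815, 101009.644050382, 7396.08053430818,
        102517.086739334, 11504.3148787722, 9471.82883910966, 15427.4786153589,
        96385.4989659007, 2249.38197350042, 91425.5491534976, 9303.7114788096)
)

# Explicit (x, y) bandwidths: the usual rule of thumb for X and a fixed value for Y.
# The Y value of 1 is an assumption; smaller values give tighter contours.
h_xy <- c(bandwidth.nrd(for_plot$X), 1)

plt <- ggplot(data = for_plot, aes(x = X, y = Y)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", colour = "black", h = h_xy) +
  scale_fill_gradient(low = "green", high = "red") +
  guides(alpha = "none") +
  geom_point()

plt
```

Fixing the Y bandwidth also makes the contour shape independent of which subset is sampled, which may make the plots easier to compare across scenarios.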