6

what are some good kriging/interpolation idea/options that will allow heavily-weighted points to bleed over lightly-weighted points on a plotted R map?

the state of connecticut has eight counties. i found the centroid and want to plot poverty rates of each of these eight counties. three of the counties are very populated (about 1 million people) and the other five counties are sparsely populated (about 100,000 people). since the three densely-populated counties have more than 90% of the total state population, i would like those the three densely-populated counties to completely "overwhelm" the map and impact other points across the county borders.

the Krig function in the R fields package has a lot of parameters and also covariance functions that can be called, but i'm not sure where to start?

here is reproducible code to quickly produce a hard-bordered map and then three differently-weighted maps. hopefully i can just make changes to this code, but perhaps it requires something more complex like the geoRglm package? two of the three weighted maps look almost identical, despite one being 10x as weighted as the other..

https://raw.githubusercontent.com/davidbrae/swmap/master/20141001%20how%20to%20modify%20the%20Krig%20function%20so%20a%20huge%20weight%20overwhelms%20nearby%20points.R

thanks!!

hard-bordered connecticut map with county labels

example weighted map - fairfield, hartford, and new haven should overwhelm all other counties


edit: here's a picture example of the behavior i want-

enter image description here

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Anthony Damico
  • 5,779
  • 7
  • 46
  • 77
  • 1
    It's not always the case that the county seat will be the center of population on a county vasis, but it would have been in the case of Connecticut counties. (That hand coded example looked "just wrong" to this former Connecicut resident.) The cities of Hartford, New Haven, and Fairfield would be a better "center". – IRTFM Oct 01 '14 at 14:36
  • @BondedDust oh for sure! i am just using to get the general method correct before i calculate the population-weighted centroid of each county based a much smaller geography like census tracts ;) – Anthony Damico Oct 01 '14 at 14:45
  • 2
    If you are using something like ggplot, set the opacity (alpha) of the point based on weight (or -weight). – Ryan Hope Oct 01 '14 at 14:45
  • @RyanHope thanks! could you give an example as an answer? note my example weighted map is not anything close to a scatterplot.. – Anthony Damico Oct 01 '14 at 15:12
  • This'll be a lot more work, but personally I like maps whose borders (counties in your case) are physically distorted to reflect populations or other parameters. – Carl Witthoft Oct 01 '14 at 15:31
  • @CarlWitthoft, there apparently is a package to make cartograms: http://stackoverflow.com/a/9320567/3897439. Or you can use ScapeToad as in this post (http://spatial.ly/2013/06/r_activity/). – Cotton.Rockwood Oct 04 '14 at 17:11
  • Giving a little more information about the context and your purpose might help. Creating a smoothed surface of poverty level based on county centroids doesn't make too much sense if you want to represent the real world. If you still want to use krigging, taking a look at this may help, though: http://resources.arcgis.com/en/help/main/10.1/index.html#/How_Kriging_works/009z00000076000000/. You probably want to play with the sill and nugget values. – Cotton.Rockwood Oct 04 '14 at 19:12
  • hi @Cotton.Rockwood, thanks for your detailed answer! i agree it might not make practical sense on county-centroids. i'm only using counties to make my example a minimal one. i ultimately hope to apply this to census tracts- and let densely-populated ones overwrite sparely-populated ones :) – Anthony Damico Oct 05 '14 at 08:59
  • @AnthonyDamico whether at the county or census block scale, I don't think the weighting makes sense... unless you think that estimates of poverty are less accurate with lower population density for some reason. Perhaps a better weighting factor would be the proportion of total population censused. This would then be a measure of representativeness of the census and thus likely relates to accuracy. Does that make sense? – Cotton.Rockwood Oct 05 '14 at 17:57
  • @Cotton.Rockwood whoops, sorry again for oversimplifying my example (and thanks for your continued help) -- in the final code i hope to publish, the weights _are_ going to be measures of precision rather than population weights. (probably the inverted variance but not sure yet) so, in many cases it's like you say: points in rural areas will be less accurate. – Anthony Damico Oct 05 '14 at 19:32
  • From your edit, you seem to want greater weighted values to *completely* overwhelm lesser ones. However, I don't think that will ever happen with krigging unless you set a threshold and exclude low-weighted values from the krigging call entirely. – Cotton.Rockwood Oct 05 '14 at 21:13

3 Answers3

7

disclaimer - I am not an expert on Krigging. Krigging is complex and takes a good understanding of the underlying data, the method and the purpose to achieve the correct result. You may wish to try to get input from @whuber [on the GIS Stack Exchange or contact him through his website (http://www.quantdec.com/quals/quals.htm)] or another expert you know.

That said, if you just want to achieve the visual effect you requested and are not using this for some sort of statistical analysis, I think there are some relatively simple solutions.


EDIT:

As you commented, though the suggestions below to use theta and smoothness arguments do even out the prediction surface, they apply equally to all measurements and thus do not extend the "sphere of influence" of more densely populated counties relative to less-densely populated. After further consideration, I think there are two ways to achieve this: by altering the covariance function to depend on population density or by using weights, as you have. Your weighting approach, as I wrote below, alters the error term of the krigging function. That is, it inversely scales the nugget variance.

enter image description here

As you can see in the semivariogram image, the nugget is essentially the y-intercept, or the error between measurements at the same location. Weights affect the nugget variance (sigma2) as sigma2/weight. Thus, greater weights mean less error at small-scale distances. This does not, however, change the shape of the semivariance function or have much effect on the range or sill.

I think that the best solution would be to have your covariance function depend on population. however, I'm not sure how to accomplish that and I don't see any arguments to Krig to do so. I tried playing with defining my own covariance function as in the Krig example, but only got errors.

Sorry I couldn't help more!

Another great resource to help understand Krigging is: http://www.epa.gov/airtrends/specialstudies/dsisurfaces.pdf


As I said in my comment, the sill and nugget values as well as the range of the semivariogram are things you can alter to affect the smoothing. By specifying weights in the call to Krig, you are altering the variance of the measurement errors. That is, in a normal use, weights are expected to be proportional to the accuracy of the measurement value so that higher weights represent more accurate measurements, essentially. This isn't actually true with your data, but it may be giving you the effect you desire.

To alter the way your data is interpolated, you can adjust two (and many more) parameters in the simple Krig call you are using: theta and smoothness. theta adjusts the semivariance range, meaning that measured points farther away contribute more to the estimates as you increase theta. Your data range is

range <- data.frame(lon=range(ct.data$lon),lat=range(ct.data$lat))
range[2,]-range[1,]
       lon       lat
2 1.383717 0.6300484

so, your measurement points vary by ~1.4 degrees lon and ~0.6 degrees lat. Thus, you can play with specifying your theta value in that range to see how that affects your result. In general, a larger theta leads to more smoothing since you are drawing from more values for each prediction.

Krig.output.wt <- Krig( cbind(ct.data$lon,ct.data$lat) , ct.data$county.poverty.rate ,
                        weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),Covariance="Matern", theta=.8)  
r <- interpolate(ras, Krig.output.wt)
r <- mask(r, ct.map)
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta = 0.8", outer = FALSE)
points(cbind(ct.data$lon,ct.data$lat))
text(ct.data$lon, ct.data$lat-0.05, ct.data$NAME, cex=0.5)

Gives:

enter image description here

Krig.output.wt <- Krig( cbind(ct.data$lon,ct.data$lat) , ct.data$county.poverty.rate ,
                        weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),Covariance="Matern", theta=1.6)  
r <- interpolate(ras, Krig.output.wt)
r <- mask(r, ct.map)
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta = 1.6", outer = FALSE)
points(cbind(ct.data$lon,ct.data$lat))
text(ct.data$lon, ct.data$lat-0.05, ct.data$NAME, cex=0.5)

Gives:

enter image description here

Adding the smoothness argument, will change the order of the function used to smooth your predictions. The default is 0.5 leading to a second-order polynomial.

Krig.output.wt <- Krig( cbind(ct.data$lon,ct.data$lat) , ct.data$county.poverty.rate ,
                        weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),
                        Covariance="Matern", smoothness = 0.6)  
r <- interpolate(ras, Krig.output.wt)
r <- mask(r, ct.map)
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta unspecified; Smoothness = 0.6", outer = FALSE)
points(cbind(ct.data$lon,ct.data$lat))
text(ct.data$lon, ct.data$lat-0.05, ct.data$NAME, cex=0.5)

Gives:

enter image description here

This should give you a start and some options, but you should look at the manual for fields. It is pretty well-written and explains the arguments well. Also, if this is in any way quantitative, I would highly recommend talking to someone with significant spatial statistics know how!

Cotton.Rockwood
  • 1,601
  • 12
  • 29
  • thanks! i did check these out before posting, they do not appear to solve the problem. after turning each of these options on, compare `size<-5` to `size<-100` in my reproducible example. none of these options improves the comparison between `size<-5` and `size<-100`. i am looking for something where a heavily-weighted point visibly encroaches on nearby lightly-weighted points, but less-heavily-weighted points don't encroach as much. (sorry for my layspeak) – Anthony Damico Oct 05 '14 at 09:17
  • @AnthonyDamico, as I understand it, you want the predicted values at lower population measurements to be less tied to the actual measured value? Is that correct? If so, weighting seems to be the correct approach. However, it likely needs to be on the same scale as the variation in your variable of interest: poverty rate. Since poverty rate ranges from 5.8 to 11.4, I'm guessing that specifying weights way above this will not matter. However, if you scale your population to this range, you might have more luck. I'll add a bit more int the answer to see if it helps. – Cotton.Rockwood Oct 05 '14 at 17:53
  • thanks for your edit, cotton! if nothing else, it's helpful to know that what i'm trying to do isn't really possible with kriging alone.. thanks for saving me from wasting a bunch of time on the wrong path ;) – Anthony Damico Oct 06 '14 at 11:44
2

Kriging is not what you want. (It is a statistical method for accurate--not distorted!--interpolation of data. It requires preliminary analysis of the data--of which you do not have anywhere near enough for this purpose--and cannot accomplish the desired map distortion.)

The example and the references to "bleed over" suggest considering an anamorph or area cartogram. This is a map which will expand and shrink the areas of the county polygons so that they reflect their relative population while retaining their shapes. The link (to the SE GIS site) explains and illustrates this idea. Although its answers are less than satisfying, a search of that site will reveal some effective solutions.

Community
  • 1
  • 1
whuber
  • 2,379
  • 14
  • 23
  • does the weighting argument 'allow' the model fit to vary more from low-weight measurements? I'm just curious if my understanding is correct. – Cotton.Rockwood Oct 05 '14 at 21:04
  • @Cotton If you are referring to the arguments of `Krig` in your answer, I can't help you because I cannot locate any `R` library that includes a function of that name. What library are you using? Regardless, weighting in any interpolator (or, generally, statistical prediction procedure) will not accomplish what the OP seems to ask for, which seems closer to something like a weighted kernel density estimate than an actual interpolation. – whuber Oct 05 '14 at 21:13
  • 1
    I am referring to `Krig` in the `fields` package that the OP is using. The reference manual is at: http://cran.r-project.org/web/packages/fields/fields.pdf. I agree it will not fulfill the desired outcome he has described. – Cotton.Rockwood Oct 05 '14 at 21:19
  • thank you for the guidance! this makes sense to me. i think area cartograms aren't what i'm after, since i don't want the final map to be distorted in any way. however, this gives me a few ideas about what direction to go with the end goal being [a solid method to make these](http://stackoverflow.com/questions/23318243/how-to-make-beautiful-borderless-geographic-thematic-heatmaps-with-weighted-sur). thank you!!! – Anthony Damico Oct 06 '14 at 11:48
  • 1
    @Cotton Thank you! The "weights" argument is used to adjust the kriging matrices, which are filled with covariance values. The covariances in this model are sums of three terms: from the "P" and "Z" processes and the measurement errors "e". The weights, which apply only to "e", will be high for precise measurements, low for imprecise ones: "Weights are proportional to the reciprocal variance of the measurement error. ... the variance of e.k is sigma**2/ weights.k." In this application, using reciprocal populations as weights has merit--but it won't accomplish what the OP seeks. – whuber Oct 06 '14 at 16:26
  • @AnthonyDamico, it seems like the maps from the link you mentioned above must just be spatially interpolated and smoothed census (survey) data. Out of curiosity, why is it that you want to overwhelm some points more so that possible with weights? It seems like krigging with weights proportional to some measure of error would give you a good result, especially using the census tract-level data. – Cotton.Rockwood Oct 07 '14 at 02:09
  • @Cotton.Rockwood so i'm writing a general-use blog post (with code!) for http://asdfree.com. many survey data sets have geographic identifiers, none have census-tract-level data and many have _incomplete_ geographies. many are like the [cps](http://www.asdfree.com/search/label/current%20population%20survey%20%28cps%29) which have _some_ counties but not all. so in the connecticut example, you know the three big counties _and_ the average of all other counties. see how litchfield county is isolated? from the survey, our only litchfield info is the average of all five low-population counties :) – Anthony Damico Oct 07 '14 at 06:24
  • @Cotton.Rockwood if a small geographic area is isolated by lots of huge areas, the only thing known about that small area might be its value already-summed with other small geographic areas. so forcing its centroid to be one number is actually probably wrong. whuber made an excellent point- kriging is not for distorted data, but many spatially-applied survey results are going to be distorted because of this rural-aggregation problem. i think i'm going to move forward ignoring it, we'll see how they look.. :) – Anthony Damico Oct 07 '14 at 06:39
  • very cool work! While I see your point and the problem, it seems to me that krigging with weights *is* actually a good fit. After all, even if the data point for locations such as Litchfield are averages of multiple counties, it is still likely to be a better estimate of the real value than no data at all. Thus, when you include spatial relationships as with krigging, as long as you build in an appropriate amount of error for the averaged points, you should get a reasonable estimate. Krigging is often *not* an exact estimator, so it seems an appropriate approach in the circumstance. – Cotton.Rockwood Oct 07 '14 at 07:06
  • ...again, I'm not an expert, but it seems to me that with appropriate krigging methods and some estimate of the error for the averaged points, you should get a pretty good product. – Cotton.Rockwood Oct 07 '14 at 07:08
  • @Cotton I am concerned about more basic problems. (1) The question does not seem to be asking for a good fit: it asks for some kind of hyperbolic representation of the data. (Since the data represent averages or totals within well-defined polygons, making them grossly influence the map in neighboring polygons is not going to be a good fit.) (2) Kriging methods cannot be applied with any reasonable level of confidence when only eight data values are available. (3) That is especially so when gross approximations have been made in locating the data supports and the data have variable supports. – whuber Oct 07 '14 at 14:28
  • whuber, I guess my question to @Anthony Damico was, ***why*** is he looking for something other than a good fit when using weighted krigging on census tract (or even county) centroids would seem to be a reasonable approach to a relatively accurate solution. It seems like that should be what he is shooting for based on the maps he linked [above](http://stackoverflow.com/questions/23318243/how-to-make-beautiful-borderless-geographic-thematic-heatmaps-with-weighted-sur) – Cotton.Rockwood Oct 08 '14 at 21:43
1

lot's of interesting comments and leads above.

I took a look at the Harvard dialect survey to get a sense for what you are trying to do first. I must say really cool maps. And before I start in on what I came up with...I've looked at your work on survey analysis before and have learned quite a few tricks. Thanks.

So my first take pretty quickly was that if you wanted to do spatial smoothing by way of kernel density estimation then you need to be thinking in terms of point process models. I'm sure there are other ways, but that's where I went.

So what I do below is grab a very generic US map and convert it into something I can use as a sampling window. Then I create random samples of points within that region, just pretend those are your centroids. After I attach random values to those points and plot it up.

I just wanted to test this conceptually, which is why I didn't go through the extra steps to grab cbsa's and also sorry for not projecting, but I think these are the fundamentals. Oh and the smoothing in the dialect study is being done over the whole country. I think. That is the author is not stratifying his smoothing procedure within polygons....so I just added states at the end.

code:

library(sp)
    library(spatstat)
    library(RColorBrewer)
    library(maps)
    library(maptools)

    # grab us map from R maps package
    usMap <- map("usa")
    usIds <- usMap$names

    # convert to spatial polygons so this can be used as a windo below
    usMapPoly <- map2SpatialPolygons(usMap,IDs=usIds)

    # just select us with no islands
    usMapPoly <- usMapPoly[names(usMapPoly)=="main",]

    # create a random sample of points on which to smooth over within the map
    pts <- spsample(usMapPoly, n=250, type='random')

    # just for a quick check of the map and sampling locations
    plot(usMapPoly)
    points(pts)

    # create values associated with points, be sure to play aroud with
    # these after you get the map it's fun
    vals <-rnorm(250,100,25)
    valWeights <- vals/sum(vals)
    ptsCords <- data.frame(pts@coords)

    # create window for the point pattern object  (ppp) created below
    usWindow <- as.owin(usMapPoly)

    # create spatial point pattern object
    usPPP <- ppp(ptsCords$x,ptsCords$y,marks=vals,window=usWindow)

    # create colour ramp
    col <- colorRampPalette(brewer.pal(9,"Reds"))(20)

    # the plots, here is where the gausian kernal density estimation magic happens
    # if you want a continuous legend on one of the sides get rid of ribbon=FALSE
    # and be sure to play around with sigma
    plot(Smooth(usPPP,sigma=3,weights=valWeights),col=col,main=NA,ribbon=FALSE)
    map("state",add=TRUE,fill=FALSE)

example no weights:

SmoothMap

example with my trivial weights

SmoothMap2

There is obviously a lot of work in between this and your goal of making this type of map reproducible at various levels of spatial aggregation and sample data, but good luck it seems like a cool project.

p.s. initially I did not use any weighting, but I suppose you could provide weights directly to the Smooth function. Two example maps above.

DMT
  • 1,577
  • 10
  • 16
miles2know
  • 737
  • 8
  • 17
  • hi, could you edit this so it uses weights somewhere? thanks! – Anthony Damico Oct 09 '14 at 14:58
  • Sure..take a look now. It's a rather trivial weighting, but I guess you could just weight the data right in the Smooth function. – miles2know Oct 09 '14 at 15:10
  • If you add more color levels to the rampPalette i.e from (...)(20) to (...)(100) the maps look even more continuous. obviously. – miles2know Oct 09 '14 at 15:46
  • alright i spent a lot of time with this- the weights still do not work. when you randomly sample points within the geography, you are randomly sampling from weighted points. i think it might be possible to use `fields:::cover.design` to preserve the weights, but i'm not sure how? the "random" sampling you have now throws off any weights that the user has at the start. is there any way around this? thanks! – Anthony Damico Oct 12 '14 at 17:01
  • that makes sense and I'm sure there is a way around it. I'll give it some thought, but I'f I understand what your trying to do...you don't have random samples of points. you're working with CBSA centroids right? So if you have those a priori they would be the x,y data you pass to ppp(). Then your data would be the marks argument to ppp(). No need for any sampling. I just used it to create a set of arbitrary points within the domain. I didn't think that it might throw off the weighting. at any rate I'll give it some more thought like I said. cheers. – miles2know Oct 12 '14 at 18:50
  • this is a different question, but if you have any ideas, i'd love to hear your advice..thanks! http://stackoverflow.com/questions/26476804/how-can-i-plot-a-smoothed-weighted-spatial-point-pattern-with-ggplot2-instead-o – Anthony Damico Oct 20 '14 at 23:56
  • Hi Anthony. When a ppp object is passed to plot() it looks for a plotting method called plot.im which is defined in the spatstat package. I'm willing to say that ggplot2 doesn't have any method dispatch for ppp objects. So I don't think you could easily use gglot2 to create the plots above. Before you threw this to another question I noticed you started thinking about grids. I'm checking that out now in fact. – miles2know Oct 21 '14 at 00:51
  • just curious... Why do you want to use ggplot for everything? I'ts a great tool for sure. I didn't start out using it for visualization, but it has a really succinct logic and has become quite useful in some of my work flow. I got into it after getting plyr and dplyr under my belt. Check out the magrittr package btw. So at the end of the day...Same question. And if you really need ggplot for everything I'd start thinking about adding plotting methods to ggplot2. The code is all there. You could create a new package to accommodate all of your plotting ideas. – miles2know Oct 21 '14 at 01:10
  • Also.. A wrinkle I noticed in my initial method that relate to your point about sampling above. When I tried to project the maps above The smoothing was destroyed. So that is definitely a valid point. The map still displayed with boundaries, but projecting changed the smoothing. That's why I'm looking to smoothing over a grid. – miles2know Oct 21 '14 at 01:23