So I'm trying to analyze a square area and cluster the points of certain groups together. I think pdfCluster is the best way to go: I need to measure the density of the points with a kernel density estimator to get the correct clusters, and then actually group the points together to create a plot (I have the long/lat of each point). I'm really stuck on this, and any help would be greatly appreciated.

I'm running into an issue with my code while trying to use a kernel density estimator to cluster points. I am working with my data in two different formats to find out which works best. First, I have my data as a matrix. An example is below; in my code, latitude and longitude are attached to the columns and rows of the matrix.

# matrix(..., byrow = TRUE) makes each c(...) below one row of m;
# dim(m) <- c(15, 6) would fill the values column by column instead.
m <- matrix(c(
  c(8.83,8.89,8.81,8.87,8.9,8.87),
  c(8.89,8.94,8.85,8.94,8.96,8.92),
  c(8.84,8.9,8.82,8.92,8.93,8.91),
  c(8.79,8.85,8.79,8.9,8.94,8.92),
  c(8.79,8.88,8.81,8.9,8.95,8.92),
  c(8.8,8.82,8.78,8.91,8.94,8.92),
  c(8.75,8.78,8.77,8.91,8.95,8.92),
  c(8.8,8.8,8.77,8.91,8.95,8.94),
  c(8.74,8.81,8.76,8.93,8.98,8.99),
  c(8.89,8.99,8.92,9.1,9.13,9.11),
  c(8.97,8.97,8.91,9.09,9.11,9.11),
  c(9.04,9.08,9.05,9.25,9.28,9.27),
  c(9,9.01,9,9.2,9.23,9.2),
  c(8.99,8.99,8.98,9.18,9.2,9.19),
  c(8.93,8.97,8.97,9.18,9.2,9.18)
), nrow = 15, ncol = 6, byrow = TRUE)
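If the matrix rows correspond to latitudes and its columns to longitudes, it can be reshaped into the same (lat, long, value) layout as the table, so only one format has to go into the clustering step. A minimal base-R sketch, assuming hypothetical coordinate vectors `lat` (one value per row) and `lon` (one value per column) and using a small 3 x 3 piece of the matrix:

```r
# A 3 x 3 piece of the value matrix from the question, filled row by row.
m <- matrix(c(8.83, 8.89, 8.81,
              8.89, 8.94, 8.85,
              8.84, 8.90, 8.82),
            nrow = 3, byrow = TRUE)

# Hypothetical cell-centre coordinates: one latitude per row, one longitude per column.
lat <- c(9.00, 8.99, 8.98)
lon <- c(7.00, 7.01, 7.02)

# expand.grid varies its first argument fastest, which matches the
# column-major order in which as.vector() flattens a matrix,
# so row k of long_df pairs lat[i], lon[j] with m[i, j].
grid <- expand.grid(lat = lat, lon = lon)
long_df <- data.frame(lat = grid$lat, lon = grid$lon, value = as.vector(m))

# Assumption: cells outside the country border are NA; drop them.
long_df <- long_df[!is.na(long_df$value), ]
print(long_df)
```

Dropping NA cells at this stage also means the gaps in a non-rectangular country simply become missing points rather than missing matrix entries.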

I also have my data in a three-column table (a matrix here), where column 1 is latitude, column 2 is longitude, and column 3 is the value.

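# Each c(lat, long, value) below becomes one row; byrow = TRUE keeps it that way.
z <- matrix(c(
  c(8.83,8.89, 2),
  c(8.89,8.94, 4),
  c(8.84,8.9, 1),
  c(8.79,8.852, 4),
  c(8.79,8.88, 5),
  c(8.8,8.82, 2),
  c(8.75,8.78, 1),
  c(8.8,8.8, 2),
  c(8.74,8.81, 7),
  c(8.89,8.99, 1),
  c(8.97,8.97, 6),
  c(9.04,9.08, 8),
  c(9,9.01, 1),
  c(8.99,8.99, 8),
  c(8.93,8.97, 2)
), nrow = 15, ncol = 3, byrow = TRUE,
   dimnames = list(NULL, c("lat", "long", "value")))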
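Kernel density estimators (pdfCluster included, as far as I know) treat the input columns as Euclidean coordinates. In degrees that is slightly off: a degree of longitude spans cos(latitude) times the distance of a degree of latitude, which at Nigeria's latitudes (roughly 4° N to 14° N) is a factor between about 0.97 and 1. One option is to convert lat/long to kilometres on a local plane before estimating density. The sketch below uses an equirectangular approximation; the helper `to_km` and the 6371 km mean Earth radius are my assumptions, not part of the question:

```r
# First three rows of the (lat, long, value) table from the question, in degrees.
z <- matrix(c(8.83, 8.89, 2,
              8.89, 8.94, 4,
              8.84, 8.90, 1),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("lat", "long", "value")))

# Equirectangular approximation around the mean latitude lat0:
#   x = R * long * cos(lat0),  y = R * lat   (angles in radians, R in km)
# Reasonable over a country-sized area; a proper projected CRS is more accurate.
to_km <- function(lat, long, lat0 = mean(lat)) {
  R   <- 6371        # mean Earth radius, km
  rad <- pi / 180    # degrees -> radians
  cbind(x = R * long * rad * cos(lat0 * rad),
        y = R * lat * rad)
}

xy <- to_km(z[, "lat"], z[, "long"])
print(round(xy, 2))
```

The resulting `xy` matrix, together with `z[, "value"]`, is what would be passed to a density estimator or clustering function in place of the raw degrees. Distances then come out in kilometres, so a bandwidth can be chosen in km as well.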

The actual data I am using comes from larger rasters and shapefiles. The raster is from http://beta.sedac.ciesin.columbia.edu/data/set/gpw-v4-population-count/data-download. The shapefiles are from http://www.gadm.org/download (I am using Nigeria).

The main question of this post is about clustering and which data format works best with clustering functions. I currently have all of the grid points for the entire country as (lat, long, value). I want to run a kernel density estimator across all of the points and then cluster based on certain values. The pdfCluster package seems to do just that, except I'm not sure how to make it accept lat/long values and run across a geographic plane. Since my data covers a geographic area and isn't completely continuous, I'm running into errors. Any hints on how to adapt pdfCluster to accept such values, or on which data format (matrix or table) is best to use, would be greatly appreciated.
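As I understand it, pdfCluster estimates the density with a kernel estimator and then treats connected regions where the density exceeds a threshold as clusters. The sketch below is not pdfCluster; it is a minimal base-R version of that level-set idea at one fixed threshold, with assumptions of its own: coordinates are already planar (km), each point is weighted by its value (e.g. a population count), and connectivity is defined by a single-linkage distance cut rather than the graph pdfCluster builds. The helper names `weighted_kde` and `level_set_clusters`, the bandwidth `h`, `level`, `link_dist`, and the toy data are all hypothetical:

```r
# Weighted Gaussian kernel density estimate, evaluated at the points themselves.
# xy: n x 2 matrix of planar coordinates (km); w: non-negative weights;
# h: bandwidth in km (a hypothetical tuning choice).
weighted_kde <- function(xy, w, h) {
  d2 <- as.matrix(dist(xy))^2                  # squared pairwise distances
  k  <- exp(-d2 / (2 * h^2)) / (2 * pi * h^2)  # isotropic bivariate Gaussian kernel
  as.vector(k %*% w) / sum(w)                  # weighted average of the kernels
}

# Level-set clustering: keep points whose density is at least `level`, then
# group kept points linked by chains of neighbours within `link_dist`
# (cutting a single-linkage tree gives these connected components).
level_set_clusters <- function(xy, dens, level, link_dist) {
  keep <- which(dens >= level)
  cl <- rep(NA_integer_, nrow(xy))             # low-density points stay NA
  if (length(keep) < 2) {                      # hclust needs at least two points
    cl[keep] <- 1L
    return(cl)
  }
  tree <- hclust(dist(xy[keep, , drop = FALSE]), method = "single")
  cl[keep] <- cutree(tree, h = link_dist)
  cl
}

# Hypothetical toy data in km: two 3 x 3 clumps of cells on a 5 km grid,
# about 57 km apart, plus one isolated low-weight cell.
clump <- as.matrix(expand.grid(x = c(0, 5, 10), y = c(0, 5, 10)))
xy <- rbind(clump, clump + 50, c(100, 100))
w  <- c(rep(10, 18), 1)

dens <- weighted_kde(xy, w, h = 5)
cl   <- level_set_clusters(xy, dens, level = 0.1 * max(dens), link_dist = 7.5)
print(table(cl, useNA = "ifany"))
```

Here the two clumps come out as two clusters and the isolated cell falls below the density threshold and stays unassigned. Because `dist()` builds an n x n matrix, this only scales to a few thousand points; for a national raster one would likely aggregate cells first.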

  • Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Jun 13 '16 at 20:05
  • @zx8754 I was wondering if there was a more general answer to this question but I added more context above – gamemastersr Jun 13 '16 at 20:08
