0

I'm working on a project to calculate the centroid of a state/country using python.

What I have done so far:

  1. Take an outline of the state and run it through ImageJ to create a csv of the x,y coordinates of the border. This gives me a .csv file with data like this:

    556,243

    557,243

    557,250

    556,250

    556,252

    555,252

    555,253

    554,253

    etc, etc,

For about 2500 data points.

  1. Import this list into a Python script.

  2. Calculate the average of the x and y coordinate arrays. This point is the centroid. (Idea similar to this)

  3. Plot the points and the centroid using matplotlib.

Here is my code:

#####################################################
#                     Imports                       #
#####################################################
import csv
import matplotlib.pyplot as plt
import numpy as np
import pylab


#####################################################
#                       Setup                       #
#####################################################

#Set empty list for coordinates
x,y =[],[]

#Importing csv data 
with open("russiadata.csv", "r") as russiadataFile:
    russiadataReader = csv.reader(russiadataFile)

    #Create list of points
    russiadatalist = []

    #Import data
    for row in russiadataReader:
        #While the rows have data, AKA length not equal to zero. 
        if len(row) != 0: 
            #Append data to arrays created above
            x.append(float(row[0]))
            y.append(float(row[1]))

#Close file as importing is done
russiadataFile.closejust flipped around the 




#####################################################
#                  Data Analysis                    #
#####################################################

#Convert list to array for computations
x=np.array(x)
y=np.array(y)


#Calculate number of data points
x_len=len(x)just flipped around the 
y_len=len(y)

#Set sum of points equal to x_sum and y_sum
x_sum=np.sum(x)
y_sum=np.sum(y)

#Calculate centroid of points
x_centroid=x_sum/x_len
y_centroid=y_sum/y_len



#####################################################
#                     Plotting                      #
#####################################################


#Plot all points in data
plt.xkcd()
plt.plot(x,y, "-.")

#Plot centroid and label it
plt.plot(x_centroid,y_centroid,'^')


plt.ymax=max(x)
#Add axis labels
plt.xlabel("X")
plt.ylabel("Y")
plt.title("russia")

#Show the plot
plt.show()

The problem I have run into is that some sides of the state have more points than others, so the centroid is being weighted towards areas with more points. This is not what I want. I'm trying to find the centroid of the polygon that has vertices from the x,y coordinates.

This is what my plot looks like:

https://i.stack.imgur.com/jiPRz.jpg

As you can see, the centroid is weighted more towards the section of points with more density. (As a side note, yes, that is Russia. I'm having issues with the plot coming out backwards and stretched/squashed.)

In other words, is there a more accurate way to get the centroid?

Thanks in advance for any help.

anothermh
  • 9,815
  • 3
  • 33
  • 52
blackbrandt
  • 2,010
  • 1
  • 15
  • 32
  • 1
    sounds like you want to compute the centroid of the convex hull. I think scipy can get you the convex hull. Then shapely can get the you centroid. – Paul H Oct 01 '17 at 18:52
  • I don't think convex hull is even needed, with shapely it should be as straightforward as Polygon().centroid – lomereiter Oct 02 '17 at 05:50

2 Answers2

0

You can find the correct formula for a closed polygon on Wikipedia: https://en.wikipedia.org/wiki/Centroid#Centroid_of_a_polygon

Another formula is helpful to deal with Kaliningrad oblast (exclave) and islands (if you want to be really precise): https://en.wikipedia.org/wiki/Centroid#By_geometric_decomposition

That said, such questions probably fit better to https://math.stackexchange.com

lomereiter
  • 356
  • 1
  • 5
0

It sounds to me like you don't want your centroid to be calculated with the density of the scatter in mind.

If you just want to use surface area, then I would eliminate any point that is contained within the current outline of the scatter. A slightly more accurate way might be to pretend there is a box outlined by your outer-most points, then to check the x- and y-coordinates of all of your points and eliminate any that fall inside of the box. Any points that fall inside the current outline are not contributing to the shape, only the density.

I think the most technical and accurate approach would be very complicated, and here's what I think it would require: to get the outer-most points to connect based on least distance from each other, and furthest distance from all other points. By "connect" I mean to pretend that a line passes through, and ends at, both points. It should be defined mathematically. Then, for each point, calculate whether or not it falls inside or outside of this outline, and eliminate all that fall inside (they are redundant as they are already inside the shape).

Colin Hancey
  • 219
  • 2
  • 8
  • 1
    Starting from scratch, yeah it's complicated. But using existing libraries to compute the centroid of the convex hull would be quite simple. – Paul H Oct 01 '17 at 18:51
  • @Colin Hancey, that is a much more logical way of approaching the problem. Are there any packages I could look at in python 2.7 that would be able to help me with that/ – blackbrandt Oct 01 '17 at 22:01
  • 1
    I ended up rewriting my code to use a Monte Carlo to figure out all points inside the shape. – blackbrandt Nov 01 '17 at 02:55