I'm working on a project to calculate the centroid of a state/country using python.
What I have done so far:
Take an outline of the state and run it through ImageJ to extract the x,y coordinates of the border. This gives me a .csv file with data like this:
556,243
557,243
557,250
556,250
556,252
555,252
555,253
554,253
and so on, for about 2500 data points.
Import this list into a Python script.
Calculate the average of the x and y coordinate arrays and treat that point as the centroid. (Idea similar to this)
Plot the points and the centroid using matplotlib.
Here is my code:
#####################################################
# Imports #
#####################################################
import csv
import matplotlib.pyplot as plt
import numpy as np
import pylab
#####################################################
# Setup #
#####################################################
#Set empty list for coordinates
x,y =[],[]
#Importing csv data
with open("russiadata.csv", "r") as russiadataFile:
    russiadataReader = csv.reader(russiadataFile)
    #Create list of points
    russiadatalist = []
    #Import data
    for row in russiadataReader:
        #While the rows have data, AKA length not equal to zero.
        if len(row) != 0:
            #Append data to arrays created above
            x.append(float(row[0]))
            y.append(float(row[1]))
#The with block closes the file automatically once importing is done
#####################################################
# Data Analysis #
#####################################################
#Convert list to array for computations
x=np.array(x)
y=np.array(y)
#Calculate number of data points
x_len=len(x)
y_len=len(y)
#Set sum of points equal to x_sum and y_sum
x_sum=np.sum(x)
y_sum=np.sum(y)
#Calculate centroid of points
x_centroid=x_sum/x_len
y_centroid=y_sum/y_len
#####################################################
# Plotting #
#####################################################
#Plot all points in data
plt.xkcd()
plt.plot(x,y, "-.")
#Plot centroid and label it
plt.plot(x_centroid,y_centroid,'^')
plt.ymax=max(x)
#Add axis labels
plt.xlabel("X")
plt.ylabel("Y")
plt.title("russia")
#Show the plot
plt.show()
The problem I have run into is that some sides of the state have more points than others, so the centroid is pulled towards the areas with more points. This is not what I want. I'm trying to find the centroid of the polygon whose vertices are the x,y coordinates.
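To make the difference concrete: as far as I understand, the centroid of a simple polygon is usually computed with the area-weighted (shoelace) formula rather than by averaging the vertices. Below is a minimal sketch of that idea, assuming the border points are listed in order around the outline and the outline does not cross itself. The name polygon_centroid and the small test square are hypothetical and not part of my script.

```python
def polygon_centroid(xs, ys):
    """Area-weighted centroid of a closed, non-self-intersecting polygon.

    Assumes the vertices are given in order around the outline
    (either direction); the last vertex is joined back to the first.
    """
    n = len(xs)
    twice_area = 0.0  # twice the signed area (shoelace sum)
    cx = 0.0
    cy = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        cross = xs[i] * ys[j] - xs[j] * ys[i]
        twice_area += cross
        cx += (xs[i] + xs[j]) * cross
        cy += (ys[i] + ys[j]) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: area is zero")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical example: a 4x4 square with extra points along the bottom edge
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 0.0]
ys = [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 4.0]

# Plain average of the points, as in my script
mean_point = (sum(xs) / len(xs), sum(ys) / len(ys))

# Area-weighted centroid of the polygon
centroid = polygon_centroid(xs, ys)

print("mean of points:  ", mean_point)
print("polygon centroid:", centroid)
```

In this toy case the plain average is dragged towards the densely sampled bottom edge, while the area-weighted result stays at the middle of the square, which is the behaviour I am after.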
This is what my plot looks like:
https://i.stack.imgur.com/jiPRz.jpg
As you can see, the centroid is weighted more towards the section of points with more density. (As a side note, yes, that is Russia. I'm having issues with the plot coming out backwards and stretched/squashed.)
In other words, is there a more accurate way to get the centroid?
Thanks in advance for any help.