I have a number of CSV files with x, y, and z coordinates. These coordinates are not long/lat, but rather a distance from an origin. So within the CSV, there is a 0,0 origin, and all other x, y locations are a distance from that origin point in meters.
The x and y values are both negative and positive floats. The largest file I have is ~1.4 million data points; the smallest is ~20k.
The files represent an irregularly shaped map of sorts. The distance values will not produce a uniform shape such as a rectangle, circle, etc. I need to generate a bounding box that covers the largest area within the values contained in the CSV files.
Logically, here are the steps I want to take:
- Read the points from the file
- Get the minimum and maximum x coordinates
- Get the minimum and maximum y coordinates.
- Use min/max coordinates to get a bounding rectangle with (xmin,ymin), (xmax,ymin), (xmin,ymax) and (xmax,ymax) that will contain the entirety of the values of the CSV file.
- Create a grid across that rectangle with a 1 m resolution. Set that grid as a boolean array for the occupancy.
- Round the map coordinates to the nearest integer.
- For every rounded map coordinate switch the occupancy to True.
- Use a morphological filter to erode the edges of the occupancy map.
- Now, when a point is selected, round it to the nearest integer and check whether it falls within the occupancy map.
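The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the real files: the function names `build_occupancy`, `erode`, and `is_occupied` are made up for this sketch, and the erosion is a plain-NumPy 3x3 filter standing in for a library morphological filter such as `scipy.ndimage.binary_erosion`.

```python
import numpy as np


def build_occupancy(x, y):
    """Return (grid, min_x, min_y); grid[row, col] is True for occupied 1 m cells."""
    # Round to the nearest integer meter.
    ix = np.rint(x).astype(np.int64)
    iy = np.rint(y).astype(np.int64)
    min_x, min_y = int(ix.min()), int(iy.min())
    width = int(ix.max()) - min_x + 1
    height = int(iy.max()) - min_y + 1
    grid = np.zeros((height, width), dtype=bool)
    # Shift coordinates so the minimum maps to index 0, then mark the cells.
    grid[iy - min_y, ix - min_x] = True
    return grid, min_x, min_y


def erode(grid):
    """3x3 binary erosion: a cell stays True only if all 8 neighbours are True too."""
    padded = np.pad(grid, 1, constant_values=False)
    out = np.ones_like(grid)
    h, w = grid.shape
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def is_occupied(grid, min_x, min_y, x, y):
    """Check whether the cell nearest to (x, y) lies inside the occupancy map."""
    col = int(np.rint(x)) - min_x
    row = int(np.rint(y)) - min_y
    h, w = grid.shape
    return 0 <= row < h and 0 <= col < w and bool(grid[row, col])


# Toy data: a filled 5 x 5 block of points around the origin (hypothetical values).
xs, ys = np.meshgrid(np.arange(-2.0, 3.0), np.arange(-2.0, 3.0))
grid, min_x, min_y = build_occupancy(xs.ravel() + 0.2, ys.ravel() - 0.3)
eroded = erode(grid)
center_ok = is_occupied(eroded, min_x, min_y, 0.0, 0.0)
corner_ok = is_occupied(eroded, min_x, min_y, 2.0, 2.0)
```

Only one boolean array is allocated for the whole bounding rectangle, and the occupied cells are set with a single fancy-indexing assignment rather than a Python loop over the points.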
I'm facing multiple issues, but thus far my biggest issue is memory resources. For some reason this script keeps dying with a SIGKILL, or at least I think that is what is occurring.
import math
import os

import numpy as np


class GridBuilder:
    """_"""

    def __init__(self, filename, search_radius) -> None:
        """..."""
        self.filename = filename
        self.search_radius = search_radius
        self.load_map()
        self.process_points()

    def load_map(self):
        """..."""
        # Each CSV row is "x,y,z", in meters from the origin.
        data = np.loadtxt(self.filename, delimiter=",")
        self.x_coord = data[:, 0]
        self.y_coord = data[:, 1]
        self.z_coord = data[:, 2]

    def process_points(self):
        """..."""
        min_x = math.floor(np.min(self.x_coord))
        min_y = math.floor(np.min(self.y_coord))
        max_x = math.floor(np.max(self.x_coord))
        max_y = math.floor(np.max(self.y_coord))
        int_x_coord = np.floor(self.x_coord).astype(np.int32)
        int_y_coord = np.floor(self.y_coord).astype(np.int32)
        # np.arange excludes its stop value, so add 1 to keep the max cell.
        x = np.arange(min_x, max_x + 1, 1)
        y = np.arange(min_y, max_y + 1, 1)
        xx, yy = np.meshgrid(x, y, copy=False)


if __name__ == "__main__":
    MAP_FILE_DIR = r"/sample_data"
    FILE = "testfile.csv"
    fname = os.path.join(MAP_FILE_DIR, FILE)
    builder = GridBuilder(fname, 500)
My plan was to take the grid with the coordinates and update each location with a dataclass:
from dataclasses import dataclass


@dataclass
class LocationData:
    """..."""
    coord: list
    occupied: bool
This identifies the grid location and whether it is found within the CSV file map.
I understand this is going to be a time-consuming process, but I figured this would be my first attempt.
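One possible way to keep the `LocationData` idea without storing an object per grid cell is to keep only a boolean array (one byte per cell) and build the dataclass on demand when a point is looked up. The sketch below assumes that approach; `location_at` and the toy grid values are hypothetical and not part of the original script.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LocationData:
    coord: list
    occupied: bool


def location_at(grid, min_x, min_y, x, y):
    """Build a LocationData on demand for the 1 m cell nearest to (x, y)."""
    cx, cy = int(np.rint(x)), int(np.rint(y))
    # Shift by the grid origin so map coordinates become array indices.
    row, col = cy - min_y, cx - min_x
    inside = 0 <= row < grid.shape[0] and 0 <= col < grid.shape[1]
    return LocationData(coord=[cx, cy], occupied=bool(inside and grid[row, col]))


# Toy 3 x 4 grid whose lower-left cell is (x=-2, y=-1); only cell (0, 0) is occupied.
grid = np.zeros((3, 4), dtype=bool)
grid[1, 2] = True
hit = location_at(grid, -2, -1, 0.2, -0.4)
miss = location_at(grid, -2, -1, 5.0, 5.0)
```

Because the coordinate of a cell can be recovered from its index plus the grid minimum, the `coord` field does not need to be stored for every cell up front.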
I know Stack Overflow generally dislikes attachments, but I figured a sample dataset of what I'm working with might be useful, so I've uploaded a file for sharing. test_file
UPDATE: The original code used itertools to generate a grid entry for each location. I ended up switching away from itertools and used NumPy's meshgrid() instead. meshgrid() initially caused the same issue, but it has a copy parameter that can be set to False to preserve memory resources. This fixed the memory issue.
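As a small illustration of why `copy=False` helps, the sketch below compares the two calls on made-up input ranges. With `copy=False`, `meshgrid()` returns broadcast views of the 1-D inputs instead of freshly allocated 2-D arrays; the NumPy documentation notes that such views may map several elements to the same memory location, so they should be copied before being written to.

```python
import numpy as np

x = np.arange(-3, 4)  # 7 columns (hypothetical extent)
y = np.arange(-2, 3)  # 5 rows (hypothetical extent)

# Views: no new 2-D buffers are allocated.
xx_view, yy_view = np.meshgrid(x, y, copy=False)
# Default behaviour: two full 2-D copies.
xx_copy, yy_copy = np.meshgrid(x, y)

# A stride of 0 means the same 1-D data is reused along that axis.
print(xx_view.shape, xx_view.strides, yy_view.strides)
print(np.shares_memory(xx_view, x), np.shares_memory(xx_copy, x))
```

The views hold the same values as the copies, so read-only code that consumes `xx` and `yy` does not need to change.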