iterating, calculating, and combining two lists with multiple tuples

Question

I am working with a projected coordinate dataset that contains x,y,z data (432 line csv with X Y Z headers, not attached). I wish to import this dataset, calculate a new grid based on user input and then start performing some statistics on points that fall within the new grid. I've gotten to the point that I have two lists (raw_lst with 431(x,y,z) and grid_lst with 16(x,y) (calling n,e)) but when I try to iterate through to start calculating average and density for the new grid it all falls apart. I am trying to output a final list that contains the grid_lst x and y values along with the calculated average z and density values.

I searched numpy and scipy libraries thinking that they may have already had something to do what I am wanting but was unable to find anything. Let me know if any of you all have any thoughts.

sample_xyz_reddot_is_newgrid_pictoral_representation

import pandas as pd
import math

df=pd.read_csv("Sample_xyz.csv")

N=df["X"]
E=df["Y"]
Z=df["Z"]

#grid = int(input("Specify grid value "))
grid = float(0.5) #for quick testing the grid value is set to 0.5 

#max and total calculate the input area extents
max_N = math.ceil(max(N))
max_E = math.ceil(max(E))
min_E = math.floor(min(E))
min_N = math.floor(min(N))

total_N = max_N - min_N
total_E = max_E - min_E
total_N = int(total_N/grid)
total_E = int(total_E/grid)

#N_lst and E_lst calculate the mid points based on the input file extents and the specified grid file
N_lst = [] 
n=float(max_N)-(0.5*grid)
for x in range(total_N):
    N_lst.append(n)
    n=n-grid

E_lst = []
e=float(max_E)-(0.5*grid)
for x in range(total_E):
    E_lst.append(e)
    e=e-grid

grid_lst = []
for n in N_lst:
    for e in E_lst:
        grid_lst.append((n,e))

#converts the imported dataframe to list
raw_lst = df.to_records(index=False)
raw_lst = list(raw_lst)

#print(grid_lst) # grid_lst is a list of 16 (n,e) tuples for the new grid coordinates.
#print(raw_lst) # raw_lst is a list of 441 (n,e,z) tuples from the imported file - calling these x,y,z.

#The calculation where it all falls apart.
t=[]
average_lst = []
for n, e in grid_lst:
    for x, y, z in raw_lst:
        if n >= x-(grid/2) and n <= x+(grid/2) and e >= y-(grid/2) and e <= y+(grid/2):
            t.append(z)
            average = sum(t)/len(t)
            density = len(t)/grid
            average_lst = (n,e,average,density)
            print(average_lst)
            # print("The length of this list is " + str(len(average_lst)))
            # print("The length of t is " + str(len(t)))

SAMPLE CODE FOR RUNNING

import random

grid=5
raw_lst = [(random.randrange(0,10), random.randrange(0,10), random.randrange(0,2))for i in range(100)]
grid_lst = [(2.5,2.5),(2.5,7.5),(7.5,2.5),(7.5,7.5)]

t=[]
average_lst = []
for n, e in grid_lst:
    for x, y, z in raw_lst:
        if n >= x-(grid/2) and n <= x+(grid/2) and e >= y-(grid/2) and e <= y+(grid/2):
            t.append(z)
            average = sum(t)/len(t)
            density = len(t)/grid
            average_lst = (n,e,average,density)
            print(average_lst)

instead of showing us all this code, please try to boil down your code to a small reproducible example with sample data — gold_cy, Jun 17 '20 at 20:13
so if i understood correctly, you want a numpy function to do the work you showed us? — Dorian, Jun 17 '20 at 21:03
Numpy, Scipy, long-hand... I am looking for some help to get over the hump. Let me know if I should reword the question, I thoroughly confuse myself sometimes too. — DMo, Jun 17 '20 at 21:17

score 0 · Answer 1 · answered Jun 17 '20 at 23:15

Some advices

when working with arrays, use numpy. It has more functionalities
when working with grids it's often more handy the use x-coords, y-coords as single arrays

Comments to the solution

obviousley you have a grid, or rather a box, grd_lst. We generate it as a numpy meshgrid (gx,gy)
you have a number of points raw_list. We generate each elemnt of it as 1-dimensional numpy arrays
you want to select the r_points that are in the g_box. We use the percentage formula for that: tx = (rx-gxMin)/(gxMax-gxMin)
if tx, ty are within [0..1] we store the index
as an intermediate result we get all indices of raw_list that are within the g_box
with that index you can extract the elements of raw_list that are within the g_box and can do some statistics
note that I have omitted the z-coord. You will have to improve this solution.

--

import numpy as np
from matplotlib import pyplot as plt
import matplotlib.colors as mclr
from matplotlib import cm

f10 = 'C://gcg//picStack_10.jpg'      # output file name
f20 = 'C://gcg//picStack_20.jpg'      # output file name

def plot_grid(gx,gy,rx,ry,Rx,Ry,fOut):
    fig = plt.figure(figsize=(5,5)) 
    ax = fig.add_subplot(111)
    myCmap = mclr.ListedColormap(['blue','lightgreen'])
    ax.pcolormesh(gx, gy, gx, edgecolors='b', cmap=myCmap, lw=1, alpha=0.3)
    ax.scatter(rx,ry,s=150,c='r', alpha=0.7)
    ax.scatter(Rx,Ry,marker='s', s=150,c='gold', alpha=0.5)
    ax.set_aspect('equal')
    plt.savefig(fOut)
    plt.show()


def get_g_grid(nx,ny):
    ix = 2.5 + 5*np.linspace(0,1,nx)
    iy = 2.5 + 5*np.linspace(0,1,ny)
    gx, gy = np.meshgrid(ix, iy,  indexing='ij')
    return gx,gy

def get_raw_points(N):
    rx,ry,rz,rv = np.random.randint(0,10,N), np.random.randint(0,10,N), np.random.randint(0,2,N), np.random.uniform(low=0.0, high=1.0, size=N)
    return rx,ry,rz,rv


N = 100   
nx, ny = 2, 2
gx,gy = get_base_grid(nx,ny)
rx,ry,rz,rv = get_raw_points(N)
plot_grid(gx,gy,rx,ry,0,0,f10)

def get_the_points_inside(gx,gy,rx,ry):
    #----- run throuh the g-grid -------------------------------
    nx,ny = gx.shape
    N = len(rx)
    index = []
    for jx in range(0,nx-1):
        for jy in range(0,ny-1):
             #--- run through the r_points
            for jr in range(N):
                test_x = (rx[jr]-gx[jx,jy]) / (gx[jx+1,jy] - gx[jx,jy])  
                test_y = (ry[jr]-gy[jx,jy]) / (gy[jx,jy+1] - gy[jx,jy])  
                if (0.0 <= test_x <= 1.0) and (0.0 <= test_y <= 1.0):
                    index.append(jr)
    return index


index = get_the_points_inside(gx,gy,rx,ry)
Rx, Ry, Rz, Rv = rx[index],  ry[index],  rz[index],  rv[index]

plot_grid(gx,gy,rx,ry,Rx,Ry,f20)

I appreciate the detail and advice. After researching np.meshgrid it, along with indexing, may do what I desire. I will still need to iterate the g-grid in addition (it is not a single box but rather the center points of the four quadrants, so there are four boxes in this example). — DMo, Jun 18 '20 at 16:35
set `nx, ny = 3, 3` for 4 boxes. The code does already iterate over the g-grid. You have to write a function for the statistical evaluation which is called in each box of g-grid. — pyano, Jun 18 '20 at 19:27

iterating, calculating, and combining two lists with multiple tuples

1 Answers1