How to apply a set of processing steps to a list of arrays in a loop

Assume we have only one array. We perform the following iterative steps so that we can run diagnostics on each step if needed. What each step does is irrelevant, but its output is saved in a new array as an intermediate result (step1, step2, step3, etc.).

from numpy.random import rand  # assuming NumPy's rand, as the indexing below implies

a = rand(10000, 4)

# This is the process that I want to repeat for all the arrays:
a_step1 = a[a[:, 0] < 0.1]               # first step
a_step2 = a_step1[a_step1[:, 1] > 0.1]   # second step
a_step3 = a_step2[a_step2[:, 2] < 0.5]   # third step
# ... etc.

This way I end up with a, a_step1, a_step2 and a_step3 as arrays that I can run checks on.

I would like to do this over a series of arrays (a, b, c, d, ...).

I have tried this:

a=(rand(10000,4))
b=(rand(10000,4))
c=(rand(10000,4))
d=(rand(10000,4))
for i in (a,b,c,d):
    (i)_step1=(i)[(i)[:,0]<0.1]
    (i)_step2=(i)_step1[(i)_step1[:,1]>0.1]
    (i)_step3=(i)_step2[(i)_step2[:,2]<0.5]
    .....

and got:

File "<ipython-input-732-7917e62bc9c0>", line 6
    (i)_step1=(i)[(i)[:,0]<0.1]
            ^
SyntaxError: invalid syntax

So, given the list of all arrays (a, b, c, d, ...), I would like to end up with the following arrays to run checks on:

a_step1, b_step1, c_step1, d_step1
a_step2, b_step2, c_step2, d_step2
a_step3, b_step3, c_step3, d_step3
.......

It seems so simple, but I can't wrap my head around it.

Red Sparrow
  • Can you please clarify whether the question is related to NumPy or plain Python? Thank you – Pynchia Jan 29 '16 at 16:05
  • It seems that `d` is simply not defined in your code. Could you show the declaration of your arrays? – Kirell Jan 29 '16 at 16:06
  • If `for i in (a,b,c,d...):` is the first time that the script sees the variable d, Python's not going to know what it is. If you have a line like `a, b, c, d = (rand(10000, 4) for i in range(4))`, you'll actually have arrays to work with. – Elliot Jan 29 '16 at 16:10
  • I guess the question is for plain Python. @Kikohs, CeramicSheep you are right. But even if I define the arrays beforehand, my output is not what I would expect – Red Sparrow Jan 29 '16 at 16:16
  • You are overwriting your value of i (which should represent a, b, c, d in the for loop) by writing i = (rand(10000, 4)). It would help us if you explained what the expected output was. It might be worth taking a step back from the solution you're attempting and giving us the problem you're trying to solve for context. – A Small Shell Script Jan 29 '16 at 16:33
  • @ASmallShellScript. I have tried to clarify a bit what I am trying to do. Hope it's coming through now – Red Sparrow Jan 29 '16 at 16:42
  • @AlexD. what do you expect the syntax a[a[:,0]<0.1] to do? It sounds like you should look into generators, which are functions that can return results and continue from that line later. You could look into filter(), zip() and list comprehensions which could help you avoid duplicated logic, too. If you give more clarification on the requirement I'll try to help with an example :) – A Small Shell Script Jan 29 '16 at 18:51
  • @ASmallShellScript thanks for your persistence :-) I expect `a[a[:,0]<0.1]` to make a new array keeping only the rows of a that have a value below 0.1 in the first column. Basically my aim is to do the same steps for a series of arrays without having to repeat the code. But in the end I also want to still have all the steps for all the arrays saved so I can check them. I looked a bit into generators and your example below. I have still not fully digested it, but I am working on it.... – Red Sparrow Feb 01 '16 at 12:37
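
To illustrate the boolean-mask indexing described in the last comment, here is a minimal sketch on a small hand-written array. It assumes `a` is a NumPy array; the values are made up for illustration and the name `mask` is not from the original post.

```python
import numpy as np

# Small hand-written array so the result is easy to verify by eye.
a = np.array([
    [0.05, 0.9, 0.3],
    [0.50, 0.2, 0.1],
    [0.02, 0.4, 0.8],
])

mask = a[:, 0] < 0.1   # one boolean per row, True where column 0 is below 0.1
a_step1 = a[mask]      # keeps only the rows where the mask is True, all columns
```

Writing `a[a[:, 0] < 0.1]` is the same operation with the mask inlined.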

1 Answer

EDITED

The processing you are trying to do is as follows. I use a list comprehension that tests the first element of each row against a value and keeps the whole row if the test is true. You could also do this with filter() or a custom function if you do not like this syntax, or if the check grows too large for a one-liner.

val = 0.1
first_element_lessthan_pt1 = [row for row in a if row[0]<val]
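
As a rough sketch of the filter() and custom-function alternatives mentioned above: first_below is a hypothetical helper name, and the sample rows are made up for illustration.

```python
def first_below(row, val=0.1):
    # Hypothetical helper: True when the row's first element is below val.
    return row[0] < val


# Made-up sample data: a list of rows.
a = [[0.05, 0.9], [0.5, 0.2], [0.02, 0.4]]

# Same test written two ways; both keep the whole row when the test passes.
with_comprehension = [row for row in a if first_below(row)]
with_filter = list(filter(first_below, a))
```

Moving the test into a function keeps the comprehension readable if the condition later grows beyond a single comparison.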

I have also incorporated all four "steps" into a single namedtuple, rather than having one variable per value. If you still want them named step1/step2/step3/step4, you could instead declare StepResult as below and edit the print statements accordingly.

StepResult = namedtuple("StepResult", "step1, step2, step3, step4")

You could use another namedtuple to give dot-notation access to each test on each object if you wanted; just follow the same concept as I did with the steps.

from random import random
from collections import namedtuple


# an object type to help organize our steps so we can follow code easier
# and refer to each step/test in dot-notation which can be helpful!
StepResult = namedtuple("StepResult", "lt1_list, gt1_list, lt5_list, gt5_list")

def rand(row, col):
    # creates a jagged array/list of floats in the shape of (row, col)
    # this is a mock of a numpy function which OP was using
    return [[random() for _c in range(col)] for _r in range(row)]


def run_steps(tup):
    # tup is an iterable representing a jagged matrix/array. ex: list of lists
    # yields four new lists, in this order:
    # nested lists where the first value is less than 0.1
    # nested lists where the first value is greater than 0.1
    # nested lists where the first value is less than 0.5
    # nested lists where the first value is greater than 0.5

    for val in (0.1, 0.5):
        yield [row for row in tup if row[0]<val]
        yield [row for row in tup if row[0]>val]


def steps(tup):
    # tup is an iterable representing a jagged matrix/array
    # returns a StepResult object
    # .. note, the * unpacks the results of run_steps as individual arguments
    # when calling the StepResult constructor.
    return StepResult(*run_steps(tup=tup))


if __name__ == '__main__':
    """ A program which creates 4 jagged arrays with random float values, then
        iterates through the lists to provide a sort of report on which elements
        in each list are greater-than or less-than certain values.
    """
    a=(rand(4, 3))
    b=(rand(4, 3))
    c=(rand(4, 3))
    d=(rand(4, 3))

    tuples = (a, b, c, d)
    for tup in tuples:
        print("----original----")
        print(tup)
        tup_steps = steps(tup)
        print("----lt1----")
        print(tup_steps.lt1_list)
        print("----gt1----")
        print(tup_steps.gt1_list)
        print("----lt5----")
        print(tup_steps.lt5_list)
        print("----gt5----")
        print(tup_steps.gt5_list)
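
The earlier suggestion of a second namedtuple for dot-notation access per array could look roughly like this. It is only a sketch under assumptions: Results is a hypothetical name, the sample rows are made up, and here each step filters the output of the previous one (as in the question) instead of testing the original list each time.

```python
from collections import namedtuple

StepResult = namedtuple("StepResult", "step1, step2")
# Hypothetical container: one field per input array.
Results = namedtuple("Results", "a, b")


def steps(rows):
    # Each step filters the output of the previous step.
    step1 = [row for row in rows if row[0] < 0.1]
    step2 = [row for row in step1 if row[1] > 0.1]
    return StepResult(step1, step2)


# Made-up sample data.
a = [[0.05, 0.9], [0.5, 0.2]]
b = [[0.01, 0.05], [0.03, 0.7]]

results = Results(a=steps(a), b=steps(b))
# Intermediate results are then reachable as results.a.step1, results.b.step2, ...
```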
  • I tried running the code but I think it's not quite the question I had. The idea is that a new name with the string "_step1" appended to the original array name (a, b, ...) should be created once the first step is finished. So I think I need to extract the array name as a string and then generate a new name based on that string plus a new string that describes each step. If this is done sequentially it doesn't need to include generators, as far as I can understand. So maybe the real question is how to extract the name of the array as a string. Is this clearer? – Red Sparrow Feb 01 '16 at 14:01
  • @AlexD., you can get the string representation of a variable with locals() or globals() (http://stackoverflow.com/questions/2553354/how-to-get-a-variable-name-as-a-string-in-python), and from there could eval() new variables which are based on the ones you're iterating. This is really hacky and not advisable. Use a custom object, dict, list or named tuple as above to represent this association, rather than trying to build that logic into a variable naming convention. ex. a.step1 could be a property on a custom object or named tuple, rather than a_step1 being an unassociated local var. – A Small Shell Script Feb 01 '16 at 16:01
  • more complicated than i thought indeed but this way it's quite advanced – Red Sparrow Sep 01 '16 at 12:31
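
The advice in the comments to use a dict instead of generated variable names could be sketched as follows, assuming the arrays are NumPy arrays as in the question. The names run_steps, arrays and results, and the seeded generator, are illustrative choices and not part of the original posts.

```python
import numpy as np


def run_steps(arr):
    # Each step filters the output of the previous one, as in the question.
    step1 = arr[arr[:, 0] < 0.1]
    step2 = step1[step1[:, 1] > 0.1]
    step3 = step2[step2[:, 2] < 0.5]
    return {"step1": step1, "step2": step2, "step3": step3}


rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
arrays = {name: rng.random((10000, 4)) for name in "abcd"}

# One dict of intermediate results per input array, keyed by the array's name.
results = {name: run_steps(arr) for name, arr in arrays.items()}

# What the question calls a_step1 is now:
a_step1 = results["a"]["step1"]
```

Every intermediate stays available for checks, for example `results["c"]["step2"]`, without having to build variable names from strings.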