2

Assume that I am using a client computer that will not allow for pip or pandas and must use Python 2.7. Binaries and converting into an exe is not allowed. I am reading in a CSV with column headers as names with numpy.

In my dataset, I am attempting to generate a list of orders that apply to unique combinations of Facility, Destination, Program #, and other factors, where p is the read csv dataset.

CSV starting out

What it would look like if I did this in Excel. The values in the Order field are what I'd like to have as a list in a variable called my_orders.

My current code looks like:

progs = np.unique(p['Program'])
facil = np.unique(p['Facility'])
dest = np.unique(p['Destination']) 
reqs = np.unique(p['Requested'])
prods = np.unique(p['Produced'])
tier1 = np.unique(p['Tier1'])
tier2 = np.unique(p['Tier2'])

Which is followed by the following method originally written with pandas and Python3 in mind until discovering that only 2.7 and numpy were available:

for a in range(len(progs)):
    print("on Program ",a)
    ProgChild = {"name":progs[a], 'children':[]}
    for r in range(len(reqs)):
        reqChild = {"name":reqs[r], 'children':[]}
        for s in range(len(prods)):
            prodChild = {'name':prods[s], "children":[]}
            for g in range(len(progs)):
                programChild = {'name':progs[g], "children":[]}
                for i in range(len(facil)):
                    FacilChild={"name":facil[i], "children":[]}
                    for c in range(len(tier1)):
                        Tier1Child={"name":tier1[c], "children":[]}
                        for d in range(len(tier2)):
                            # here's where I'm in trouble:
                            Order_Cond = np.array[[progs[a]& reqs[r]&
                                prods[s]&progs[g]& facil[i]& tier1[c]]
                            my_orders = np.where(p['Orders'], Order_cond)
                            print my_orders
                            # do other things

As you can see the original intent was to use the For loops to set up a routine that only returned a list of Orders that came from unique combinations of facility, destination, program, etc. The Order_cond variable obviously has the wrong syntax.

If this were in SQL I'd just say "Select Orders From My_Data Where progs=a & reqs=r;" and so on.

I also considered a list comprehension but it doesn't work yet either:

list(x for p['Orders'] in p if p['Orders'] in Order_cond)

Again, the goal is for a list of Orders to be created and stored in my_orders, which I then use for other functions.

martineau
  • 119,623
  • 25
  • 170
  • 301
DCUFan7
  • 385
  • 1
  • 3
  • 10
  • Could you give an actual CSV example instead of just a png? – Krupip Mar 30 '17 at 16:04
  • For this kind of access it might be simpler to load this data in an SQL It shouldn't be hard to iterate through the rows (elements) of the array and write them to the database. In http://stackoverflow.com/a/35184195/901925 I illustrate loading a structured array from a sql. The other direction should be as easy. Or here: http://stackoverflow.com/questions/32770340/block-insert-from-array-to-sqlite-in-python – hpaulj Mar 30 '17 at 16:50
  • As an aside, that isn't a list comprehension, that is a *generator expression* that you materialize into a list. You *should* just use a list-comprehension, though. – juanpa.arrivillaga Mar 30 '17 at 17:18

2 Answers2

0

I haven't followed your whole nested if code, but here's a way of identifying records in an structured array that meet certain criteria

Make a sample array with 3 fields:

In [447]: dt = np.dtype('S10,S10,int')
In [448]: data = np.ones((10,),dt)
In [451]: data['f0']='one two three four five six seven eight nine ten'.split()
In [452]: data['f1']='a b c a b c a b c a'.split()
In [453]: data['f2']=np.arange(10)
In [454]: data
Out[454]: 
array([(b'one', b'a', 0), (b'two', b'b', 1), (b'three', b'c', 2),
       (b'four', b'a', 3), (b'five', b'b', 4), (b'six', b'c', 5),
       (b'seven', b'a', 6), (b'eight', b'b', 7), (b'nine', b'c', 8),
       (b'ten', b'a', 9)], 
      dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', '<i4')])

Test one field:

In [461]: cond = data['f0']==b'three'
In [462]: cond
Out[462]: array([False, False,  True, False, False, False, False, False, False, False], dtype=bool)

Another field with several matches:

In [463]: cond = data['f1']==b'c'
In [464]: cond
Out[464]: array([False, False,  True, False, False,  True, False, False,  True, False], dtype=bool)

Combining several field tests:

In [465]: cond = (data['f1']==b'c') & (data['f0']==b'three') & (data['f2']==2)
In [466]: cond
Out[466]: array([False, False,  True, False, False, False, False, False, False, False], dtype=bool)
In [467]: data[cond]
Out[467]: 
array([(b'three', b'c', 2)], 
      dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', '<i4')])

Or going the sql route:

In [468]: import sqlite3
In [469]: conn=sqlite3.connect(':memory:')
In [470]: cur = conn.cursor()
In [471]: cur.execute('''CREATE TABLE array (f0 str, f1 str, f2 int)''')
Out[471]: <sqlite3.Cursor at 0xaa0a38a0>
In [472]: cur.executemany("INSERT INTO array VALUES (?,?,?)",data)

An element of data looks just like a tuple; so the insert is trivial.

Out[472]: <sqlite3.Cursor at 0xaa0a38a0>
In [473]: cur.execute('select * from array')
Out[473]: <sqlite3.Cursor at 0xaa0a38a0>
In [474]: cur.fetchall()
Out[474]: 
[(b'one', b'a', b'\x00\x00\x00\x00'),
 (b'two', b'b', b'\x01\x00\x00\x00'),
 (b'three', b'c', b'\x02\x00\x00\x00'),
 ...
 (b'ten', b'a', b'\t\x00\x00\x00')]
In [486]: cur.execute('select * from array where f0==(?)',(b"four",))
Out[486]: <sqlite3.Cursor at 0xaa0a38a0>
In [487]: cur.fetchall()
    Out[487]: [(b'four', b'a', b'\x03\x00\x00\x00')]
In [488]: cur.execute('select * from array where f0==(?) or f1==(?)',(b"four",b"c"))
Out[488]: <sqlite3.Cursor at 0xaa0a38a0>
In [489]: cur.fetchall()
Out[489]: 
[(b'three', b'c', b'\x02\x00\x00\x00'),
 (b'four', b'a', b'\x03\x00\x00\x00'),
 (b'six', b'c', b'\x05\x00\x00\x00'),
 (b'nine', b'c', b'\x08\x00\x00\x00')]

This example needs to be cleaner, but it gives the rough idea of what's possible.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
0

So you want to select from a csv file, rows which correspond with certain values for certain columns? In python 2.7 no less? Ok, this is not that difficult for 2.7

You'll find this easiest with csv.DictReader, which reads in your csv files as rows of dictionaries, where the keys correspond to your column labels for each row. Then, given a dictionary of key and value pairs you want to select for, the below code returns an array of dictionaries that fit what you want. If you would like to strip out the key value pairing and just have arrays of values this is trivially implemented. Since you can convert your csv file into a list of dictionaries, it becomes a matter of selecting a dictionary from an array with certain values for keys

import csv

def dict_has_key_values(dict, dict_key_values):
    for key in dict_key_values:
        if not(key in dict and dict[key] == dict_key_values[key]):
           return False
    return True

def select_from_dict(csv_dict_reader, dict_key_values):
    list_of_dict = []
    for row_dict in csv_dict_reader:
        if dict_has_key_values(dict, dict_key_values):
             list_of_dict.append(dict)
    return list_of_dict


with open('yourfile.csv') as csvfile:
    selected_key_values = {...}
    reader = csv.DictReader(csvfile)
    selected_rows = select_from_dict(reader, selected_key_values)

selected_rows should now have every row that

A: at least has every key in selected_key_values, and

B: the values associated with those keys are the same.

Community
  • 1
  • 1
Krupip
  • 4,404
  • 2
  • 32
  • 54