Numpy Arrays: Searching for subarrays

Question

I have a bunch of numpy arrays, each holding a list of websites, stored in a larger array. Given input from the user, I want to return the arrays whose first element matches that input. The user would then input another website, and it would be matched against the second element of the arrays that matched the first time. So for example:

bigarray = [['website1','website2', 'website3', 'website4'],
['website1', 'website7', 'website9', 'website3'],
['website1','website2','website5', 'website9','website24','website36']]

If someone were to input 'website1', it would return

{'website2':2, 'website7':1}

If they then input 'website2', it would output

{'website3':1,"website5":1}

and so on. I hope that was clear; if not, please comment and I'll clarify. I don't know how to make this efficient and quick. I've been brainstorming, but I can only come up with inefficient methods. Please help.

This is what I have so far, but it doesn't build a dictionary with frequencies. I can't figure out how to get frequencies into the dictionary, nor how to extend the search to the second, third, fourth, etc. elements. This only works for the first element.

import numpy as np
import cherrypy as cp

def initialize(self):
    pagearray = np.load("pagearray.npy")

def submit(self, input):
    for i in pagearray:
        if input==i[0]:
            subpagearray += [i[1:]]
            possibilities +=i[0]
    return possibilities

Thanks, F

Oh, didn't know you had to post your attempts, I'll update it soon. — furby559, Jul 28 '15 at 17:55

Obj3ctiv3_C_88 · Answer 1 · 2015-07-28T19:12:23.240

def build_dict(a_list=None):
    if a_list is None:
        a_list = []
    site_dict = {}
    for site in a_list:
        try: 
            site_dict[site] = site_dict[site] + 1
        except KeyError:
            site_dict[site] = 1
    return site_dict

This is how you build a frequency dictionary. I'm not sure what you're going for, so you can use this as a template.
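For comparison, the same counting can be sketched with `collections.Counter` from the standard library. The name `build_dict_counter` is hypothetical and only used here for illustration; it is meant to return the same result as `build_dict` for the same input.

```python
from collections import Counter

def build_dict_counter(a_list=None):
    """Count how often each site appears in a_list (hypothetical helper)."""
    # Counter does the try/except KeyError bookkeeping internally.
    return dict(Counter(a_list or []))

counts = build_dict_counter(['website2', 'website7', 'website2'])
print(counts)
```

With the three second elements from the question's example as input, this gives `website2` a count of 2 and `website7` a count of 1.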

I figured out what you're going for, I think. Let me know if this is it:

def advanced_dict(a_list=None):
    if a_list is None:
        a_list = []

    index_holder = 0  # Holds the primary dict value
    site_dict = {}  # contains a dict of dicts
    for sub_arr in a_list:  # iterate over the argument, not an undefined global
        for site in sub_arr:
            try:
                site_dict['website'+str(index_holder)]
            except KeyError:
                site_dict['website'+str(index_holder)] = {} # if no dict create dict
            try:
                site_dict['website'+str(index_holder)][site] += 1
            except KeyError:
                site_dict['website'+str(index_holder)][site] = 1
            index_holder += 1
        index_holder = 0
    return site_dict

The try part doesn't work... there's a syntax error in the try statement. — furby559, Jul 28 '15 at 18:18
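To see what counting by position gives on the question's data, here is a minimal sketch. It assumes plain integer position keys rather than the `'website'+str(index_holder)` strings, and the name `position_counts` is hypothetical.

```python
from collections import Counter, defaultdict

def position_counts(paths):
    """Count sites at each position index across all paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for index, site in enumerate(path):
            counts[index][site] += 1
    # Convert to plain dicts so the result prints like the question's output.
    return {index: dict(c) for index, c in counts.items()}

bigarray = [
    ['website1', 'website2', 'website3', 'website4'],
    ['website1', 'website7', 'website9', 'website3'],
    ['website1', 'website2', 'website5', 'website9', 'website24', 'website36'],
]

by_position = position_counts(bigarray)
print(by_position[1])  # counts for the second element of every path
print(by_position[2])  # counts for the third element of every path
```

Position 1 matches the question's first expected output, but position 2 also counts `website9` from the path that went through `website7`. Counting by position alone does not filter on the sites entered earlier, so it probably needs a narrowing step on top to reproduce the second expected output.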

irios · Answer 2 · 2015-07-28T18:31:14.043

You could use a data structure that better fits your problem. Here you can find some options in Python. Try to avoid premature optimization, and keep your code as simple as you can.

edited Jul 28 '15 at 18:31

answered Jul 28 '15 at 18:18


I'm working with a huge dataset so the premature optimization is a must to chug through all of it :/ – furby559 Jul 28 '15 at 18:21
Not sure if that would be the best option. I was thinking of it, but it doesn't seem Python has a tree datatype built in. – furby559 Jul 28 '15 at 19:15
No, it doesn't. You can find some implementations though, like discussed in this post: http://stackoverflow.com/questions/2482602/a-general-tree-implementation-in-python – irios Jul 28 '15 at 19:45
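One possible shape for the tree idea discussed in these comments is a prefix tree built from nested dictionaries, where each node records how many paths pass through it. This is only a sketch under that assumption; `PathNode`, `build_tree`, and `next_sites` are hypothetical names, not part of any library.

```python
class PathNode:
    """One node of a prefix tree; count is the number of paths through it."""
    def __init__(self):
        self.count = 0
        self.children = {}

def build_tree(paths):
    """Build the prefix tree once from all paths."""
    root = PathNode()
    for path in paths:
        node = root
        node.count += 1
        for site in path:
            node = node.children.setdefault(site, PathNode())
            node.count += 1
    return root

def next_sites(root, prefix):
    """Return {site: count} for the sites that directly follow prefix."""
    node = root
    for site in prefix:
        node = node.children.get(site)
        if node is None:
            return {}  # nobody followed this sequence of sites
    return {site: child.count for site, child in node.children.items()}

bigarray = [
    ['website1', 'website2', 'website3', 'website4'],
    ['website1', 'website7', 'website9', 'website3'],
    ['website1', 'website2', 'website5', 'website9', 'website24', 'website36'],
]

tree = build_tree(bigarray)
first = next_sites(tree, ['website1'])
second = next_sites(tree, ['website1', 'website2'])
print(first)
print(second)
```

The tree is built in one pass over the data. After that, each lookup walks only as many nodes as the user has entered sites, rather than rescanning every array, which may matter for a large dataset.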

score 0 · Accepted Answer · answered Jul 28 '15 at 19:11

Figured it out... this is what I was going for:

import numpy as np
import simplejson as json
import cherrypy as cp
import operator

pagearray = None  # module-level state; set by initialize()

def initialize(self):
    global pagearray
    pagearray = np.load("pagearray.npy")
    #return os.path

def getUserPath(self, input):
    subpagearray = []
    possibilities = []
    global pagearray
    for i in pagearray:
        try:
            if input==i[0]:
                subpagearray += [i[1:]]
                possibilities+= [i[1]]
        except IndexError:
            pass
    x = build_dict(possibilities)
    sorted_x = sorted(x.items(), key=operator.itemgetter(1), reverse=True)
    pagearray = subpagearray
    totalelements = len(pagearray)
    returnvaluelist = []
    weight = []
    for i in sorted_x:
        returnvaluelist += [i[0]]
        weight += [(i[1]/(totalelements*1.0))*100]
    return returnvaluelist, weight

def build_dict(a_list=None):
    if a_list is None:
        a_list = []
    site_dict = {}
    for site in a_list:
        try: 
            site_dict[site] = site_dict[site] + 1
        except KeyError:
            site_dict[site] = 1
    return site_dict
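As a rough, self-contained illustration of the same narrowing idea without the global `pagearray` or the file load, the sketch below runs on the question's example data. The function names `narrow_paths` and `rank_next` are hypothetical, and `collections.Counter` stands in for `build_dict`.

```python
from collections import Counter

def narrow_paths(paths, site):
    """Keep the paths that start with site and drop that first element."""
    return [list(p[1:]) for p in paths if len(p) > 0 and p[0] == site]

def rank_next(paths):
    """Return (sites, weights): next sites by frequency and percent share."""
    counts = Counter(p[0] for p in paths if len(p) > 0)
    total = len(paths)  # includes paths that ended, as in the answer above
    ranked = counts.most_common()
    sites = [s for s, _ in ranked]
    weights = [100.0 * n / total for _, n in ranked]
    return sites, weights

bigarray = [
    ['website1', 'website2', 'website3', 'website4'],
    ['website1', 'website7', 'website9', 'website3'],
    ['website1', 'website2', 'website5', 'website9', 'website24', 'website36'],
]

remaining = narrow_paths(bigarray, 'website1')
sites1, weights1 = rank_next(remaining)
remaining = narrow_paths(remaining, 'website2')
sites2, weights2 = rank_next(remaining)
print(sites1, weights1)
print(sites2, weights2)
```

Each call to `narrow_paths` plays the role of `pagearray = subpagearray` in the answer, but the caller holds the narrowed list, so several users could be served without sharing state.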
