
Basically, I have a bunch of numpy arrays, each holding a list of websites, inside a larger array. Given a website entered by the user, I want to find the arrays whose first element is that website and return how often each website appears right after it. The user would then enter another website, which would be matched against the second element of the arrays that matched the first time, and so on. For example:

bigarray = [['website1','website2', 'website3', 'website4'],
['website1', 'website7', 'website9', 'website3'],
['website1','website2','website5', 'website9','website24','website36']]

If someone entered 'website1', it would return

{'website2': 2, 'website7': 1}

Then, if they entered 'website2', it would output

{'website3': 1, 'website5': 1}

And so on. I hope that's clear; if not, please comment and I'll clarify. I don't know how to make this fast and efficient. I've been brainstorming, but I can only come up with inefficient methods. Please help.
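To pin down the expected output, here is a brute-force reference sketch. The name `next_site_counts` is hypothetical (not from the question), and it rescans every path on each query, so it is only useful for checking a faster version against the examples above, not as the efficient solution.

```python
from collections import Counter

def next_site_counts(paths, prefix):
    """Count the site that follows `prefix` in every path that starts with it.

    Brute force: every call rescans all paths.
    """
    n = len(prefix)
    counts = Counter(
        path[n]
        for path in paths
        if len(path) > n and list(path[:n]) == list(prefix)
    )
    return dict(counts)

bigarray = [['website1', 'website2', 'website3', 'website4'],
            ['website1', 'website7', 'website9', 'website3'],
            ['website1', 'website2', 'website5', 'website9', 'website24', 'website36']]

print(next_site_counts(bigarray, ['website1']))              # {'website2': 2, 'website7': 1}
print(next_site_counts(bigarray, ['website1', 'website2']))  # {'website3': 1, 'website5': 1}
```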

This is what I have so far, but it doesn't build a dictionary of frequencies. I can't figure out how to get the frequencies into the dictionary, nor how to search the second, third, fourth, etc. elements. This only works for the first element.

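import numpy as np
import cherrypy as cp

pagearray = []  # loaded once by initialize()

def initialize(self):
    global pagearray  # without this, the loaded array is only a local variable
    pagearray = np.load("pagearray.npy")

def submit(self, input):
    subpagearray = []   # tails of the matching paths (not used yet)
    possibilities = []  # next site in each matching path, no frequencies yet
    for i in pagearray:
        if len(i) > 1 and input == i[0]:
            subpagearray += [i[1:]]
            possibilities += [i[1]]
    return possibilities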

Thanks, F

furby559

3 Answers

def build_dict(a_list=None):
    if a_list is None:
        a_list = []
    site_dict = {}
    for site in a_list:
        try: 
            site_dict[site] = site_dict[site] + 1
        except KeyError:
            site_dict[site] = 1
    return site_dict

This is how you build a frequency dictionary. I'm not sure exactly what you're going for, so you can use this as a template.
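For the counting step alone, the standard library's `collections.Counter` gives the same result as `build_dict` above. A minimal sketch; `count_sites` is a hypothetical name used only for illustration:

```python
from collections import Counter

def count_sites(a_list=None):
    # Map each site to the number of times it occurs, like build_dict above.
    if a_list is None:
        a_list = []
    return dict(Counter(a_list))

print(count_sites(['website2', 'website7', 'website2']))  # {'website2': 2, 'website7': 1}
```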

I figured out what you're going for, I think. Let me know if this is it:

def advanced_dict(a_list=None):
    if a_list is None:
        a_list = []

    site_dict = {}  # position key ('website0', 'website1', ...) -> {site: count}
    for sub_arr in a_list:  # each path in the input list
        for index_holder, site in enumerate(sub_arr):
            # create the inner dict for this position on first use
            position_counts = site_dict.setdefault('website' + str(index_holder), {})
            position_counts[site] = position_counts.get(site, 0) + 1
    return site_dict
Obj3ctiv3_C_88

You could use a data structure that better fits your problem. Here you can find some options in Python. Try to avoid premature optimization, and keep your code as simple as you can.

irios
  • I'm working with a huge dataset, so the premature optimization is a must to chug through all of it :/ – furby559 Jul 28 '15 at 18:21
  • Not sure if that would be the best option. I was thinking of it, but it doesn't seem that Python has a built-in tree datatype. – furby559 Jul 28 '15 at 19:15
  • No, it doesn't. You can find some implementations though, like discussed in this post: http://stackoverflow.com/questions/2482602/a-general-tree-implementation-in-python – irios Jul 28 '15 at 19:45
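The tree idea from this thread could look like a prefix tree (trie) built once from all the paths. Each node stores how many paths pass through it, so a query only walks the prefix the user has entered instead of rescanning the whole dataset. This is a minimal pure-Python sketch; `PathTrie`, `add` and `next_counts` are hypothetical names, and memory grows with the total number of distinct path steps.

```python
class PathTrie:
    """Prefix tree over website paths; each node counts the paths passing through it."""

    def __init__(self):
        self.count = 0      # number of paths that reach this node
        self.children = {}  # next site -> PathTrie

    def add(self, path):
        node = self
        for site in path:
            node = node.children.setdefault(site, PathTrie())
            node.count += 1

    def next_counts(self, prefix):
        """Return {next_site: count} for all paths that start with `prefix`."""
        node = self
        for site in prefix:
            node = node.children.get(site)
            if node is None:
                return {}  # no path starts with this prefix
        return {site: child.count for site, child in node.children.items()}


bigarray = [['website1', 'website2', 'website3', 'website4'],
            ['website1', 'website7', 'website9', 'website3'],
            ['website1', 'website2', 'website5', 'website9', 'website24', 'website36']]

trie = PathTrie()
for path in bigarray:
    trie.add(path)

print(trie.next_counts(['website1']))              # {'website2': 2, 'website7': 1}
print(trie.next_counts(['website1', 'website2']))  # {'website3': 1, 'website5': 1}
```

For the step-by-step interaction, a caller could also keep the current node between inputs instead of re-walking the prefix each time.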

Figured it out... this is what I was going for:

import numpy as np
import simplejson as json  # not used in this snippet
import cherrypy as cp      # not used in this snippet
import operator

pagearray = []  # paths still matching the user's input so far

def initialize(self):
    global pagearray
    pagearray = np.load("pagearray.npy")
    #return os.path

def getUserPath(self, input):
    global pagearray
    subpagearray = []   # tails of the matching paths, searched on the next call
    possibilities = []  # the site that follows the match in each path
    for i in pagearray:
        # skip paths that end here; they have no next site
        if len(i) > 1 and input == i[0]:
            subpagearray += [i[1:]]
            possibilities += [i[1]]
    x = build_dict(possibilities)
    # most frequent next site first
    sorted_x = sorted(x.items(), key=operator.itemgetter(1), reverse=True)
    pagearray = subpagearray
    totalelements = len(pagearray)
    returnvaluelist = []
    weight = []
    for site, count in sorted_x:
        returnvaluelist += [site]
        # percentage of the remaining paths that continue to this site
        weight += [count * 100.0 / totalelements]
    return returnvaluelist, weight

def build_dict(a_list=None):
    if a_list is None:
        a_list = []
    site_dict = {}
    for site in a_list:
        try: 
            site_dict[site] = site_dict[site] + 1
        except KeyError:
            site_dict[site] = 1
    return site_dict
furby559
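The step-by-step narrowing in `getUserPath` keeps its state in a single global, which every user of the app would share. A minimal sketch that keeps the same logic per user instead, assuming one object is created for each user's session; `PathSession` and `step` are hypothetical names, and the weights are the percentage of remaining paths that continue to each site.

```python
from collections import Counter

class PathSession:
    """Narrows a list of paths one site at a time; create one instance per user."""

    def __init__(self, paths):
        self.remaining = [list(p) for p in paths]  # paths still matching the input so far

    def step(self, site):
        """Keep paths whose next element is `site`; return (sites, percentages)."""
        self.remaining = [p[1:] for p in self.remaining if len(p) > 1 and p[0] == site]
        counts = Counter(p[0] for p in self.remaining)
        total = len(self.remaining)
        ranked = counts.most_common()  # most frequent next site first
        sites = [s for s, _ in ranked]
        weights = [c * 100.0 / total for _, c in ranked]
        return sites, weights


bigarray = [['website1', 'website2', 'website3', 'website4'],
            ['website1', 'website7', 'website9', 'website3'],
            ['website1', 'website2', 'website5', 'website9', 'website24', 'website36']]

session = PathSession(bigarray)
print(session.step('website1')[0])  # ['website2', 'website7']
print(session.step('website2'))     # (['website3', 'website5'], [50.0, 50.0])
```

Each call only scans the paths that survived the previous step, so the work shrinks as the user goes deeper; unlike a tree, nothing is precomputed.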