
I'm experimentally trying to create a new kind of neural network that meets these criteria:

  • Each neuron must be a separate object.
  • Each neuron should have its own thread.
  • Network must be connected partially and randomly (at startup).
  • Neurons have to run asynchronously, calculating their outputs, updating their weights, etc.

These are my implementation attempts in Julia and Python:

Python

import random
import itertools
import time
import signal
from threading import Thread
from multiprocessing import Pool
import multiprocessing

POTENTIAL_RANGE = 110000 # Resting potential: -70 mV Membrane potential range: +40 mV to -70 mV --- Difference: 110 mV = 110000 microVolt --- https://en.wikipedia.org/wiki/Membrane_potential
ACTION_POTENTIAL = 15000 # Resting potential: -70 mV Action potential: -55 mV --- Difference: 15mV = 15000 microVolt --- https://faculty.washington.edu/chudler/ap.html
AVERAGE_SYNAPSES_PER_NEURON = 8200 # The average number of synapses per neuron: 8,200 --- http://www.ncbi.nlm.nih.gov/pubmed/2778101

# https://en.wikipedia.org/wiki/Neuron

class Neuron():

    neurons = []

    def __init__(self):
        self.connections = {}
        self.potential = 0.0
        self.error = 0.0
        #self.create_connections()
        #self.create_axon_terminals()
        Neuron.neurons.append(self)
        self.thread = Thread(target = self.activate)
        #self.thread.start()
        #self.process = multiprocessing.Process(target=self.activate)

    def fully_connect(self):
        for neuron in Neuron.neurons[len(self.connections):]:
            if id(neuron) != id(self):
                self.connections[id(neuron)] = round(random.uniform(0.1, 1.0), 2)

    def partially_connect(self):
        if len(self.connections) == 0:
            neuron_count = len(Neuron.neurons)
            for neuron in Neuron.neurons[len(self.connections):]:
                if id(neuron) != id(self):
                    if random.randint(1, neuron_count // 100) == 1:
                        self.connections[id(neuron)] = round(random.uniform(0.1, 1.0), 2)
            print("Neuron ID: " + str(id(self)))
            print("    Potential: " + str(self.potential))
            print("    Error: " + str(self.error))
            print("    Connections: " + str(len(self.connections)))

    def activate(self):
        while True:
            '''
            for dendritic_spine in self.connections:
                if dendritic_spine.axon_terminal is not None:
                    dendritic_spine.potential = dendritic_spine.axon_terminal.potential
                    print dendritic_spine.potential
                self.neuron_potential += dendritic_spine.potential * dendritic_spine.excitement
            terminal_potential = self.neuron_potential / len(self.axon_terminals)
            for axon_terminal in self.axon_terminals:
                axon_terminal.potential = terminal_potential
            '''
            #if len(self.connections) == 0:
            #   self.partially_connect()
            #else:
            self.partially_connect()
            pass

            '''
            if abs(len(Neuron.neurons) - len(self.connections) + 1) > 0:
                self.create_connections()

            if abs(len(Neuron.neurons) - len(self.axon_terminals) + 1) > 0:
                self.create_axon_terminals()
            '''

class Supercluster():

    def __init__(self,size):
        for i in range(size):
            Neuron()
        print(str(size) + " neurons created.")
        self.n = 0
        self.build_connections()
        #pool = Pool(4, self.init_worker)
        #pool.apply_async(self.build_connections(), arguments)
        #map(lambda x: x.partially_connect(),Neuron.neurons)
        #map(lambda x: x.create_connections(),Neuron.neurons)
        #map(lambda x: x.create_axon_terminals(),Neuron.neurons)

    def build_connections(self):
        for neuron in Neuron.neurons:
            self.n += 1
            #neuron.thread.start()
            neuron.partially_connect()
            print("Counter: " + str(self.n))

Supercluster(10000)

Julia

global neurons = []

type Neuron
    connections::Dict{UInt64,Float16}
    potential::Float16
    error::Float16

    function Neuron(arg1,arg2,arg3)
        self = new(arg1,arg2,arg3)
        push!(neurons, self)
        return self
    end

end

function fully_connect(self)
    for neuron in neurons
        if object_id(neuron) != object_id(self)
            self.connections[object_id(neuron)] = rand(1:100)/100
            #push!(self.connections, rand(1:100)/100)
        end
    end
end

function partially_connect(self)
    if isempty(self.connections)
        neuron_count = length(neurons)
        for neuron in neurons
            if object_id(neuron) != object_id(self)
                if rand(1:neuron_count/100) == 1
                    self.connections[object_id(neuron)] = rand(1:100)/100
                    #push!(self.connections, rand(1:100)/100)
                end
            end
        end
        println("Neuron ID: ",object_id(self))
        println("    Potential: ",self.potential)
        println("    Error: ",self.error)
        println("    Connections: ",length(self.connections))
    end
end

function Build()
    for i = 1:10000
        Neuron(Dict(),0.0,0.0)
    end
    println(length(neurons), " neurons created.")
    n = 0
    @parallel for neuron in neurons
        n += 1
        partially_connect(neuron)
        println("Counter: ",n)
    end
end

Build()

Firstly, the parts that build the partial, random connections between neurons are taking too much time. How can I speed up this process?

Python

def build_connections(self):
    for neuron in Neuron.neurons:
        self.n += 1
        #neuron.thread.start()
        neuron.partially_connect()
        print("Counter: " + str(self.n))

Julia

n = 0
@parallel for neuron in neurons
    n += 1
    partially_connect(neuron)
    println("Counter: ",n)
end

Secondly, is it a good idea to give each neuron its own thread when my goal is to create at least a million neurons? That would mean something like a million threads.

What I'm trying to do here is imitate biological neural networks in the strict sense, instead of using matrix calculations.

ADDITION:

New version of the partially_connect function, based on the answer:

def partially_connect(self):
    if len(self.connections) == 0:
        neuron_count = len(Neuron.neurons)
        #for neuron in Neuron.neurons:
        elected = random.sample(Neuron.neurons,100)
        for neuron in elected:
            if id(neuron) != id(self):
                #if random.randint(1,neuron_count/100) == 1:
                self.connections[id(neuron)] = round(random.uniform(0.1, 1.0), 2)
        print("Neuron ID: " + str(id(self)))
        print("    Potential: " + str(self.potential))
        print("    Error: " + str(self.error))
        print("    Connections: " + str(len(self.connections)))

Performance increased dramatically.

  • I can't answer your question, unfortunately, but just a suggestion - maybe use less bold and italics? It is a bit hard to read. Best of luck :) – miradulo Apr 18 '16 at 19:59
  • Yeah, you can't do a million threads. Why would you even want to do this? Python can't use multithreading for performance gains because of the global interpreter lock. – Sebastian Wozny Apr 18 '16 at 20:00
  • @DonkeyKong Thanks for the advice :) –  Apr 18 '16 at 20:00
  • @SebastianWozny I know; because of the GIL issue, I'm also trying to write the same algorithm in the Julia language. –  Apr 18 '16 at 20:01
  • @SebastianWozny Besides the GIL issue of Python: at the operating-system level, is creating a million threads possible or sensible? –  Apr 18 '16 at 20:02
  • This thread says no: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux – Sebastian Wozny Apr 18 '16 at 20:04
  • Can you explain, in written prose, what you want/expect the behavior of the `partially_connect()` method to be? – aghast Apr 18 '16 at 20:08
  • @AustinHastings It's picking 100 neurons on average and appending each one's object id as the key and a randomly picked weight as the value to the connections dictionary of the neuron. –  Apr 18 '16 at 20:15
  • This might be a good candidate for micro-threads, aka **coroutines**, using either tornado or asyncio. Each neuron could communicate with the supercluster via some sockets (even via websockets); they could each run their sub-coroutines, periodic callbacks, etc. – DevLounge Apr 18 '16 at 20:21
  • @Apero Oh, thanks for the information. I'm not an expert and I didn't know much about coroutines. Do you have any idea for my first question, speeding up the creation/startup? –  Apr 18 '16 at 20:24
  • Again, coroutines could make most of your code non-blocking, which means that you could "defer" build_connections to the io_loop, which would do it asynchronously and make your for loop really fast. Async programming takes a lot of time to understand but is really powerful. Good luck. – DevLounge Apr 18 '16 at 20:29
  • @Apero I understood. Thanks a lot! –  Apr 18 '16 at 20:30
  • Ben Darnell is the guy who maintains tornado and could really help with this. I'm not sure I can tag him here, though. http://stackoverflow.com/users/2805033/ben-darnell – DevLounge Apr 18 '16 at 20:31
  • This can help with understanding asyncio for your needs: http://www.pythonsandbarracudas.com/blog/2015/11/22/developing-a-computational-pipeline-using-the-asyncio-module-in-python-3 – DevLounge Apr 18 '16 at 20:40
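To make the coroutine suggestion from the comments concrete, here is a minimal sketch using asyncio (Python 3). The neuron loop and all names here are illustrative, not taken from the original code: each neuron becomes a lightweight coroutine instead of an OS thread, so tens of thousands of them are cheap to schedule.

```python
import asyncio
import random

class Neuron:
    def __init__(self):
        self.potential = 0.0

    async def activate(self, steps):
        # Each neuron runs as a coroutine; await points let the event loop
        # interleave all neurons cooperatively on a single thread.
        for _ in range(steps):
            self.potential += random.uniform(-1.0, 1.0)
            await asyncio.sleep(0)  # yield control to the event loop

async def main(n_neurons, steps):
    neurons = [Neuron() for _ in range(n_neurons)]
    # Scheduling 10,000 coroutines is cheap; 10,000 OS threads is not.
    await asyncio.gather(*(n.activate(steps) for n in neurons))
    return neurons

neurons = asyncio.run(main(10000, 3))
print(len(neurons))
```

This does not remove the GIL, but for a design that is mostly waiting on messages between neurons, cooperative scheduling scales far better than one thread per neuron.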

2 Answers


In Julia, if performance matters: don't use globals (see your neurons array) and don't use untyped arrays (again, see your neurons array). See the performance tips. You should also profile to determine where your bottlenecks are. I'd strongly recommend trying it without the @parallel, until you can get it fast.

I took a look at it myself, and in addition to these I found some surprising bottlenecks:

  • rand(1:neuron_count/100) creates a floating-point range, not an integer range. This was a huge bottleneck, which profiling instantly identified. Use rand(1:neuron_count÷100).
  • Better not to call object_id at all; just use !(neuron === self). Or maybe even better, pass the array of neurons and the integer index of the entry you want to modify.

Fixing these items, I managed to get the execution time of your program (after getting rid of the @parallel, which is unlikely to be helpful, and commenting out the text-display) down from about 140 seconds to 4 seconds. Almost all the runtime is simply spent generating random numbers; you might be able to accelerate this by generating a large pool all at once, rather than generating them one-by-one.
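To illustrate the bulk-generation idea in Python terms (a sketch of my own, since the answer's remark is about Julia's RNG): draw all the weights with one library call up front instead of one RNG call per connection, then hand them out as neurons claim their connections.

```python
import random

def weight_pool(n):
    # One random.choices call fills the whole pool at C level,
    # avoiding n separate calls into the RNG from Python code.
    return [v / 100 for v in random.choices(range(1, 101), k=n)]

pool = weight_pool(8200)  # roughly one neuron's worth of synapses
print(len(pool))
```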

This uses the ProgressMeter package (which you have to install) to display progress.

using ProgressMeter

type Neuron
    connections::Dict{UInt64,Float16}
    potential::Float16
    error::Float16
end

function fully_connect(self, neurons)
    for neuron in neurons
        if object_id(neuron) != object_id(self)
            self.connections[object_id(neuron)] = rand(1:100)/100
            #push!(self.connections, rand(1:100)/100)
        end
    end
end

function partially_connect(self, neurons)
    if isempty(self.connections)
        neuron_count = length(neurons)
        for neuron in neurons
            if !(neuron === self)
                if rand(1:neuron_count÷100) == 1
                    self.connections[object_id(neuron)] = rand(1:100)/100
                    #push!(self.connections, rand(1:100)/100)
                end
            end
        end
#         println("Neuron ID: ",object_id(self))
#         println("    Potential: ",self.potential)
#         println("    Error: ",self.error)
#         println("    Connections: ",length(self.connections))
    end
end

function Build()
    neurons = [Neuron(Dict(),0.0,0.0) for i = 1:10000]
    println(length(neurons), " neurons created.")
    @showprogress 1 "Connecting neurons..." for neuron in neurons
        partially_connect(neuron, neurons)
    end
    neurons
end

neurons = Build()
– tholy

Just looking at this code:

def partially_connect(self):
    if len(self.connections) == 0:
        neuron_count = len(Neuron.neurons)
        for neuron in Neuron.neurons[len(self.connections):]:
            if id(neuron) != id(self):
                if random.randint(1, neuron_count // 100) == 1:
                    self.connections[id(neuron)] = round(random.uniform(0.1, 1.0), 2)

And based on your reply to my comment on the OP, here are a couple of things:

  1. You are making a copy of the lists when you use syntax like L[0:]. The slice syntax is making a shallow copy of the Neuron.neurons array for each call to your function. That's an O(n) operation, and since you call partially_connect once for each neuron in your build_connections function, that makes it O(n²). (Yikes!)

  2. You are doing work in Python that can and should be done in the library (in C, we hope!). Have a look at e.g. the random.paretovariate() and random.sample() functions. You could easily compute num_connections = int(random.paretovariate(1.0) * 100) and then say connected_nodes = random.sample(neurons, num_connections). Filter out self from connected_nodes and you're done.

I think you can get a big performance boost by eliminating n² behavior and by using the built-in library routines.
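Putting the two suggested library calls together, a sketch might look like this (the function name, the Stub class, and the cap on the connection count are mine, added so the sample size can never exceed the population):

```python
import random

def pick_connections(self_neuron, neurons):
    # Heavy-tailed connection count, then one C-level sampling call,
    # instead of an O(n) coin-flip loop over every neuron.
    num = min(int(random.paretovariate(1.0) * 100), len(neurons))
    elected = random.sample(neurons, num)
    return {id(n): round(random.uniform(0.1, 1.0), 2)
            for n in elected if n is not self_neuron}

class Stub:  # stand-in for the Neuron class
    pass

neurons = [Stub() for _ in range(1000)]
conns = pick_connections(neurons[0], neurons)
print(0 < len(conns) <= len(neurons))
```

The dict comprehension also filters out self in the same pass, so no separate scan is needed.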

ADDITION

Responding to your addition, consider this:

def partially_connect(self):
    if len(self.connections) == 0:
        elected = random.sample(Neuron.neurons,100)
        try:
            elected.remove(self)
        except ValueError:
            pass

        for neuron in elected:
            self.connections[id(neuron)] = round(random.uniform(0.1, 1.0), 2)

(I'm ignoring the prints for now.)

I don't know how you would communicate from a neuron to its connected neurons without iterating over all the neurons looking for a match of id() values. I'd suggest you store a reference to the connected objects as the key, and use the weight as the value:

self.connections = {n: round(random.uniform(0.1, 1.0), 2) for n in elected}

This assumes you need to traverse the links from source to target, of course.

As for threading solutions, I don't have a good suggestion. A little googling leads me to some old email threads (heh!) that mention numbers like 405 and 254 as being thread creation limits. I haven't seen any documents saying "Python threading is now UNLIMITED!" or whatever, so I suspect you're going to have to alter the way you implement your solution.

– aghast
  • `[len(self.connections):]` was left over from my old revisions and I forgot to remove it. I removed it now, but there is still no performance difference. How can I get rid of the **O(n²)** behavior and make it **O(n)**? I still couldn't understand that part. –  Apr 18 '16 at 22:15
  • Your second suggestion, using `random.sample()`, dramatically increased the performance. Thanks a lot! But I still couldn't understand your first point. –  Apr 18 '16 at 22:28
  • On the first part, you can't make the copy faster, but you can make it go away. Just converting from `neurons[len(self.connections):]` to `neurons` will eliminate the needless copy. – aghast Apr 18 '16 at 22:52
  • I made an addition to the question, please take a look. Does it cover all of your points? Lastly, do you have any idea for the threading issue? There will be something like a million threads. In the comments @Apero suggested using **tornado** or **asyncio**. Do you have anything to add to that suggestion? –  Apr 18 '16 at 22:59
  • See my addition above. ;-) I don't know about tornado or asyncio, so you'll have to investigate those on your own. – aghast Apr 18 '16 at 23:31