I have posted a bit here related to a project I have been trying to work on and I keep hitting design problems and have to design from scratch. So I'm wondering if I can post what I'm trying to do and someone can help me understand how I can get the result I want.
BackGround:
I'm new to programming and trying to learn. So I took a project that interested me which involves basically taking list and breaking down each number using only numbers from the list. I know I could easily brute force this(which I did) but I wanted to also learn Hbase, Hadoop, and parallel processing so I wanted do it in a way that I could break the process across various machines. I thought the way to do this was using dynamic programming and recursions to create a table of possibilities that could be broken down even further.
Example:
If I submit the list: [1,2, 4]
I should get {1: [[1]], 2: [[1, 1]], 4: [[2, 2]]}
. What its basically saying is 2+2=4, 1+1=2, and 1=1..so trying to see all the ways to make 4, I can just look up this list(which will be in a database) and see 2+2=4, then break down 2..and so on.. I have the lookup job working but the breakdown I am having problems with. Using brute force will not let me use large numbers(say a million, with a thousand numbers in the list) in a way that I can use hadoop or some other tool to scale it. Here are some more examples of possible results:
[1,2, 3, 4] = {1: [[1]], 2: [[1, 1]], 3: [[1, 2]], 4: [[1, 3], [2, 2]]}
[1,2, 3, 4, 6, 10] = {1: [[1]], 2: [[1, 1]], 3: [[1, 2]], 4: [[1, 3], [2, 2]], 6: [[2, 4], [3, 3]], 10: [[4, 6], [2, 2, 2, 2, 2]]}
[10, 30, 50] = 50: [[10, 10, 30]], 30: [[10, 10, 10]]}
The logic with this approach is that it will not take time to comput the next possible data in the list, so if I send a list with a million numbers it could do it quickly and even scale to a hadoop cluster.
The code I created to get this to work is here but that question was on how to correct the design problem. I got advice that this is a partition problem and looked around and found much simpler versions of what I was trying to do ( activestate ), but its not exactly what I am trying to do because it breaks down a number and does not use a specific data set to do it.
Question:
So hopefully I clearly described what I am trying to do. What do I need to read/study/learn to create a table of a partition of an list in python using dynamic programming so it can scale? Its just a hobby and not time sensitive but I feel I have been working on this for over 3 months and each time I hit design problems that cause me to have to start from scratch. How can I build this correctly(to see my incorrect way see my link above)? I have googled and found solutions to the knapsack problem and partition problems but they seem to be more for school work and not really built to scale with large datasets.
Hopefully someone can give me insight but regardless thank you for reading this.