
I have the following dataframe:

network_id agent_id parent_id
1          10       6
1          11       7
1          12       7
1          13       8
1          6        5
1          7        5
1          8        5
2         104       101
2         105       101
2         106       101
2         107       102
2         108       103
2         101       100
2         102       100
2         103       100

I need to calculate the number of children for every agent in every network. parent_id gives the directly connected parent of each node. I am looking for a solution in R or Python.

chessosapiens

1 Answer


To make the code more general, I have changed my previous answer to a recursive solution. It also takes your latest comment into account:

import pandas as pd

cols = ['network_id', 'agent_id', 'parent_id']
df = pd.DataFrame([[1, 10, 6],
                    [1, 11, 7],
                    [1, 12, 7],
                    [1, 13, 8],
                    [1, 6,  5],
                    [1, 7,  5],
                    [1, 8,  5],
                    [2, 104,101],
                    [2, 105,101],
                    [2, 106,101],
                    [2, 107,102],
                    [2, 108,103],
                    [2, 101,100],
                    [2, 102,100],
                    [2, 103,100]], columns = cols)

# For each network, build the set of all nodes,
# including both the nodes that have children and those that don't
all_nodes_in_networks = df.groupby('network_id')\
                          .apply(lambda x: set(x[['agent_id', 'parent_id']].values.flatten()))\
                          .to_dict()

def find_children(df, node, network, explored_children=()):
    '''
    find direct children of a certain node within a network
    '''
    children = df.query('parent_id==@node and network_id==@network')['agent_id'].values.tolist()
    # Takes care of the case when we go back to an already visited node
    new_children = set(children) - set(explored_children)

    return new_children

def recursive_find_children(df, node, network, explored_children=()):
    '''
    recursively find all children of a certain node within a network
    '''
    new_children = find_children(df, node, network, explored_children)

    # Exit case: we have arrived at a node with no children,
    # or we have come back to an already visited node
    if not new_children:
        return set(explored_children)

    # Recursive call:
    # add direct children and all children of children (to any nested level)
    new_explored_children = set(explored_children).union(new_children)
    return set(explored_children).union(
        *[recursive_find_children(df, nd, network, new_explored_children)
          for nd in new_children])

Now let's apply the function above to all nodes:

all_children = {network: {node: recursive_find_children(df, node, network)
                          for node in all_nodes_in_networks[network]}
                for network in all_nodes_in_networks}

all_children
Out[113]: 
{1: {5: {6L, 7L, 8L, 10L, 11L, 12L, 13L},
  6: {10L},
  7: {11L, 12L},
  8: {13L},
  10: set(),
  11: set(),
  12: set(),
  13: set()},
 2: {100: {101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L},
  101: {104L, 105L, 106L},
  102: {107L},
  103: {108L},
  104: set(),
  105: set(),
  106: set(),
  107: set(),
  108: set()}}


all_children_number = {network: {node: len(all_children[network][node]) for node in all_children[network]} for network in all_children}

all_children_number
Out[114]: 
{1: {5: 7, 6: 1, 7: 2, 8: 1, 10: 0, 11: 0, 12: 0, 13: 0},
 2: {100: 8, 101: 3, 102: 1, 103: 1, 104: 0, 105: 0, 106: 0, 107: 0, 108: 0}}
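
For comparison, here is a minimal sketch of the same count without recursion and without a dataframe query per node. It is an alternative, not the code above: the names `rows`, `count_descendants` and `descendant_counts` are made up for this illustration, and the input is assumed to be plain (network_id, agent_id, parent_id) tuples. It builds a parent-to-children lookup once and then walks down from each node with an explicit stack.

```python
from collections import defaultdict

# Same rows as the dataframe above: (network_id, agent_id, parent_id).
rows = [
    (1, 10, 6), (1, 11, 7), (1, 12, 7), (1, 13, 8),
    (1, 6, 5), (1, 7, 5), (1, 8, 5),
    (2, 104, 101), (2, 105, 101), (2, 106, 101), (2, 107, 102),
    (2, 108, 103), (2, 101, 100), (2, 102, 100), (2, 103, 100),
]


def count_descendants(rows):
    """Return {network_id: {node: number of nodes below it}} (hypothetical helper)."""
    children = defaultdict(list)   # (network, parent) -> direct children
    nodes = defaultdict(set)       # network -> every node seen in that network
    for network, agent, parent in rows:
        children[(network, parent)].append(agent)
        nodes[network].update((agent, parent))

    counts = {}
    for network, members in nodes.items():
        counts[network] = {}
        for node in members:
            seen = set()
            # Start from the direct children and keep walking down.
            stack = list(children.get((network, node), []))
            while stack:
                current = stack.pop()
                if current in seen:   # guard against cycles in the data
                    continue
                seen.add(current)
                stack.extend(children.get((network, current), []))
            counts[network][node] = len(seen)
    return counts


descendant_counts = count_descendants(rows)
```

With the dataframe from the answer, the tuples could be supplied by `df[cols].itertuples(index=False, name=None)`. Because the node set is built from both the agent and the parent column, leaf nodes appear with a count of 0.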

Hope this helps and that the code is clear enough.

FLab
  • No, I don't want to count only the directly connected children. For example, agent 5 has 7 children, not 3 – chessosapiens Nov 29 '16 at 09:10
  • Can you please clarify your question and provide the desired output? It is not clear to me – FLab Nov 29 '16 at 09:13
  • How to find children and children of children, in other words all nodes under a specific node. – chessosapiens Nov 29 '16 at 09:14
  • What I need is to count every agent in the lower levels under 5: there are 3 agents directly connected to 5, and for example 7 has two children and 6 and 8 have 1 child each, so the total number of agents under 5 is 7. This is what I am looking for. Otherwise a simple group by would do; I am looking for a recursive solution – chessosapiens Nov 29 '16 at 09:19
  • Updated accordingly. Let me know if this is what you are looking for – FLab Nov 29 '16 at 09:38
  • Thanks, it seems correct, but I get the following error: 'dict' object has no attribute 'iteritems'. What is the reason, and how can we handle it? – chessosapiens Nov 29 '16 at 09:51
  • I am using Python 3.5; that's the reason, I think – chessosapiens Nov 29 '16 at 09:55
  • I tested it in Python 2, while I guess you are using Python 3. Try .items() instead of .iteritems() http://stackoverflow.com/questions/30418481/error-dict-object-has-no-attribute-iteritems-when-trying-to-use-networkx – FLab Nov 29 '16 at 09:55
  • Yes, it works properly. Thanks a lot, I really liked your solution. How did you come up with it? – chessosapiens Nov 29 '16 at 09:57
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/129334/discussion-between-flab-and-sanaz). – FLab Nov 29 '16 at 09:58
  • How can we change the code so that we get all the nodes, even those with zero children, for example 108? – chessosapiens Nov 29 '16 at 20:27
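
On the last two points raised in the comments, a small sketch may help. It assumes the counts are already available as the nested dict printed in the answer (copied here as a literal), and the names `rows` and `n_children` are made up for this illustration. It iterates with `.items()`, which exists in both Python 2 and Python 3 (unlike `.iteritems()`, which is Python 2 only), and it shows that nodes with zero children such as 108 are already part of the result.

```python
# Counts copied from the answer's printed output.
all_children_number = {
    1: {5: 7, 6: 1, 7: 2, 8: 1, 10: 0, 11: 0, 12: 0, 13: 0},
    2: {100: 8, 101: 3, 102: 1, 103: 1, 104: 0, 105: 0, 106: 0, 107: 0, 108: 0},
}

# Flatten to (network_id, agent_id, n_children) rows.
# .items() works in Python 2 and 3; .iteritems() only exists in Python 2.
rows = [
    (network, node, n_children)
    for network, counts in all_children_number.items()
    for node, n_children in counts.items()
]

# Nodes without children are kept, with a count of 0.
zero_child_nodes = sorted(node for _, node, n in rows if n == 0)
```

If a dataframe is preferred, `pd.DataFrame(rows, columns=['network_id', 'agent_id', 'n_children'])` should give one row per node, with the column name `n_children` being an arbitrary choice.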