How to calculate Nearest Neighbour using Networkx and data from CSV file?

Question

I have a CSV file (node.csv) with the data as follows -

    1           2           3           4           5           6           7           8           9           10
1   0           0.257905291 0.775104118 0.239086843 0.002313744 0.416936603 0.194817214 0.163350301 0.252043807 0.251272559
2   0.346100279 0           0.438892758 0.598885794 0.002263231 0.406685237 0.523850975 0.257660167 0.206302228 0.161385794
3   0.753358102 0.222349243 0           0.407830809 0.001714776 0.507573592 0.169905687 0.139611318 0.187910832 0.326950557
4   0.185342928 0.571302688 0.51784403  0           0.003231018 0.295197533 0.216184462 0.153032751 0.216331326 0.317961522
5   0           0           0           0           0           0           0           0           0           0
6   0.478164621 0.418192795 0.646810223 0.410746629 0.002414973 0           0.609176897 0.203461461 0.157576977 0.636747837
7   0.24894327  0.522914349 0.33948832  0.316240267 0.002335929 0.639377086 0           0.410011123 0.540266963 0.587764182
8   0.234017887 0.320967208 0.285193773 0.258198079 0.003146737 0.224412057 0.411725737 0           0.487081815 0.469526333
9   0.302955306 0.080506624 0.261610132 0.22856311  0.001746979 0.014994905 0.63386228  0.486096957 0           0.664434415
10  0.232675407 0.121596312 0.457715027 0.310618067 0.001872929 0.57556548  0.473562887 0.32185564  0.482351246 0

I want to use Networkx Python library to calculate the nearest neighbours in a given network (maximum, minimum numbers included), for example - The program is written in such a way that for a number of iterations, it should be able to produce outputs showing "Node1 neighbours are 2,3" , "Node2 neighbours are 1,3" and so on using an algorithm or inbuilt function from Networkx.

The positions of the nodes are (pos.txt) -

Firstly, is it possible to create a network/graph using floating values less than 1? (The values indicate the connectivity rate from node to node, it also indicates the probability of a successful connection and the probability of a message being passed between nodes)
Can anyone help me with this regard?

Thanks in advance for the help :)

Regarding your first question, yes it is possible, am I assuming that the numbers in `node.csv` are reprensenting the weight of the edge between each node? And do you compute the closest neighbors based on that weight, or based on their position? (based on node.csv I'd say that 5 and 4 are the closest neighbors of 1) — Adonis, Aug 07 '17 at 13:20
@Adonis Thank you for your response. Actually they are not weights of an edge, they are just probabilities of a connection between each node. Using this assumption and their positions, I want to calculate the nearest neighbour (in the experiment, it is assumed that all nodes are connected to each other, but depending on the probability, a message is passed/connection is made). I am not sure if my clarification has helped you in some way (also, how did you guess that 5 and 4 are the closest neighbours of 1 based on the CSV file?) — user136819, Aug 07 '17 at 13:51
Because those are the lowest number in the row of the node with id 1. So basically you compute the "nearest distance" with the euclidean distance between node multiplied by the probability. Do I get it correctly? — Adonis, Aug 07 '17 at 13:52

Adonis · Accepted Answer · 2017-08-07T14:55:52.243

4

Regarding your first question, and assuming we use the numbers from node.csv as the weight for the edges, a simple program allow to compute this graph using networkx:

import matplotlib.pyplot as plt
import networkx as nx
import csv

g = nx.Graph()

i_dict = {}
with open("g.csv","r") as input:
    csv_dict = csv.DictReader(input, skipinitialspace=True, delimiter=",")
    ini = 1
    for row in csv_dict:
        for i in row:
            #print(row[i])
            if type(row[i]) is str:
                g.add_edge(ini, int(i), weight=(float(row[i])))
        ini += 1

pos=nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g,pos)
nx.draw_networkx_labels(g,pos)
plt.axis('off')
plt.show()

This yields:

Regarding finding the nearest neighbor of let's say node1, still based on the value from node.csv:

min_weight_neighbors = sorted(g[1].items(), key=lambda e: e[1]["weight"] if e[1]["weight"] != 0  else 1000000000)[:2] #remove edges with weight 0 from the computation

Which in turns yields the 2 nodes with the lowest weight:

[(5, {'weight': 0.002313744}), (4, {'weight': 0.185342928})]

Or if you want 2 nodes with the biggest weight:

sorted(g[1].items(), key=lambda e: e[1]["weight"], reverse=True)[:2] #two nodes with the biggest weight

which yields:

[(3, {'weight': 0.753358102}), (4, {'weight': 0.5342928})]

NB: I modified a little bit node.csv:

1,2,3,4,5,6,7,8,9,10
0,0.257905291,0.775104118,0.239086843,0.002313744,0.416936603,0.194817214,0.163350301,0.252043807,0.251272559
0.346100279,0,0.438892758,0.598885794,0.002263231,0.406685237,0.523850975,0.257660167,0.206302228,0.161385794
0.753358102,0.222349243,0,0.407830809,0.001714776,0.507573592,0.169905687,0.139611318,0.187910832,0.326950557
0.5342928,0.571302688,0.51784403,0,0.003231018,0.295197533,0.216184462,0.153032751,0.216331326,0.317961522
0,0,0,0,0,0,0,0,0,0
0.478164621,0.418192795,0.646810223,0.410746629,0.002414973,0,0.609176897,0.203461461,0.157576977,0.636747837
0.24894327,0.522914349,0.33948832,0.316240267,0.002335929,0.639377086,0,0.410011123,0.540266963,0.587764182
0.234017887,0.320967208,0.285193773,0.258198079,0.003146737,0.224412057,0.411725737,0,0.487081815,0.469526333
0.302955306,0.080506624,0.261610132,0.22856311,0.001746979,0.014994905,0.63386228,0.486096957,0,0.664434415
0.232675407,0.121596312,0.457715027,0.310618067,0.001872929,0.57556548,0.473562887,0.32185564,0.482351246,0

edited Aug 07 '17 at 14:55

answered Aug 07 '17 at 13:46

Adonis

4,670
3
37
57

Great answer! I could use this for my experiment. One more question: Is it acceptable to assume that the nearest neighbour is one which has the highest weight? (taking the approach from highest weight to least, inverse of your answer). How to achieve this? Thanks again for your help :) @Adonis – user136819 Aug 07 '17 at 14:02
@MFHS Actually (not being an expert of networkx by any means) I realised that the graph drawn above uses already the highest weight to put nodes closer to each other. Now regarding the `sorted` call you can simply add the parameter `reversed=True`. I also corrected a few things that were wrong in my code (like the not adding an edge when the weight is 0, which was pretty weird...) Also added your `node.csv` slightly modified (to be able to load the data with a csv dictreader) – Adonis Aug 07 '17 at 14:59
Thanks again for helping me out @Adonis, but when I ran the code, I got this error - `ValueError: invalid literal for int() with base 10: ''` . I have no idea how to proceed further now, even though I tried my best to fix it :( – user136819 Aug 07 '17 at 21:31
@MFHS It might make me think about an issue regarding the input, as I mentioned in my edit, I kind of modified your `node.csv` so that it would be easier for me to read it with a `csv.DictReader`. Can you edit your post to show the full stacktrace? – Adonis Aug 07 '17 at 21:40
1

You are right, I hadn't updated the CSV file, I was running the code with the old values. I edited them and now the code works beautifully! Amazing answer @Adonis . I'm very grateful :) – user136819 Aug 07 '17 at 21:53
@MFHS No problem. It was also the occasion for me to learn about `networkx` so we all won in the process. – Adonis Aug 07 '17 at 21:56
Just out of curiosity, is it possible that we can use the Pandas library (with `read_csv`) and keep the code functionality the same? – user136819 Aug 08 '17 at 09:39
You'd probably have to rewrite the loop a little bit (see https://stackoverflow.com/a/16476974/4121573) Otherwise the rest should be pretty much the same. – Adonis Aug 08 '17 at 10:02
I have successfully found the neighbours of node1. But when I try to find neighbours of the other nodes, lets say 2,3 I get an error `tuple index out of range` (I edited the `sorted` function `(g[2] ... e[2])`). Any idea how to fix this? Sorry I keep asking these questions, as I am still a novice :( – user136819 Aug 08 '17 at 12:27
1

@MFHS That's because you should change `g[1]` to `g[2]` but not `e[1]` as the tuple in the list `g.items()` has always the same shape (e.g. `(1, {'weight': 0.753358102})`) and we always seek to retrieve the 2nd part of the tuple thus `e[1]` and more precisely the value associated to weight, thus `e[1]["weight"]` – Adonis Aug 08 '17 at 14:15
Great! One last thing, I was trying to run the `sorted()` function for nearest neighbour through the loops to get the output in a sequence `(1,2,3....)` but whenever I edit `g[2 or 3]` to `g[i]` in the loop, I end up with the error `KeyError: '8'`. Should I not be putting the `i` in `g[]` or do I add something else to get the sequence? I can't thank you enough :) @Adonis – user136819 Aug 08 '17 at 14:43
@MFHS Strange, from what I see `g[8].items()` returns a list (g[18] for instance ends up with the same error). Can you try that first? Besides have you changed anything else, or just this one and only modification? – Adonis Aug 08 '17 at 14:47
I moved the `sorted()` function inside the loop and put the `i` from the `for i in row` loop into `g[i]` hoping to get the sequence. But then I tried with putting `ini` into `g[ini]` and it seems to give me the nearest neighbours in sequence correctly now. Basically a lot of trial and error, but I have my answer :) Although I'm still struggling with using Pandas instead of the regular CSV from your code! – user136819 Aug 08 '17 at 14:58
1

@MFHS Oh my bad, this `i` is actually a `string` and not an `int` thus the error... Probably not the wisest choice on my side ... – Adonis Aug 08 '17 at 15:01
No worries, I have an answer :) But as I mentioned, I'm struggling with using Pandas now. I always get errors like `ValueError: invalid literal for int() with base 10: '0.257905291'` and I have no idea how to resolve them – user136819 Aug 08 '17 at 15:07
@MFHS In that case, if you cannot figure out the solution, and there are no available resources, you might want to ask a new question. Be sure though to google it first. (I'm guessing this error comes from the type conversion made by Pandas that fails here) – Adonis Aug 08 '17 at 15:29
I figured out the Pandas library part. But for some reason the code is unable to remove edges with weight `0` . I guess you faced the same issue before as you mentioned in the previous comments. Any idea how I can resolve this? Thanks for the help – user136819 Sep 04 '17 at 15:18

How to calculate Nearest Neighbour using Networkx and data from CSV file?

1 Answers1