How to flatten a multi dimensional array by calculating the sum of a field in Python

Question

Hey StackOverflow members, I have a function that parses a file and creates a multi-dimensional array containing the source IP, destination IP, and number of packets transferred. What I need to do is calculate the total number of packets transferred per connection. In this multi-dimensional array, several lines have the same source and destination IP; in that case I have to collapse all of those lines into one line that holds the sum of the packets transferred.

So, for example, if you have a few lines that look like this:

192.167.1.1 10.0.0.1 500 
192.167.1.1 10.0.0.1 35
192.167.1.1 10.0.0.1 5

It should become this:

192.167.1.1 10.0.0.1 540

The problem is I have no idea how to shorten my multi-dimensional array so that the third field holds the sum and the remaining lines with the same source and destination IP are removed. This is done in Python.

Thank you in advance.

Best regards, Babak

Check out [this question](https://stackoverflow.com/q/5695208/1328439) on the use of `itertools.groupby()` — Dima Chubarov, Oct 14 '17 at 03:31
Welcome to SO. Unfortunately this isn't a discussion forum or a code writing service. Please take the time to read [ask] and the links it contains. — wwii, Oct 14 '17 at 04:39
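
The `itertools.groupby()` route mentioned in the first comment could look roughly like the sketch below. The `rows` list is a hypothetical stand-in for the parsed array (source IP, destination IP, packet count per row); it is not taken from the original post. Note that `groupby()` only merges adjacent items, so the rows are sorted by the same key first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical parsed rows: [source IP, destination IP, packet count]
rows = [
    ["192.167.1.1", "10.0.0.1", 500],
    ["192.167.1.2", "10.0.0.1", 7],
    ["192.167.1.1", "10.0.0.1", 35],
    ["192.167.1.1", "10.0.0.1", 5],
]

key = itemgetter(0, 1)  # group on the (source, destination) pair

# groupby() only merges adjacent rows, so sort by the same key first
merged = [
    [source, dest, sum(row[2] for row in group)]
    for (source, dest), group in groupby(sorted(rows, key=key), key=key)
]

print(merged)
# [['192.167.1.1', '10.0.0.1', 540], ['192.167.1.2', '10.0.0.1', 7]]
```

The sort makes this O(n log n); a dictionary-based running sum avoids the sort if ordering does not matter.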

score 2 · Accepted Answer · answered Oct 14 '17 at 03:32

This can be handled in a couple of ways. You could use defaultdict to create a dictionary where the keys are a tuple of the IPs and the value is a running sum of the packets sent. Or you could use the popular pandas package to read in the data, and groupby the source and dest IPs, and then sum.

from collections import defaultdict

d = defaultdict(int)

with open('/path/to/ip_data_file.txt', 'r') as fp:
    for line in fp:
        source, dest, packets = line.strip().split()
        d[(source, dest)] += int(packets)
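
To get from the dictionary back to one row per source/destination pair, something along these lines should work. This is a minimal sketch: the `lines` list reuses the sample data from the question and stands in for the file, and the names `totals` and `flattened` are illustrative only.

```python
from collections import defaultdict

# Sample lines from the question, standing in for the file contents
lines = [
    "192.167.1.1 10.0.0.1 500",
    "192.167.1.1 10.0.0.1 35",
    "192.167.1.1 10.0.0.1 5",
]

totals = defaultdict(int)
for line in lines:
    source, dest, packets = line.split()
    totals[(source, dest)] += int(packets)

# One row per (source, dest) pair, with the summed packet count
flattened = [[source, dest, total] for (source, dest), total in totals.items()]
print(flattened)  # [['192.167.1.1', '10.0.0.1', 540]]
```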

Or using pandas and assuming your file has no header:

import pandas as pd

# header=None because the file has no header row; names= supplies the column labels
df = pd.read_csv('/path/to/ip_data_file.txt', sep=' ', header=None,
    names=['source', 'dest', 'packets'])
g = df.groupby(['source', 'dest']).sum()

James

Hi James, First things first, thank you for your quick response and I appreciate that you took the time & effort to help me out here. The second answer looks promising, but the file containing the information I'm working with is one big mess, so I parse it and put it in a multi-dimensional array called Matrix. For example, Matrix[0][0] contains the source IP, Matrix[0][1] contains the dest IP and Matrix[0][3] contains the packets. So is there a way to group it using this array as input instead of opening the file and reading from it? – Babak Oct 14 '17 at 04:07
You can also create a pandas DataFrame from the array you get after parsing. The second solution above is exactly what I had in mind for this question, but James had already posted it. – Sunnysinh Solanki Oct 14 '17 at 04:55
After working around with my code for a bit, James' second solution worked flawlessly. Thank you so much for your help, James. You're a life saver. – Babak Oct 15 '17 at 00:01
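
For readers with the same follow-up question, here is a rough sketch of the pandas approach applied to an in-memory array instead of a file. The `Matrix` contents below are made up for illustration, and the sketch assumes each row holds exactly three fields (source, dest, packets); adjust the column list if the real array has more.

```python
import pandas as pd

# Hypothetical stand-in for the parsed Matrix described in the comments
Matrix = [
    ["192.167.1.1", "10.0.0.1", 500],
    ["192.167.1.1", "10.0.0.1", 35],
    ["192.167.1.2", "10.0.0.9", 12],
    ["192.167.1.1", "10.0.0.1", 5],
]

df = pd.DataFrame(Matrix, columns=["source", "dest", "packets"])
# Packet counts may be strings after parsing, so convert before summing
df["packets"] = df["packets"].astype(int)

# as_index=False keeps source and dest as ordinary columns
summed = df.groupby(["source", "dest"], as_index=False)["packets"].sum()

# Back to a plain list of lists
flattened = summed.values.tolist()
print(flattened)
```

Using `as_index=False` avoids the MultiIndex that `groupby(...).sum()` produces by default, which makes converting back to a list of rows straightforward.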

score 0 · Answer 2 · answered Oct 14 '17 at 03:53

I don't know if this is helpful but this program takes the data from a text file (assuming each ip is on its own line) and prints out the number of packets transferred.

# Read the whole file and split it into whitespace-separated tokens
with open('ip.txt', 'r') as doc:
    content = doc.read().split()

packets = 0

for place, num in enumerate(content):
    # Every third token is a packet count, so add it to the running total
    if (place + 1) % 3 == 0:
        packets += int(num)

print(packets)
