How do I sum up lines of string based on a dictionary value?

Question

I am pretty new to Python and coding. I am trying to write code that prints the total length of each wire type in a list of wires. This is a side project for work. I was able to come up with code that sums all of the wire for a user-selected wire type. Now I would like to write code that prints the total for every wire type in the file.

This is the code I came up with to sum an individual wire type selected by the user.

wtype = []
w = []
w1 = []

#opens the .TXT file

fhand = input('\nEnter Text File\n')

try:
    if (len(fhand) <= 0):
        fhand = 'test.txt'
    fh = open(fhand)
except:
    print('\nNo File Found:', fhand, '\n')
    exit()

#prints out the possible wire types

for line in fh:
    line = line.rstrip()
    wtype.append(line)  #needed for later in the code
    line2 = line.split(',')[2]
    if line2 not in w:
        w.append(line2)
    else:
        continue
d1 = dict(enumerate(w))
print(d1)

#sums up the selected wire types total length from the given .TXT file

wire = int(input('\nEnter the number that is before the wire type you need:\n'))

for key, val in d1.items():
    if key == wire:
        for x in wtype:
            x = x.split(',')
            if x[2] == val:
                w1.append(x[1])
            else:
                continue
        s = [float(i) for i in w1]  #parse lengths as numbers instead of calling eval() on file contents
        print('\nYou will need ', sum(s)/12, ' Feet of ', val, '.\n')

This is the test.txt file. The length is in inches and is then converted to feet in the last line of the code, `sum(s)/12`.

The columns are WIRE, LENGTH, TYPE, QTY for this file.

WIRE-006A22,72,M22759/16-22-9,1
WIRE-005A22,60,M22759/16-22-9,1
WIRE-004A22,72,M22759/16-22-9,1
WIRE-003A22,72,M22759/16-20-9,1
WIRE-002A22,60,M22759/16-20-9,1
WIRE-001A22,72,M22759/16-22-9,1
WIRE-009A22,72,M22759/16-22-9,1
WIRE-008A22,60,M22759/16-22-9,1
WIRE-007A22,72,M22759/16-20-9,1
WIRE-011A22,72,M22759/16-22-9,1
WIRE-012A22,72,M22759/16-22-9,1
WIRE-014A22,72,M22759/16-20-9,1
WIRE-013A22,60,M22759/16-22-9,1
WIRE-021A22,72,M22759/16-20-9,1
WIRE-031A22,72,M22759/16-22-9,1
WIRE-032A22,72,M22759/16-20-9,1
WIRE-043A22,60,M22759/16-22-9,1
WIRE-054A22,72,M22759/16-20-9,1
WIRE-065A22,72,M22759/16-22-9,1
WIRE-076A22,60,M22759/16-22-9,1
WIRE-087A22,72,M22759/16-22-9,1
WIRE-098A22,72,M22759/16-20-9,1
WIRE-089A22,72,M22759/16-20-9,1
WIRE-078A22,72,M22759/16-20-9,1
WIRE-067A22,60,M22759/16-22-9,1
WIRE-056A22,72,M22759/16-22-9,1
WIRE-045A22,72,M22759/16-20-9,1
WIRE-034A22,60,M22759/16-22-9,1
WIRE-023A22,60,M22759/16-22-9,1
WIRE-012A22,72,M22759/16-20-9,1

The output I am trying to achieve is:

output: {'M22759/16-22-9': 100, 'M22759/16-20-9': 71}

I would like that to extend to all of the different wire types that could be in `d1`.

Where is the `100` (and the `71`) in the expected output coming from? would this be the sum of the `1`s at the end of the lines that have `M22759/16-22-9` in them, and did you only provide partial data? Or is there some other way you're computing that from the provided sample data? — Grismar, Oct 27 '22 at 23:49
The `100` and `71` are the totals of both the `M22759/16-22-9` and `M22759/16-20-9` if you select those wire types in that initial code block. But as it is, I can only produce a single wire type at a time. I would like to try and produce a sum for all of the wire types in a given `.txt` file. The `1`'s at the end of the lines are quantities for how many of each wire there is. That is just how our machine reads it. `WIRE, LENGTH, TYPE, QTY` is the format of the `test.txt` file. — wortzinator, Oct 27 '22 at 23:57
The point is that your sample has only 12x M22759/16-20-9 and 18x M22759/16-22-9, so your desired output does not match the sample data shown. — jarmod, Oct 28 '22 at 00:05
I see what you are saying. The quantity in the desired output is in feet. The `test.txt` file has the length as inches but converts to feet in the last line of code. I updated the OG post to try and explain that, I forgot that bit. — wortzinator, Oct 28 '22 at 00:21

jarmod · Answer 1 · 2022-10-28T00:35:48.370

Here is one simple way, with very little code, that uses the pandas library.

import pandas

df = pandas.read_csv("test.csv")
df_out = df.groupby("TYPE")["QTY"].sum()
print("Output:", df_out.to_dict())

# Output: {'M22759/16-20-9': 12, 'M22759/16-22-9': 18}

It assumes that the input CSV file looks like this:

WIRE,LENGTH,TYPE,QTY
WIRE-006A22,72,M22759/16-22-9,1
WIRE-005A22,60,M22759/16-22-9,1
WIRE-004A22,72,M22759/16-22-9,1
WIRE-003A22,72,M22759/16-20-9,1
...

If the CSV file has no header, then you can still use pandas. Just tell it there's no header, and then use column numbers instead of column names. For example:

import pandas

df = pandas.read_csv("test-noheader.csv", header=None)
df_out = df.groupby(2)[3].sum()
print("Output:", df_out.to_dict())

Of course you can achieve the same result fairly simply with non-pandas code but I thought it worth sharing how few lines of code this could be.

Here's a simple non-pandas version that uses the standard csv module:

import csv

output = {}

with open("test.csv") as csvfile:
    for row in csv.DictReader(csvfile):
        if row["TYPE"] in output:
            output[row["TYPE"]] += int(row["QTY"])
        else:
            output[row["TYPE"]] = int(row["QTY"])

print("Output:", output)

And again, if the CSV file has no header:

import csv

output = {}

with open("test-noheader.csv") as csvfile:
    for row in csv.DictReader(csvfile, fieldnames=["WIRE", "LENGTH", "TYPE", "QTY"]):
        if row["TYPE"] in output:
            output[row["TYPE"]] += int(row["QTY"])
        else:
            output[row["TYPE"]] = int(row["QTY"])

print("Output:", output)
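
The csv versions above total the QTY column, which gives the number of wires per type. The question asks for length in feet, so one possible variation sums LENGTH times QTY per type and divides by 12 at the end. This is only a sketch: it assumes a header-less file in WIRE, LENGTH, TYPE, QTY order with whole-number lengths in inches, and the function name `feet_by_type` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def feet_by_type(csvfile):
    """Return {wire type: total feet} for header-less WIRE,LENGTH,TYPE,QTY rows."""
    inches = defaultdict(int)
    for row in csv.reader(csvfile):
        if not row:  # skip blank lines
            continue
        _wire, length, wire_type, qty = row
        # Each row contributes its length once per unit of quantity.
        inches[wire_type] += int(length) * int(qty)
    # Convert inches to feet only after summing.
    return {wire_type: total / 12 for wire_type, total in inches.items()}

# Illustrative data only; in practice pass open("test-noheader.csv", newline="").
sample = io.StringIO(
    "WIRE-006A22,72,M22759/16-22-9,1\n"
    "WIRE-005A22,60,M22759/16-22-9,1\n"
    "WIRE-003A22,72,M22759/16-20-9,2\n"
)
print("Output:", feet_by_type(sample))
# Output: {'M22759/16-22-9': 11.0, 'M22759/16-20-9': 12.0}
```

If QTY is always 1, the multiplication changes nothing and can be dropped.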

P.S. Your text file is actually a CSV file, so it's probably better to name it accordingly (e.g. test.csv).

I believe that the actual file is in .csj for our machine. I can't remember why, but I think the OG code I tried wouldn't talk nice to the .csj file so I just made it a .txt file. The files I get straight from the engineers don't have the headers `WIRE, LENGTH,TYPE,QTY` unfortunately. I added those to try and convey how the file was formatted. I updated the OG post to move the header outside of the code block. Without having the header, is it still feasible to use pandas? I am not familiar with it so I would have to do some reading. — wortzinator, Oct 28 '22 at 00:26
PS it's not clear from your sample data and desired output exactly what's what in terms of feet and inches but obviously you can convert as needed. For example [multiply dataframe column by a scalar](https://stackoverflow.com/questions/33768122/python-pandas-dataframe-how-to-multiply-entire-column-with-a-scalar). — jarmod, Oct 28 '22 at 00:41

wwii · Answer 2 · 2022-10-29T16:39:10.847

Similar to what you did, but run through the whole file once, building a dictionary while iterating.

import collections
d = collections.defaultdict(int)
with open('thefile.txt') as f:
    for line in f:
        wire,length,type,qty = line.strip().split(',')
        d[type] += int(length)

for type,l in d.items():
    print(type,l)

>>>
M22759/16-22-9 1200
M22759/16-20-9 852

For feet instead of inches:

import collections
d = collections.defaultdict(float)
with open('thefile.txt') as f:
    # next(f)  # only needed when the first line of the file is a header
    for line in f:
        wire,length,type,qty = line.strip().split(',')
        d[type] += int(length)/12

edited Oct 29 '22 at 16:39

answered Oct 28 '22 at 00:22

Thank you for the response. I have adjusted the OG post; the header `WIRE,LENGTH,TYPE,QTY` is not a part of the file. I tried to convey the format of the file but mistakenly added it as a header. Without the header, is it still as simple as you mentioned? I'm thinking something like `wire = line.strip().split(',')[0]`, then do that for length and type too? – wortzinator Oct 28 '22 at 00:34
If the first line of the file is not a header then remove `next(f)` - I've edited my answer. – wwii Oct 29 '22 at 16:39
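
For files that may or may not start with a header line, one option is to skip any line whose LENGTH field is not a number rather than relying on `next(f)`. The following is a sketch under that assumption; `sum_feet` is a hypothetical helper name, not something from the answer, and like the answer it ignores the QTY column.

```python
import collections

def sum_feet(lines):
    """Total length in feet per wire type, ignoring lines without a numeric LENGTH."""
    totals = collections.defaultdict(float)
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) != 4:
            continue  # blank or malformed line
        wire, length, wire_type, qty = fields
        if not length.isdigit():
            continue  # header line such as WIRE,LENGTH,TYPE,QTY
        totals[wire_type] += int(length) / 12
    return dict(totals)

# Illustrative rows; the first one is a header and is skipped.
rows = [
    "WIRE,LENGTH,TYPE,QTY",
    "WIRE-006A22,72,M22759/16-22-9,1",
    "WIRE-003A22,72,M22759/16-20-9,1",
    "WIRE-005A22,60,M22759/16-22-9,1",
]
print(sum_feet(rows))
# {'M22759/16-22-9': 11.0, 'M22759/16-20-9': 6.0}
```

Iterating over an open file yields its lines, so `sum_feet(f)` inside a `with open(...) as f:` block should work the same way as the list above.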

Fernando Beckworth · Answer 3 · 2022-10-28T01:30:27.340

I would suggest using a CSV file so you can properly sort the data. I have something that will hopefully get you started. It doesn't include your input options, but hopefully it works for you. I have some Python experience but still consider myself a beginner by my standards. #It shows. :)

import pandas as pd

# Assumes testfile.csv has no header row, so the column names are supplied here.
columns = ['WIRE', 'LENGTH', 'TYPE', 'QTY']
data = pd.read_csv('testfile.csv', header=None, names=columns).sort_values(by='TYPE')

data_dict = {}
this_val = ''
for _, row in data.iterrows():
    wire_name = row['TYPE']
    if this_val == wire_name:
        # Same wire type as the previous (sorted) row: keep adding inches
        data_dict[wire_name] += row['LENGTH']
    else:
        # First row of a new wire type: start a new total
        data_dict[wire_name] = row['LENGTH']
        this_val = wire_name

# Convert inches to whole feet
data_dict = {key: int(val / 12) for key, val in data_dict.items()}

print(data_dict)
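
The sort-then-accumulate idea above can also be sketched without pandas, using `itertools.groupby` from the standard library, which merges runs of adjacent rows that share a key. This assumes a header-less file in WIRE, LENGTH, TYPE, QTY order with whole-number lengths in inches; the function name `feet_by_sorted_groups` and the sample rows are illustrative only.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

def feet_by_sorted_groups(csvfile):
    """Sort rows by TYPE, then total LENGTH (inches) per group and convert to feet."""
    rows = [row for row in csv.reader(csvfile) if row]
    rows.sort(key=itemgetter(2))  # groupby only merges adjacent equal keys
    return {
        wire_type: sum(int(row[1]) for row in group) / 12
        for wire_type, group in groupby(rows, key=itemgetter(2))
    }

# Illustrative data only; in practice pass open("testfile.csv", newline="").
sample = io.StringIO(
    "WIRE-006A22,72,M22759/16-22-9,1\n"
    "WIRE-003A22,72,M22759/16-20-9,1\n"
    "WIRE-005A22,60,M22759/16-22-9,1\n"
)
print(feet_by_sorted_groups(sample))
# {'M22759/16-20-9': 6.0, 'M22759/16-22-9': 11.0}
```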
