
Here's my problem: each employee is uniquely identified by an id (e.g. KCUTD_41). I have already built a dictionary from a file that maps each company to the ids of its employees. It looks like this:

{    'Company 1' :['KCUTD_41',
                   'KCTYU_48',
                   'VTSYC_48',
                   ......],
     'Company 2' :['PORUH_21',
                   'PUSHB_10',
                   ....... ],
     'Company 3' :['STEYRU_69']}

In total I have several companies.

In a second file, each line corresponds to one collaboration group between companies. A line lists several employees together with doctoral students (d215485 etc.).

The file looks like this:

PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225 ...
d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10 ...
etc ....

For each company, I want the number of its employees on each line and the total number of groups (lines) in which it appears, to get something like this:

OUTPUT:

Company 1 : (number of employees from Company 1 per line) : number of groups (lines) in which Company 1 appears in total
Company 2 : (number of employees from Company 2 per line) : number of groups (lines) in which Company 2 appears in total
Company 3 : ......

My idea was to use a condition that checks, for each key of my dictionary, whether any of its values appear on a line, and if so, to count the number of occurrences.

I hope it's better now ^^'

If you can help me ^^

marc_s
  • Possible duplicate of [python 3.4 Counting occurrences in a .txt file](https://stackoverflow.com/questions/23232248/python-3-4-counting-occurrences-in-a-txt-file) – clubby789 Oct 19 '19 at 10:13
  • I'm finding it difficult to see the relation between the json and the file you have provided. Can you please explain what exactly the relationship between the company and the file is? – Avi Oct 19 '19 at 10:15
  • @Avi I don't think it is json. – Sid Oct 19 '19 at 10:15
  • So, `d254548` is the format of a company or group name? – RightmireM Oct 19 '19 at 10:16
  • Add some more details to explain your logic. You can do that by adding the expected output for the sample data and explaining how you arrived at it. – shaik moeed Oct 19 '19 at 10:18
  • Also, can you add some actual output - based on the data you have listed. I'm not clear as to what `Company 1 : (number of employees) : number of groups where it appears ` means :) – RightmireM Oct 19 '19 at 10:33
  • Well I've just edited the post :) maybe it's clearer now – BillyPocheo Oct 19 '19 at 16:09

1 Answer


I'm not clear exactly how you want the output to look, but this code might help you get to where you want to go...

import re

# Each company maps to the ids of its employees.
companies = {
    'Company 1': ['KCUTD_41', 'KCTYU_48', 'VTSYC_48'],
    'Company 2': ['PORUH_21', 'PUSHB_10'],
    'Company 3': ['STEYRU_69'],
}

# Number of employees listed for each company.
finalout = {}
for k, v in companies.items():
    finalout[k] = {"number_in_company": len(v)}
print(finalout)

lines_from_file = [
    "PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225",
    "d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10",
]

# Raw strings keep the backslashes intact for the regex engine.
pattern_groups    = r"\bd\d+\b"       # ids such as d215487
pattern_employees = r"[A-Z]+_\d+"    # ids such as PORUH_21
for line in lines_from_file:
    print("---------------------")
    print(line)
    # "Groups" here counts the d-prefixed ids found on the line.
    print("Groups per line:", len(re.findall(pattern_groups, line)))
    print("Employees per line:", len(re.findall(pattern_employees, line)))

OUTPUT:

{'Company 1': {'number_in_company': 3}, 'Company 2': {'number_in_company': 2}, 'Company 3': {'number_in_company': 1}}
---------------------
PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225
Groups per line: 4
Employees per line: 2
---------------------
d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10
Groups per line: 5
Employees per line: 2
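
The per-company counts asked for in the question can be sketched as follows. This is only one possible approach, not a definitive implementation: it assumes ids on a line are separated by whitespace and that each employee id belongs to exactly one company. The names `employee_to_company` and `count_per_company` are made up for this illustration.

```python
from collections import Counter

companies = {
    'Company 1': ['KCUTD_41', 'KCTYU_48', 'VTSYC_48'],
    'Company 2': ['PORUH_21', 'PUSHB_10'],
    'Company 3': ['STEYRU_69'],
}

lines_from_file = [
    "PORUH_21 d215487 d215489 d213654 KCTYU_48 d154225",
    "d25548 d89852 VTSYC_48 d254548 d121154 d258774 PUSHB_10",
]

# Reverse lookup (hypothetical helper): employee id -> company name.
# Assumes an employee id never appears under two companies.
employee_to_company = {
    emp: company for company, emps in companies.items() for emp in emps
}

def count_per_company(lines):
    """Return (employees per line, number of lines) for each company."""
    per_line = {company: [] for company in companies}
    group_count = Counter()
    for line in lines:
        # Count this line's tokens that are known employee ids, by company.
        hits = Counter(
            employee_to_company[token]
            for token in line.split()
            if token in employee_to_company
        )
        for company, n in hits.items():
            per_line[company].append(n)   # employees of this company on this line
            group_count[company] += 1     # one more group (line) for this company
    return per_line, group_count

per_line, group_count = count_per_company(lines_from_file)
for company in companies:
    print(company, ":", per_line[company], ":", group_count[company])
```

With the two sample lines, Company 1 and Company 2 each have one employee on both lines, and Company 3 appears on neither. The dictionary lookup replaces the `if ... in d.items()` style check with a single membership test per token, which stays cheap even with many companies.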
RightmireM
  • Thanks for the answer, but I have as many groups as lines in the file (one line = one group). What I want to do is count the occurrences of employees per line, and do that for each company. Is that clearer? I would like to write something like this (if in d.items, count += 1 ...) – BillyPocheo Oct 19 '19 at 15:59