
I need to compare two Linux group files with Python and find the users missing from each group. I used the code below, but it fails when the users are listed in a different order.

with open('group1', 'r') as file1:
    with open('group2', 'r') as file2:
        same = set(file1).difference(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

For example,

group1:
test:x:1234:mike,john,scott
test2:x:1234:mike,john
test3:x:1234:tim,dustin,Alex

group2:
test:x:1234:mike,scott,john
test2:x:1234:mike,john,scott
test3:x:1234:dustin,tim

The ideal output would be:

missing group1:
test2:scott

missing group2:
test3:Alex

Should I take each user and compare it? What would be the best way to compare two files?

Mike
  • Your posted code merely compares whole lines, finding missing lines from one of the two files. – Prune Aug 21 '19 at 19:19
  • When asking about homework (1) **Be aware of your school policy**: asking here for help may constitute cheating. (2) Specify that the question is homework. (3) **Make a good faith attempt** to solve the problem yourself first (include your code in your question). (4) **Ask about a specific problem** with your existing implementation; see [Minimal, complete, verifiable example](https://stackoverflow.com/help/minimal-reproducible-example). Also, [here](https://meta.stackoverflow.com/questions/334822/how-do-i-ask-and-answer-homework-questions) is guidance on asking homework questions. – Prune Aug 21 '19 at 19:19

2 Answers


This should work:

def create_dict_from_file(filename):
    """Read one group file and map each group name to its list of users."""
    with open(filename, 'r') as group_file:
        # splitlines() plus the filter drops blank lines (e.g. after a final newline)
        all_groups = [
            line for line in group_file.read().splitlines() if line.strip()
        ]
    return {
        # First field is the group name, last field is the comma-separated users;
        # empty names are dropped so a group without members gives an empty list.
        one_line.split(':')[0]: [
            user for user in one_line.split(':')[-1].split(',') if user
        ]
        for one_line in all_groups
    }


def create_missing_element(reference, other, key):
    """Return {key: users} for the users of `other` that `reference` lacks."""
    missing_in_reference = set(other) - set(reference)
    if missing_in_reference:
        return {key: missing_in_reference}
    return {}


file_1_groups = create_dict_from_file('group1')
file_2_groups = create_dict_from_file('group2')

all_missing_group1 = {}
all_missing_group2 = {}
# Walk over the groups of both files so that a group present in only one
# of them does not raise a KeyError.
for key in sorted(set(file_1_groups) | set(file_2_groups)):
    users_1 = file_1_groups.get(key, [])
    users_2 = file_2_groups.get(key, [])
    all_missing_group1.update(create_missing_element(users_1, users_2, key))
    all_missing_group2.update(create_missing_element(users_2, users_1, key))

print(all_missing_group1)
print(all_missing_group2)

I'll leave writing the result to a file to you.
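As a rough sketch of that last step, the two dictionaries could be written out in the layout asked for in the question. The helper names `format_missing` and `write_report` are made up for this illustration, and the sketch assumes each dictionary maps a group name to a set of user names.

```python
def format_missing(label, missing):
    """Build the 'missing <label>:' section from a {group: set_of_users} dict."""
    lines = ['missing {}:'.format(label)]
    for group in sorted(missing):
        # Sort the users so the output does not depend on set ordering.
        lines.append('{}:{}'.format(group, ','.join(sorted(missing[group]))))
    return '\n'.join(lines) + '\n'


def write_report(path, missing_group1, missing_group2):
    """Write both sections to `path`, separated by a blank line."""
    with open(path, 'w') as file_out:
        file_out.write(format_missing('group1', missing_group1))
        file_out.write('\n')
        file_out.write(format_missing('group2', missing_group2))
```

Calling `write_report('some_output_file.txt', {'test2': {'scott'}}, {'test3': {'Alex'}})` would produce the two sections shown in the question.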

A set is a Python structure that cannot contain duplicates and makes it easy to find missing elements through set difference.

I use a dict comprehension to create the dictionary, with the group name as key (the first element of the line when splitting on :) and the users as value (the last element of the line when splitting on :). The user field is split again, with , as separator, to get the users as a list, which is easy to handle in Python.
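To make the two splits concrete, here is what they give for one line taken from the question's sample data. The variable names are only for illustration.

```python
line = 'test3:x:1234:tim,dustin,Alex'

fields = line.split(':')       # ['test3', 'x', '1234', 'tim,dustin,Alex']
group_name = fields[0]         # first field: the group name
users = fields[-1].split(',')  # last field, split again on commas

print(group_name)  # test3
print(users)       # ['tim', 'dustin', 'Alex']
```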

ndclt

Parse each list of names you are comparing into a set, then take the set difference.

Here is an example of how you can compare sets of names.

s1 = set(['jay', 'kevin', 'billy'])
s2 = set(['billy', 'jay'])
s3 = set(['billy', 'jay', 'kevin'])
print(s1 - s2)
# {'kevin'}
print(s3 - s1)
# set()

Parsing the names into a set I'll leave up to you to figure out.
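For readers who want a starting point, one possible way to do that parsing is sketched below. The function name `parse_members` is hypothetical, and the sketch assumes the usual group-file layout in which the member list is the last colon-separated field.

```python
def parse_members(line):
    """Return (group_name, set_of_users) for one line of a group file."""
    fields = line.strip().split(':')
    # Skip empty names so a group without members yields an empty set.
    members = {name for name in fields[-1].split(',') if name}
    return fields[0], members


# The same users in a different order compare as equal sets.
_, users_a = parse_members('test:x:1234:mike,john,scott\n')
_, users_b = parse_members('test:x:1234:mike,scott,john\n')
print(users_a == users_b)  # True

# A user present in only one file shows up in the set difference.
_, users_1 = parse_members('test2:x:1234:mike,john\n')
_, users_2 = parse_members('test2:x:1234:mike,john,scott\n')
print(users_2 - users_1)  # {'scott'}
```

Because sets ignore order, this avoids the problem in the question, where whole lines were compared as strings.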

Kevin Welch