
I have a slight problem with the following task.

There are two files. The first file (the children file) links parents' identification codes to their children's identification codes. The second file (the names file) links each person's identification code to their name.

In the children file, every line contains a parent's identification code followed by his/her child's identification code:

47853062345 60907062342
46906183451 38504014543
34105139833 36512129874

In the names file, every line contains an identification code followed by a name:

47853062345 Kadri Kalkun
36512129874 Peeter Peedumets
38504014543 Maria Peedumets
46906183451 Madli Peedumets
34105139833 Karl Peedumets
60907062342 Liisa Maria Jaaniste

It is safe to assume that the names file does not contain duplicate names or identification codes. Also, every identification code in the children file has a corresponding name in the names file.

The function connect takes 2 arguments: the children file name and the names file name. It returns a dictionary where each key is a parent's name and each value is the set of his/her children's names.

children.txt:

47853062345 60907062342
46906183451 38504014543
34105139833 36512129874
36512129874 38504014543
46906183451 48708252344
36512129874 48708252344

names.txt:

47853062345 Kadri Kalkun
36512129874 Peeter Peedumets
38504014543 Maria Peedumets
46906183451 Madli Peedumets
34105139833 Karl Peedumets
48708252344 Robert Peedumets
60907062342 Liisa Maria Jaaniste

Output:

connect('children.txt', 'names.txt')

{'Peeter Peedumets': {'Maria Peedumets', 'Robert Peedumets'},
'Madli Peedumets': {'Maria Peedumets', 'Robert Peedumets'}, 
'Karl Peedumets': {'Peeter Peedumets'}, 
'Kadri Kalkun': {'Liisa Maria Jaaniste'}}

I have read the children file into a list and the names file into a dictionary, and replaced the ID codes with names, but I can't wrap my brain around how to get the end result. My code so far:

def connect(children_file,names_file):
    #children = {}
   # with open(children_file, encoding="UTF-8") as f:
        #for line in f:
           #(key, val) = line.split()
           #children[key.strip("\ufeffn' ").strip("\n ")] = val
    with open(children_file, encoding="UTF-8") as ins:
        children = [[n.strip("\ufeffn' ").strip("\n ") for n in line.split()] for line in ins]

    names = {}
    with open(names_file, encoding="UTF-8") as f:
        for line in f:
            splitLine = line.split()
            names[splitLine[0].strip("\ufeffn' ").strip("\n ")] = " ".join(splitLine[1:])
    names.items()
    for lst in children:
      for ind, item in enumerate(lst):
          if item in names:
              lst[ind] = names[item]

    d = {}
    for i in range(len(children[0][:])):
        if children[0][i] not in d:
            d[children[0][i]] = set()
        d[children[0][i]].add(children[1][i])


    return d

print(connect("children.txt","names.txt"))      
loki
Elandir

1 Answer


Your code is a bit inefficient overall. Don't build a list of children first; build the parent-to-children map directly while reading the file. You can use the dictionary setdefault method or a collections.defaultdict; for simplicity, I will use the former. So, simply:

>>> import io
>>> # children_str and names_str hold the contents of children.txt and names.txt
>>> with io.StringIO(children_str) as cf, io.StringIO(names_str) as nf:
...     parentmap = {}
...     namemap = {}
...     for line in cf:
...         pid, cid = line.strip().split()
...         parentmap.setdefault(pid, set()).add(cid)
...     for line in nf:
...         nid, name = line.strip().split(maxsplit=1) 
...         namemap[nid] = name
...
>>> from pprint import pprint
>>> pprint(parentmap)
{'34105139833': {'36512129874'},
 '36512129874': {'38504014543', '48708252344'},
 '46906183451': {'38504014543', '48708252344'},
 '47853062345': {'60907062342'}}
>>> pprint(namemap)
{'34105139833': 'Karl Peedumets',
 '36512129874': 'Peeter Peedumets',
 '38504014543': 'Maria Peedumets',
 '46906183451': 'Madli Peedumets',
 '47853062345': 'Kadri Kalkun',
 '48708252344': 'Robert Peedumets',
 '60907062342': 'Liisa Maria Jaaniste'}

Note that I'm using io.StringIO to pretend I am working with files; I'm actually working with strings I copied directly from the question. An io.StringIO lets you treat a string like a file, so in your own code you would just open your files like you normally would. Note also that I passed the maxsplit argument when splitting the lines from names.txt, so that only the first whitespace separates the ID from the name and the names themselves are not split.
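For comparison, the collections.defaultdict alternative mentioned earlier could look roughly like the sketch below. The children_lines list is a hypothetical stand-in for the lines of children.txt, copied from the question; it is not part of the original code.

```python
from collections import defaultdict

# Hypothetical stand-in for the lines of children.txt (data from the question).
children_lines = [
    "47853062345 60907062342",
    "46906183451 38504014543",
    "34105139833 36512129874",
    "36512129874 38504014543",
    "46906183451 48708252344",
    "36512129874 48708252344",
]

# defaultdict(set) creates an empty set the first time a key is accessed,
# so no explicit "is this parent already in the dict?" check is needed.
parentmap = defaultdict(set)
for line in children_lines:
    pid, cid = line.split()
    parentmap[pid].add(cid)

print(dict(parentmap))
```

The behaviour matches the setdefault version; the main difference is that merely looking up a missing key on a defaultdict inserts it, which may or may not be what you want later on.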

To get the final result, simply use:

>>> final = {namemap[k]:{namemap[n] for n in v} for k,v in parentmap.items()}
>>> pprint(final)
{'Kadri Kalkun': {'Liisa Maria Jaaniste'},
 'Karl Peedumets': {'Peeter Peedumets'},
 'Madli Peedumets': {'Robert Peedumets', 'Maria Peedumets'},
 'Peeter Peedumets': {'Robert Peedumets', 'Maria Peedumets'}}
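Putting the pieces together, a connect function that reads real files might look like the sketch below. It is one possible arrangement, not the only one. Two choices here are my assumptions rather than anything stated in the question: the "utf-8-sig" encoding (which decodes UTF-8 and drops a leading byte order mark, so the manual "\ufeff" stripping should not be needed) and the skipping of blank lines.

```python
def connect(children_file, names_file):
    """Return {parent name: set of child names} built from the two files."""
    # Read the names first so IDs can be translated while reading the children.
    namemap = {}
    # Assumption: "utf-8-sig" is used so that a leading BOM, if any, is dropped.
    with open(names_file, encoding="utf-8-sig") as nf:
        for line in nf:
            if not line.strip():
                continue  # assumption: ignore blank lines
            # maxsplit=1 keeps multi-word names such as "Liisa Maria Jaaniste" whole.
            nid, name = line.strip().split(maxsplit=1)
            namemap[nid] = name

    result = {}
    with open(children_file, encoding="utf-8-sig") as cf:
        for line in cf:
            if not line.strip():
                continue
            pid, cid = line.split()
            # Create the parent's set on first sight, then add the child's name.
            result.setdefault(namemap[pid], set()).add(namemap[cid])
    return result
```

Calling connect('children.txt', 'names.txt') with the files from the question should give the same dictionary as final above. Translating IDs to names while reading avoids the intermediate parentmap, at the cost of requiring the names file to be read first.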
Community
juanpa.arrivillaga