Say I have a text file with two columns, like below:

A "
A "
A l
A "
C r
C "
C l
D a
D "
D "
D "
D d
R "
R "
R "
R " 
S "
S "
S o
D g
D "
D "
D "
D j
A "
A "
A z

I would like to retrieve the information like below:

list1= {A:l}, {C:r,l}, {D:a,d}, {S:o}
final_list= {A:l}, {C:r,l}, {D:a,d}, R{}, {S:o}

I understand that I have to read the text file and call line.strip().split() on each line, but after that I don't know how to proceed.

Rangooski

2 Answers

import collections
list1 = collections.defaultdict(set)
final_list = collections.defaultdict(set)
for line in filetext:  # assuming you've opened the file and read it in
    key, value = line.strip().split()
    final_list[key].add(value)
    if value != '"':
        list1[key].add(value)

This is slightly different in that final_list will have the ditto mark (") as an element of each set; this doesn't match what you asked for, so let's alter it a little:

import collections
list1 = collections.defaultdict(set)
final_list = {}
for line in filetext:  # assuming you've opened the file and read it in
    key, value = line.strip().split()
    if key not in final_list:
        final_list[key] = set()
    if value != '"':
        list1[key].add(value)
final_list.update(list1)

This should give you what you want: every key exists in final_list, with an empty set for keys like R that have no values.
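
As a quick check of the second snippet, here is a minimal self-contained sketch. It assumes a shortened, hypothetical `sample_lines` list in place of the real file, and a `DITTO` constant that is introduced here only for readability; neither name comes from the question.

```python
import collections

# Hypothetical stand-in for the lines of the text file (shortened sample).
sample_lines = [
    'A "',
    'A l',
    'C r',
    'C "',
    'C l',
    'R "',
    'R " ',  # trailing whitespace is removed by strip()
    'S o',
]

DITTO = '"'  # the placeholder value that should not be collected

list1 = collections.defaultdict(set)
final_list = {}
for line in sample_lines:
    key, value = line.strip().split()
    # Register every key, even if it never gets a real value.
    final_list.setdefault(key, set())
    if value != DITTO:
        list1[key].add(value)

# Overwrite the empty placeholders with the collected values.
final_list.update(list1)

print(dict(list1))
print(final_list)
```

With this sample, `list1` holds only A, C and S, while `final_list` also contains R mapped to an empty set.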

dwanderson
  • In the second snippet, the second `if` statement shows an indentation error. – Rangooski Feb 11 '16 at 14:25
  • Which line has an indentation error? I intentionally put `final_list.update` after all the lines because you only need to do it once, at the end of the file. If it's something else, just let me know and I'll fix it – dwanderson Feb 11 '16 at 14:26
  • `import collections list1 = collections.defaultdict(set) final_list = {} with open('test.txt', 'r') as f: for line in f: ## assuming youve opened it, read it in key, values = line.strip().split() if key not in final_list: final_list[key] = set() if values: list1[key].add(values) final_list.update(list1) print(list1) print(final_list)` – Rangooski Feb 11 '16 at 14:28
  • final_list = `{'A': {'"'}, 'C': {'r'}, 'D': {'a'}, 'R': {'"'}, 'S': {'"'}}`, – Rangooski Feb 11 '16 at 14:31
  • final_list = `{'A': {'"'}, 'C': {'r'}, 'D': {'a'}, 'R': {'"'}, 'S': {'"'}}` & list1 = `defaultdict(set, {'A': {'"'}, 'C': {'r'}, 'D': {'a'}, 'R': {'"'}, 'S': {'"'}})` – Rangooski Feb 11 '16 at 14:33
  • Oh, instead of `if value:` do `if value != '"':`; I'll fix that – dwanderson Feb 11 '16 at 14:35
  • But your `final_list` is actually a dictionary :) Don't do that. – Alex Belyaev Feb 11 '16 at 14:42
  • Oh yeah, better variable names, for sure. Just sticking with OP so it's easier for them to follow, but I agree with you – dwanderson Feb 11 '16 at 14:46
  • @Alex Belyaev Oh yes, I forgot that. Shall I convert `final_list` to a list afterwards? Is that possible? – Rangooski Feb 11 '16 at 14:51

If the order of the dicts in final_list DOESN'T matter:

from collections import defaultdict

with open('/home/bwh1te/projects/stackanswers/wordcount/data.txt') as f:
    occurencies = defaultdict(list)
    for line in f:
        key, value = line.strip().split()
        # Accessing occurencies[key] in this condition
        # automatically creates the key in the dict
        if value not in occurencies[key] and value.isalpha():
            occurencies[key].append(value)

# defaultdict(<class 'list'>, {'C': ['r', 'l'], 'D': ['a', 'd'], 'S': ['o'], 'A': ['l'], 'R': []})
# Use it like a simple dictionary

# If it must be a list, not a dict:
final_list = [{key: value} for key, value in occurencies.items()]
# [{'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}, {'A': ['l']}, {'R': []}]

If the order of the dicts in final_list DOES matter:

from collections import OrderedDict

with open(file_path) as f:  # file_path is the path to your input file
    occurencies = OrderedDict()
    for line in f:
        key, value = line.strip().split()
        # Always create the key, even if it ends up with no values
        if key not in occurencies:
            occurencies[key] = []
        if value.isalpha():
            if value not in occurencies[key]:
                occurencies[key].append(value)

# OrderedDict([('A', ['l']), ('C', ['r', 'l']), ('D', ['a', 'd']), ('R', []), ('S', ['o'])])

# If it must be a list, not a dict:
final_list = [{key: value} for key, value in occurencies.items()]
# [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'R': []}, {'S': ['o']}]

list1 = [{key: value} for key, value in occurencies.items() if value]
# [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}]

Or you can implement a hybrid of OrderedDict and defaultdict, as described in: Can I do an ordered, default dict in Python? :)
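
On Python 3.7 and later, a plain dict preserves insertion order, so `dict.setdefault` may be enough to get both behaviours without a hybrid class. The following is only a sketch under that assumption: the `group_values` helper, the `sample_lines` data, and the choice to skip malformed lines are hypothetical additions, not part of the answer above.

```python
def group_values(lines):
    """Group second-column values by first-column key, in first-seen order."""
    occurrences = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: silently skip blank or malformed lines
        key, value = parts
        # setdefault creates the key on first sight, like defaultdict would,
        # and the plain dict keeps keys in insertion order (Python 3.7+).
        values = occurrences.setdefault(key, [])
        if value.isalpha() and value not in values:
            values.append(value)
    return occurrences


# Hypothetical stand-in for the lines of the text file (shortened sample).
sample_lines = ['A "', 'A l', 'C r', 'C "', 'C l', 'R "', 'R " ', 'S o']

occurrences = group_values(sample_lines)
final_list = [{key: value} for key, value in occurrences.items()]
list1 = [{key: value} for key, value in occurrences.items() if value]

print(final_list)
# [{'A': ['l']}, {'C': ['r', 'l']}, {'R': []}, {'S': ['o']}]
print(list1)
# [{'A': ['l']}, {'C': ['r', 'l']}, {'S': ['o']}]
```

Because the function takes any iterable of lines, an open file object can be passed in place of `sample_lines`.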

Alex Belyaev
  • The order matters here. I will be comparing: `for all in list1`, I will compare `final_list [-1]` & `final_list [1]`. – Rangooski Feb 11 '16 at 14:41
  • @Rangooski Okay... Should it preserve the order of the file records, or sort alphabetically? – Alex Belyaev Feb 11 '16 at 14:45
  • it should be in order of file records. Not alphabetically. – Rangooski Feb 11 '16 at 14:46
  • Thank you so much for the answer. I will try to learn this concept of OrderedDict. – Rangooski Feb 11 '16 at 15:03
  • final_list didn't give the expected result. It gives `final_list = [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'S': ['o']}]` – Rangooski Feb 11 '16 at 15:33
  • @Rangooski emm... what is the difference? There should be an 'empty' dict for 'R'? – Alex Belyaev Feb 12 '16 at 10:31
  • But `list1` is not here. `final_list = [{'A': ['l']}, {'C': ['r', 'l']}, {'D': ['a', 'd']}, {'R': []}, {'S': ['o']}]` `occurencies = OrderedDict([('A', ['l']), ('C', ['r', 'l']), ('D', ['a', 'd']), ('R', []), ('S', ['o'])])` – Rangooski Feb 12 '16 at 10:47
  • Oh, it wasn't clear... I thought that `list1` is some temporary list to create a `final_list` :) And what do you expect to see in `list1`? – Alex Belyaev Feb 12 '16 at 10:50
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/103293/discussion-between-rangooski-and-alex-belyaev). – Rangooski Feb 12 '16 at 12:06