I would like to build a dictionary of abbreviations.

I have a text file with a lot of abbreviations. I read it in like this:

with open('abreviations.txt') as ab:
    ab_words = ab.read().splitlines()

An extract:

'ACE',
'Access Control Entry',
'ACK',
'Acknowledgement',
'ACORN',
'A Completely Obsessive Really Nutty person',

Now I want to build the dictionary, with every odd-numbered line as a key and every even-numbered line as the corresponding value.

In the end I should be able to write:

ab_dict['ACE']

and get the result:

'Access Control Entry'

Also, how can I make the lookup case-insensitive?

ab_dict['ace']

should yield the same result:

'Access Control Entry'

Ideally, the output would then also be lower case:

'access control entry'

Here is a link to the text file: https://www.dropbox.com/s/91afgnupk686p9y/abreviations.txt?dl=0

  • What if two entries have the same abbreviation? Also check this question: https://stackoverflow.com/questions/2082152/case-insensitive-dictionary – user2390182 Dec 05 '17 at 09:44
  • @schwobaseggl Then you have two different dictionary keys for the same value. That's not a problem. Thanks for the link! –  Dec 05 '17 at 09:45
  • What about keys like `ACe`? Are you only interested in keys that are either completely lowercase (`ace`) or completely uppercase (`ACE`)? – RoadRunner Dec 05 '17 at 10:12
  • @RoadRunner Good point. It would be perfect if it could deal with such cases too, hence producing 'Access Control entry'. –  Dec 05 '17 at 10:30
  • @RoadRunner: If I have a sentence like "The ACE is not easy to understand", how can I automatically replace "ACE" in the sentence with 'Access Control Entry'? –  Dec 05 '17 at 10:55
  • @totyped see my answer below, it has this functionality. – RoadRunner Dec 05 '17 at 11:22

3 Answers

A complete solution with a custom ABDict class, reading the file two lines at a time through Python's iterator protocol (next):

class ABDict(dict):
    ''' Class representing a dictionary of abbreviations'''

    def __getitem__(self, key):
        v = dict.__getitem__(self, key.upper())
        return v.lower() if key.islower() else v

with open('abreviations.txt') as ab:    # file name as spelled in the question
    ab_dict = ABDict()

    while True:
        try:
            k = next(ab).strip()    # `key` line
            v = next(ab).strip()    # `value` line
            ab_dict[k] = v
        except StopIteration:
            break

Now, testing (the case of the result follows the case of the key):

print(ab_dict['ACE'])
print(ab_dict['ace'])
print('*' * 10)
print(ab_dict['WYTB'])
print(ab_dict['wytb'])

The output, in order:

Access Control Entry
access control entry
**********
Wish You The Best
wish you the best
RomanPerekhrest
  • Very nice solution. – RoadRunner Dec 05 '17 at 10:27
  • Very nice! Thanks a lot! Have you seen the comment from RoadRunner? What should happen for ab_dict['Ace']? The perfect output would be: Access Control entry –  Dec 05 '17 at 10:31
  • If I have a sentence like "The ACE is not easy to understand", how can I automatically replace "ACE" in the sentence with 'Access Control Entry'? –  Dec 05 '17 at 10:55
  • @totyped, why should `ab_dict['Ace']` be presented as `Access Control entry`? Why not `Access control Entry` or `access Control Entry`? To add such logic, there have to be clear and exact conditions. – RomanPerekhrest Dec 05 '17 at 12:30
  • @RomanPerekhrest Oh, that was a typo. It should have been: Access control entry. Sorry! The casing of the input should determine the casing of the output. –  Dec 05 '17 at 12:53
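
The casing rule settled on in the comments above ('ACE' as stored, 'ace' in lower case, 'Ace' capitalised) could be sketched roughly as follows. This is only an illustration, not part of the answer as posted: the class name CaseMatchingABDict is made up, and the rule for keys that are neither all-lowercase nor capitalised (they return the stored value unchanged) is an assumption.

```python
class CaseMatchingABDict(dict):
    """Hypothetical variant of ABDict: keys are stored upper-cased and the
    result mirrors the casing of the key used for the lookup."""

    def __setitem__(self, key, value):
        # Normalise on write so that mixed-case keys in the file are still found
        super().__setitem__(key.upper(), value)

    def __getitem__(self, key):
        value = super().__getitem__(key.upper())
        if key.islower():
            return value.lower()        # 'ace' -> 'access control entry'
        if key[:1].isupper() and key[1:].islower():
            return value.capitalize()   # 'Ace' -> 'Access control entry'
        return value                    # 'ACE' and anything else: as stored (assumption)


ab_dict = CaseMatchingABDict()
ab_dict['ACE'] = 'Access Control Entry'

print(ab_dict['ACE'])
print(ab_dict['Ace'])
print(ab_dict['ace'])
```

Only the square-bracket access is case-aware here; `in`, `get()` and `update()` on a dict subclass bypass these two methods and would need their own overrides.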

Here's another solution, based on the pairwise function from a linked answer (CaseInsensitiveDict comes from the third-party requests package):

from requests.structures import CaseInsensitiveDict

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)

with open('abreviations.txt') as reader:
    abr_dict = CaseInsensitiveDict()
    for abr, full in pairwise(reader):
        abr_dict[abr.strip()] = full.strip()
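
To see what pairwise does, and how the same lookup might work without installing requests, here is a small sketch. The sample lines stand in for the file contents, and the names abr_lower and lookup are hypothetical; it assumes that lower-casing the keys on both insert and lookup is an acceptable substitute for CaseInsensitiveDict.

```python
def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    # zip pulls from the same iterator twice per step, so items come out in pairs
    return zip(a, a)


# Sample lines standing in for the contents of the text file
lines = ['ACE\n', 'Access Control Entry\n', 'ACK\n', 'Acknowledgement\n']

pairs = list(pairwise(lines))
print(pairs[0])

# Plain-dict alternative: lower-case the key when storing and when looking up
abr_lower = {abr.strip().lower(): full.strip() for abr, full in pairs}


def lookup(abbreviation):
    """Case-insensitive lookup in the plain dict (hypothetical helper)."""
    return abr_lower[abbreviation.lower()]


print(lookup('Ace'))
```

If the input has an odd number of lines, zip stops at the last complete pair, so a trailing key without a value is silently dropped.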
Gino

Here is an answer that also replaces abbreviations inside a sentence with their expansions from the dictionary:

import re
from requests.structures import CaseInsensitiveDict

def read_file_dict(filename):
    """
    Reads file data into CaseInsensitiveDict
    """

    # lists for keys and values
    keys = []
    values = []

    # case-insensitive dict
    data = CaseInsensitiveDict()

    # count used for deciding which line we're on
    count = 1

    with open(filename) as file:
        temp = file.read().splitlines()

        for line in temp:

            # if the line count is even, a value is being read
            if count % 2 == 0:
                values.append(line)

            # otherwise, a key is being read
            else:
                keys.append(line)
            count += 1

    # Add to dictionary
    # perhaps some error checking here would be good
    for key, value in zip(keys, values):
        data[key] = value

    return data


def replace_word(ab_dict, sentence):
    """
    Replaces words in the sentence that are keys in the dictionary with their values
    """

    # not necessarily words, but you get the idea
    words = re.findall(r"[\w']+|[.,!?; ]", sentence)

    new_words = []
    for word in words:

        # if word is in dictionary, replace it and add it to resulting list
        if word in ab_dict:
            new_words.append(ab_dict[word])

        # otherwise keep the word unchanged
        else:
            new_words.append(word)

    # return sentence with replaced words
    return "".join(new_words)


def main():
    ab_dict = read_file_dict("abreviations.txt")

    print(ab_dict)

    print(ab_dict['ACE'])
    print(ab_dict['Ace'])
    print(ab_dict['ace'])

    print(replace_word(ab_dict, "The ACE is not easy to understand"))

if __name__ == '__main__':
    main()

Which outputs:

{'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement', 'ACORN': 'A Completely Obsessive Really Nutty person'}
Access Control Entry
Access Control Entry
Access Control Entry
The Access Control Entry is not easy to understand
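
One thing to be aware of: re.findall returns only the pieces that match the pattern, so characters outside it (brackets, colons, hyphens and so on) are dropped from the rebuilt sentence. A possible alternative, sketched here with the standard library only, is to let re.sub replace the words in place and leave everything else untouched. The function name expand_abbreviations and the sample data are made up for illustration, and treating a "word" as a run of `\w` characters is an assumption.

```python
import re


def expand_abbreviations(ab_dict, sentence):
    """Replace each whole word that is a known abbreviation (hypothetical helper)."""
    # Lower-cased copy of the mapping so the match is case-insensitive
    lowered = {key.lower(): value for key, value in ab_dict.items()}

    def substitute(match):
        word = match.group(0)
        # Fall back to the original word when it is not an abbreviation
        return lowered.get(word.lower(), word)

    # Only the matched words are rewritten; punctuation and spacing stay as they are
    return re.sub(r"\w+", substitute, sentence)


sample = {'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement'}
result = expand_abbreviations(sample, "The ACE (see RFC): send an ACK-packet.")
print(result)
```

Because it only relies on items(), the same function should also accept the CaseInsensitiveDict built by read_file_dict.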
RoadRunner