Converting a file into a dict

Question

my_file = """The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again."""

Expected output:

{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'], 'itsy': ['bitsy', 'bitsy'], 'bitsy': ['spider', 'spider'], 'spider': ['went', 'out', 'went'], 'went': ['up', 'up'], 'up': ['the', 'all', 'the'], 'water': ['spout'], 'spout': ['down', 'again'], 'down': ['came'], 'came': ['the', 'the'], 'rain': ['washed', 'and'], 'washed': ['the'], 'out': ['out', 'came'], 'sun': ['dried'], 'dried': ['up'], 'all': ['the'], 'and': ['the'], 'again': []}

My code:

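import string

# Count how often each word occurs (lowercased, surrounding punctuation stripped).
words_set = {}
# my_file is a string here, so split it into lines; iterating it directly yields characters.
for line in my_file.splitlines():
    lower_text = line.lower()
    for word in lower_text.split():
        word = word.strip(string.punctuation + string.digits)
        if word:
            if word in words_set:
                words_set[word] = words_set[word] + 1
            else:
                words_set[word] = 1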

I'm able to count the repeated words using the above code. – ADARSHA GOWDA Feb 26 '19 at 23:58
What is your question? And none of your code is formatted. – antfuentes87 Feb 27 '19 at 00:57
Is there any logic behind your expected output? – Rifat Alptekin Çetin Feb 27 '19 at 01:00

pylang · Answer 1 · 2019-02-27T07:29:50.200

You can reproduce your expected results with a few concepts. The only difference is the final word, which maps to [''] here rather than []:

Given

import string
import itertools as it
import collections as ct


data = """\
The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again.
"""

Code

def clean_string(s: str) -> str:
    """Return a lowercased string with punctuation removed and newlines turned into spaces."""
    table = str.maketrans("","", string.punctuation)
    return s.lower().translate(table).replace("  ", " ").replace("\n", " ")


def get_neighbors(words: list) -> dict:
    """Return a dict of right-hand, neighboring words."""
    dd = ct.defaultdict(list)
    for word, nxt in it.zip_longest(words, words[1:], fillvalue=""):
        dd[word].append(nxt)
    return dict(dd)

Demo

words = clean_string(data).split()
get_neighbors(words)

Results

{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'],
 'itsy': ['bitsy', 'bitsy'],
 'bitsy': ['spider', 'spider'],
 'spider': ['went', 'out', 'went'],
 'went': ['up', 'up'],
 'up': ['the', 'all', 'the'],
 'water': ['spout'],
 'spout': ['down', 'again'],
 'down': ['came'],
 'came': ['the', 'the'],
 'rain': ['washed', 'and'],
 'washed': ['the'],
 'out': ['out', 'came'],
 'sun': ['dried'],
 'dried': ['up'],
 'all': ['the'],
 'and': ['the'],
 'again': ['']}

Details

clean_string

You can remove punctuation in any number of ways. Here a translation table deletes every character in string.punctuation. The two str.replace() calls then tidy up whitespace: double spaces (left behind where "&" was removed) and newlines become single spaces.
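As one alternative, a regular expression can extract the words directly instead of deleting punctuation first. This is only a sketch: the helper name extract_words and the assumption that a word consists solely of the ASCII letters a-z are illustrative and not part of the answer above.

```python
import re


def extract_words(s: str) -> list:
    """Return the lowercased runs of ASCII letters found in s (hypothetical helper)."""
    # Assumption: a "word" is one or more letters a-z; digits, punctuation
    # and whitespace all act as separators.
    return re.findall(r"[a-z]+", s.lower())


sample = "Down came the rain & washed the spider out."
print(extract_words(sample))
# ['down', 'came', 'the', 'rain', 'washed', 'the', 'spider', 'out']
```

Note that this splits on apostrophes ("don't" becomes 'don' and 't'), whereas the translation table would produce 'dont', so the two approaches are not interchangeable for every text.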

get_neighbors

A defaultdict builds a dict of lists: looking up a missing key creates a new, empty list for it.
We fill the dict by iterating over two offset copies of the word list, words and words[1:], so each word is paired with the word that follows it.
zip_longest pads the shorter list with an empty string, which is why the last word maps to [''].
dict(dd) ensures a plain dict is returned.
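If the last word should map to an empty list, as in the expected output of the question, one possible variant pairs the words with plain zip and registers the final word separately. The name get_neighbors_strict is hypothetical, and this is a sketch rather than a drop-in replacement.

```python
import collections as ct


def get_neighbors_strict(words: list) -> dict:
    """Return a dict of right-hand neighbors; the last word gets no filler value."""
    dd = ct.defaultdict(list)
    # zip stops at the shorter sequence, so the final word is never paired.
    for word, nxt in zip(words, words[1:]):
        dd[word].append(nxt)
    if words:
        # Make sure the final word appears as a key, with [] if it is new.
        dd.setdefault(words[-1], [])
    return dict(dd)


print(get_neighbors_strict(["up", "the", "spout", "again"]))
# {'up': ['the'], 'the': ['spout'], 'spout': ['again'], 'again': []}
```

If the final word already occurred earlier in the text, its existing neighbor list is left untouched.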

If you solely wish to count words:

Demo

ct.Counter(words)

Results

Counter({'the': 8,
         'itsy': 2,
         'bitsy': 2,
         'spider': 3,
         'went': 2,
         'up': 3,
         'water': 1,
         'spout': 2,
         'down': 1,
         'came': 2,
         'rain': 2,
         'washed': 1,
         'out': 2,
         'sun': 1,
         'dried': 1,
         'all': 1,
         'and': 1,
         'again': 1})
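For comparison, here is a small sketch showing that Counter produces the same tallies as a hand-written counting loop like the one in the question. The shortened word list is made up for illustration.

```python
import collections as ct

# Shortened, made-up word list for illustration.
words = "the itsy bitsy spider went up the water spout".split()

counts = ct.Counter(words)

# Equivalent manual tally, in the style of the loop from the question.
manual = {}
for word in words:
    manual[word] = manual.get(word, 0) + 1

print(dict(counts) == manual)  # True
print(counts.most_common(1))   # [('the', 2)]
```

A Counter also returns 0 for words it has never seen instead of raising KeyError, which avoids the explicit membership check.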
