String into dict: word length as key, list of words with that length as value

Question

I have a string here:

str_files_txt = "A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

'Text file' refers to a type of container, while plain text refers to a type of content.

At a generic level of description, there are two kinds of computer files: text files and binary files"

I am supposed to create a dictionary where the keys are word lengths and each value is a list of all the words with that length.

This is what I have tried. It works, but I'm not sure how to use a loop efficiently to do this. Can anyone please share a better approach?

files_dict_values = {}
files_list = list(set(str_files_txt.split()))

values_1=[]
values_2=[]
values_3=[]
values_4=[]
values_5=[]
values_6=[]
values_7=[]
values_8=[]
values_9=[]
values_10=[]
values_11=[]


for ele in files_list:
  if len(ele) == 1:
    values_1.append(ele)
    files_dict_values.update({len(ele):values_1})
  elif len(ele) == 2:
    values_2.append(ele)
    files_dict_values.update({len(ele):values_2})
  elif len(ele) == 3:
    values_3.append(ele)
    files_dict_values.update({len(ele):values_3})
  elif len(ele) == 4:
    values_4.append(ele)
    files_dict_values.update({len(ele):values_4})
  elif len(ele) == 5:
    values_5.append(ele)
    files_dict_values.update({len(ele):values_5})
  elif len(ele) == 6:
    values_6.append(ele)
    files_dict_values.update({len(ele):values_6})
  elif len(ele) == 7:
    values_7.append(ele)
    files_dict_values.update({len(ele):values_7})
  elif len(ele) == 8:
    values_8.append(ele)
    files_dict_values.update({len(ele):values_8})
  elif len(ele) == 9:
    values_9.append(ele)
    files_dict_values.update({len(ele):values_9})
  elif len(ele) == 10:
    values_10.append(ele)
    files_dict_values.update({len(ele):values_10})

print(files_dict_values)

Here is the output I got:

{6: ['modern', 'bytes,', 'stored', 'within', 'exists', 'bytes.', 'system', 'binary', 'length', 'files:', 'refers'], 8: ['sequence', 'content.', 'variable', 'records.', 'systems,', 'computer'], 10: ['container,', 'electronic', 'delimiters', 'structured', '(sometimes', 'character,'], 1: ['A', 'a'], 4: ['will', 'line', 'data', 'done', 'last', 'more', 'kind', 'such', 'text', 'Some', 'size', 'need', 'ways', 'have', 'file', 'CP/M', 'with', 'that', 'most', 'name', 'type', 'keep', 'does'], 5: ['store', 'after', 'files', 'while', 'file"', 'known', 'those', 'plain', 'there', 'fixed', 'which', '"Text', 'file.', 'level', 'where', 'track', 'lines', 'kinds', 'text.', 'There'], 9: ['depending', 'Unix-like', 'primarily', 'textfile;', 'separated', 'Microsoft', 'flatfile)', 'operating', 'different'], 3: ['EOF', 'may', 'one', 'and', 'use', 'are', 'two', 'new', 'the', 'end', 'any', 'for', 'few', 'old', 'not'], 7: ['systems', 'denoted', 'Windows', 'because', 'spelled', 'marker,', 'padding', 'special', 'MS-DOS,', 'generic', 'contain', 'system.', 'placing'], 2: ['At', 'do', 'of', 'on', 'as', 'in', 'an', 'or', 'is', 'In', 'On', 'by', 'to']}

score 0 · Answer 1 · answered Sep 30 '20 at 15:21

How about using a loop and letting the dictionary create its keys as they are needed:

str_files_txt = "A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records. 'Text file' refers to a type of container, while plain text refers to a type of content. At a generic level of description, there are two kinds of computer files: text files and binary files"
op={}
for items in str_files_txt.split():
    if len(items) not in op:
        op[len(items)]=[]
    op[len(items)].append(items)
for items in op:
    op[items]=list(set(op[items]))

score 0 · Answer 2 · answered Sep 30 '20 at 15:21

answer = {}
for word in str_files_txt.split():  # loop over all the words
    # use setdefault to create an empty set if the key doesn't exist
    answer.setdefault(len(word), set()).add(word)  # add the word to the set
    # the set will handle deduping

# turn those sets into lists
for k,v in answer.items():
    answer[k] = list(v)
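
The same setdefault approach can be wrapped in a function, as in this minimal sketch. The name `group_by_length` and the sorting of each list are assumptions added here to make the result repeatable; they are not part of the answer above.

```python
def group_by_length(text):
    """Map each word length to a sorted list of the distinct words of that length."""
    groups = {}
    for word in text.split():
        # setdefault returns the existing set, or stores and returns a new one
        groups.setdefault(len(word), set()).add(word)
    # sets are unordered, so sort each one to get a repeatable result
    return {length: sorted(words) for length, words in groups.items()}


print(group_by_length("a text file is a kind of file"))
# {1: ['a'], 4: ['file', 'kind', 'text'], 2: ['is', 'of']}
```

Punctuation still counts toward the length here, exactly as in the loop above.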

Patrick Artner · Answer 3 · 2020-09-30T15:29:59.240

You have two problems: cleaning your data and creating the dictionary.

Use a defaultdict(list) after stripping characters that do not belong to the words. (This is similar to the answer in the duplicate.)

from collections import defaultdict


d = defaultdict(list)

text = """A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.
'Text file' refers to a type of container, while plain text refers to a type of content.
At a generic level of description, there are two kinds of computer files: text files and binary files"
"""

# strip the characters ,.!;:-"' from both ends of each whitespace-separated word
words = [w.strip(",.!;:- \"'") for w in text.split()]

# add words to list in dict, automatically creates list if needed
# your code uses a set as well
for w in set(words):
    d[len(w)].append(w)

# output, sorted by word length
for k in sorted(d):
    print(k, d[k])

Output:

1 ['A', 'a']
2 ['to', 'an', 'At', 'do', 'on', 'In', 'On', 'as', 'by', 'or', 'of', 'in', 'is']
3 ['use', 'the', 'one', 'and', 'few', 'not', 'EOF', 'may', 'any', 'for', 'are', 'two', 'end', 'new', 'old']
4 ['have', 'that', 'such', 'type', 'need', 'text', 'more', 'done', 'kind', 'Some', 'does', 'most', 'file', 'with', 'line', 'ways', 'keep', 'CP/M', 'name', 'will', 'Text', 'data', 'last', 'size']
5 ['track', 'those', 'bytes', 'fixed', 'known', 'where', 'which', 'there', 'while', 'There', 'lines', 'kinds', 'store', 'files', 'plain', 'after', 'level']
6 ['exists', 'modern', 'MS-DOS', 'system', 'within', 'refers', 'length', 'marker', 'stored', 'binary']
7 ['because', 'placing', 'content', 'Windows', 'padding', 'systems', 'records', 'contain', 'special', 'generic', 'denoted', 'spelled']
8 ['computer', 'sequence', 'textfile', 'variable']
9 ['Microsoft', 'depending', 'different', 'Unix-like', 'flatfile)', 'primarily', 'container', 'character', 'separated', 'operating']
10 ['delimiters', 'characters', 'electronic', '(sometimes', 'structured']
11 ['end-of-file', 'alternative', 'end-of-line', 'description']
17 ['record-orientated']
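
The output above still contains '(sometimes' and 'flatfile)' because parentheses are not in the set of stripped characters. One possible variation, sketched here, strips everything in `string.punctuation` instead. The function name `words_by_length` and the short sample sentence are made up for illustration, and this is only one way to clean the words.

```python
import string
from collections import defaultdict


def words_by_length(text):
    """Group distinct words by length after stripping surrounding punctuation."""
    groups = defaultdict(list)
    # string.punctuation includes ( and ), so "(sometimes" becomes "sometimes";
    # strip() only touches the ends, so "end-of-file" and "CP/M" stay intact
    cleaned = {w.strip(string.punctuation) for w in text.split()}
    cleaned.discard("")  # a token made only of punctuation strips to ""
    for word in sorted(cleaned):
        groups[len(word)].append(word)
    return dict(groups)


sample = "A text file (sometimes spelled textfile; an old name is flatfile) uses end-of-file."
for length, words in sorted(words_by_length(sample).items()):
    print(length, words)
```

Sorting the cleaned words first makes each list come out in a stable order, which a plain set does not guarantee.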

score 0 · Answer 4 · answered Sep 30 '20 at 15:24

str_files_txt = "A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records. 'Text file' refers to a type of container, while plain text refers to a type of content. At a generic level of description, there are two kinds of computer files: text files and binary files"

lengthWordDict = {}
for word in str_files_txt.split(' '):
    wordWithoutSpecialChars = ''.join([char for char in word if char.isalpha()])
    wordWithoutSpecialCharsLength = len(wordWithoutSpecialChars)
    if(wordWithoutSpecialCharsLength in lengthWordDict.keys()):
        lengthWordDict[wordWithoutSpecialCharsLength].append(word)
    else:
        lengthWordDict[wordWithoutSpecialCharsLength] = [word]
print(lengthWordDict)

This is my solution. It uses the length of each word without special characters (e.g. punctuation) as the key.

To use the full length of the word (including punctuation) instead, replace wordWithoutSpecialChars with word.

Output:

{1: ['A', 'a', 'a', 'A', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 4: ['text', 'file', 'name', 'kind', 'file', 'that', 'text.', 'text', 'file', 'data', 'file', 'such', 'does', 'keep', 'file', 'size', 'text', 'file', 'more', 'last', 'line', 'text', 'file.', 'such', 'text', 'file', 'keep', 'file', 'size', 'most', 'text', 'need', 'have', 'done', 'ways', 'Some', 'with', 'file', 'line', 'will', 'text', 'with', "'Text", "file'", 'type', 'text', 'type', 'text'], 9: ['(sometimes', 'operating', 'operating', 'end-of-file', 'operating', 'Microsoft', 'character,', 'operating', 'end-of-line', 'different', 'depending', 'operating', 'operating', 'primarily', 'separated', 'container,'], 7: ['spelled', 'systems', 'denoted', 'placing', 'special', 'padding', 'systems', 'Windows', 'systems,', 'contain', 'special', 'because', 'systems', 'systems', 'systems', 'systems', 'records.', 'content.', 'generic'], 8: ['textfile;', 'flatfile)', 'computer', 'sequence', 'computer', 'Unix-like', 'variable', 'computer'], 2: ['an', 'is', 'is', 'of', 'is', 'as', 'of', 'of', 'as', 'In', 'as', 'of', 'in', 'of', 'is', 'by', 'or', 'as', 'an', 'as', 'in', 'On', 'as', 'do', 'on', 'of', 'in', 'to', 'in', 'on', 'as', 'or', 'to', 'of', 'to', 'of', 'At', 'of', 'of'], 3: ['old', 'CP/M', 'and', 'the', 'not', 'the', 'the', 'end', 'one', 'the', 'and', 'not', 'any', 'EOF', 'the', 'are', 'for', 'are', 'few', 'may', 'not', 'use', 'new', 'and', 'are', 'two', 'and'], 11: ['alternative', 'description,'], 10: ['structured', 'electronic', 'characters,', 'delimiters,', 'delimiters'], 5: ['lines', 'MS-DOS,', 'where', 'track', 'bytes,', 'known', 'after', 'files', 'those', 'track', 'bytes.', 'There', 'files', 'which', 'store', 'files', 'lines', 'fixed', 'while', 'plain', 'level', 'there', 'kinds', 'files:', 'files', 'files'], 6: ['exists', 'stored', 'within', 'system.', 'system', 'marker,', 'modern', 'system.', 'length', 'refers', 'refers', 'binary'], 16: ['record-orientated']}
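
The output above keeps repeated words, whereas the question removes duplicates with a set. The sketch below combines both length options from this answer behind a flag and keeps only the first occurrence of each word. The function name `group_words` and the `letters_only` parameter are hypothetical names chosen for this illustration.

```python
def group_words(text, letters_only=True):
    """Group distinct words by length, preserving first-seen order.

    letters_only=True counts only alphabetic characters;
    letters_only=False counts every character in the token.
    """
    groups = {}
    for word in text.split():
        if letters_only:
            length = sum(1 for ch in word if ch.isalpha())
        else:
            length = len(word)
        bucket = groups.setdefault(length, [])
        if word not in bucket:  # linear scan; fine for short texts
            bucket.append(word)
    return groups


print(group_words("a file. a file MS-DOS,"))
# {1: ['a'], 4: ['file.', 'file'], 5: ['MS-DOS,']}
print(group_words("a file. a file MS-DOS,", letters_only=False))
# {1: ['a'], 5: ['file.'], 4: ['file'], 7: ['MS-DOS,']}
```

As in the answer's code, the stored word keeps its punctuation; only the length used as the key changes.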

score 0 · Answer 5 · answered Sep 30 '20 at 15:26

You can directly add the strings to the dictionary at the right position as follows:

res = {}
for ele in list(set(str_files_txt.split())):
  if len(ele) in res:
    res[len(ele)].append(ele)
  else:
    res[len(ele)] = [ele]
print(res)

answered Sep 30 '20 at 15:26

morryporry
