
I have written a program (however ugly) that counts the total number of words and the occurrences of each unique word in a given input.

My problem is that I want to use it for song lyrics, but most lyric sets come with multiple paragraph breaks.

My question is: how can I take a user input of lyrics with paragraph breaks and reduce the input down to a single string?

This is my code so far:

Song = {}
lines = []


while True:                                        
    line = input("")                                
    if line:
        lines.append(line)
    else:
        break

string = '\n'.join(lines)

def string_cleaner(string):
    string = string.lower()
    newString = ''
    validLetters = " abcdefghijklmnopqrstuvwxyz"
    newString = ''.join([char for char in string if char in validLetters])
    return newString

def song_splitter(string):
    string = string_cleaner(string)
    words = string.split()
    for word in words:
        if word in Song:
            Song[word] += 1
        else:
            Song[word] = 1

Expected input:

Well, my heart went "boom"
When I crossed that room
And I held her hand in mine
Whoah, we danced through the night
And we held each other tight
And before too long I fell in love with her
Now I'll never dance with another
(Whooh)
Since I saw her standing there
Oh since I saw her standing there
Oh since I saw her standing there

Desired output:

This song has 328 words.
39 of which are unique.

This song is 11% unique words.
('i', 6)
('her', 4)
('standing', 3)
.... etc
KaiserKatze
applsiid
  • Why do you need to reduce the input down to a single string? You could append all the lines to a list and traverse it. – KaiserKatze Aug 25 '18 at 02:38
  • I'm new to programming, so I may not know all of the shortcuts, but my problem at the moment is that when I try to input lyrics containing a paragraph break (in IDLE), IDLE reads the paragraph break as the end of the input. For example, IDLE would read "But when I get home to you I find the things that you do Will make me feel alright // You know I work all day to get you money to buy you things And it's worth it just to hear you say you're going to give me everything" as only "But when ... to ... feel alright". – applsiid Aug 25 '18 at 03:31
  • Maybe you should present example input and desired output in the question. – KaiserKatze Aug 25 '18 at 05:22
  • User input: Well, my heart went "boom" When I crossed that room And I held her hand in mine Whoah, we danced through the night And we held each other tight And before too long I fell in love with her Now I'll never dance with another (Whooh) Since I saw her standing there Oh since I saw her standing there Oh since I saw her standing there Desired output something like: This song has 328 words. 39 of which are unique. This song is 11% unique words. ('i', 6) ('her', 4) ('standing', 3).... etc. – applsiid Aug 25 '18 at 17:28
  • The printed attributes don't matter... This song is blah blah words long... What I'm getting at is that the line break between "And I held her hand in mine" and "Whoah, we danced through the night" is treated as if the user is "entering" only the lines before the break. I am trying to figure out a way to get past that. – applsiid Aug 25 '18 at 17:33

1 Answer


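The following example code extracts the words from every line and processes them: it counts the total number of words and the occurrences of each unique word. Note that the pattern `\w+` matches runs of letters, digits, and underscores, so "I'll" is counted as the two words "i" and "ll".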

import re

MESSAGE = 'Please input a new line: '
TEST_LINE = '''
Well, my heart went "boom"
When I crossed that room
And I held her hand in mine
Whoah, we danced through the night
And we held each other tight
And before too long I fell in love with her
Now I'll never dance with another
(Whooh)
Since I saw her standing there
Oh since I saw her standing there well well
Oh since I saw her standing there
'''

prog = re.compile(r'\w+')

class UniqueWordCounter():

    def __init__(self):
        self.data = {}

    def add(self, word):
        # Ignore empty strings; otherwise increment the word's count (starting from 0)
        if word:
            self.data[word] = self.data.get(word, 0) + 1


# instances of each unique word
set_of_words = UniqueWordCounter()
# counts the number of words
count_of_words = 0

def handle_line(line):
    # Declare the module-level counter once, before it is modified
    global count_of_words
    line = line.lower()
    words = map(lambda mo: mo.group(0), prog.finditer(line))
    for word in words:
        count_of_words += 1
        set_of_words.add(word)

def run():
    line = input(MESSAGE)

    if not line:
        line = TEST_LINE

    while line:
        # Loop continues as long as `line` is not empty

        handle_line(line)

        line = input(MESSAGE)

    count_of_unique_words = len(set_of_words.data.keys())
    unique_percentage = count_of_unique_words / count_of_words

    print('-------------------------')
    print('This song has {} words.'.format(count_of_words))
    print('{} of which are unique.'.format(count_of_unique_words))
    print('This song is {:.2%} unique words.'.format(unique_percentage))

    items = sorted(set_of_words.data.items(), key = lambda tup: tup[1], reverse=True)
    items = ["('{}', {})".format(k, v) for k, v in items]

    print('\n'.join(items[:3]))
    print('...')

run()
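Note that the loop in `run()` still stops at the first empty line, so a blank line between verses ends the input early. Here is a minimal sketch of one way around that, assuming the user finishes the lyrics with a sentinel line instead of an empty one. The names `SENTINEL`, `read_lyrics`, and `count_words` are hypothetical and not part of the code above, and `collections.Counter` stands in for the hand-written counter class.

```python
import re
from collections import Counter

# Hypothetical end-of-input marker: any string that never appears
# as a whole line of the lyrics would work.
SENTINEL = "END"


def read_lyrics(read_line=input, sentinel=SENTINEL):
    """Collect lines until the sentinel (or end-of-file) is reached.

    Blank lines are kept, so paragraph breaks no longer end the input.
    """
    lines = []
    while True:
        try:
            line = read_line()
        except EOFError:
            break
        if line.strip() == sentinel:
            break
        lines.append(line)
    return "\n".join(lines)


def count_words(text):
    """Lowercase the text and count every run of word characters."""
    return Counter(re.findall(r"\w+", text.lower()))


# Demonstration with a stand-in for keyboard input, including a blank line.
demo_lines = iter([
    "And I held her hand in mine",
    "",
    "Whoah, we danced through the night",
    "END",
])
counts = count_words(read_lyrics(read_line=lambda: next(demo_lines)))
print("This song has {} words.".format(sum(counts.values())))
print("{} of which are unique.".format(len(counts)))
```

Calling `read_lyrics()` with no arguments reads from the keyboard via `input`; the user pastes the lyrics, blank lines included, and then types `END` on its own line.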

If you want to handle lyrics in other languages, you should check out this link.
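One possible approach, not necessarily the one the link describes: in Python 3, `\w` in a `str` pattern already matches Unicode word characters, and the character class `[^\W\d_]` narrows that to letters only by excluding digits and the underscore. The names `LETTERS` and `extract_words` below are hypothetical, and the sample line is made up for illustration.

```python
import re

# Word characters minus digits and underscore, i.e. letters in any script.
LETTERS = re.compile(r"[^\W\d_]+")


def extract_words(line):
    """Return the lowercase runs of letters found in `line`."""
    return LETTERS.findall(line.lower())


print(extract_words("\u00c7a va tr\u00e8s bien, 99 Luftballons"))
```

As with `\w+`, apostrophes still split a word in two; adding `'` to the pattern is one way to keep contractions together, if that matters for the count.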

KaiserKatze