def get_unique_words(text):
    split_text = text.split()
    print(split_text)
    for word in text:
        print(word)

Hi there,

In this code, I am trying to create a list that contains the unique words of text, sorted alphabetically. For example, with "The quick brown fox jumps over the lazy dog!", it should give ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the'].

However, with the code above, I always get the individual characters instead of a list of words.


Why am I getting the individual characters instead of words?

Note: I don't need to type out the get_unique_words() part.

2 Answers


You aren't returning anything from your function, but that's a separate issue.

Calling list on a string makes a list of all its characters. If you do my_list.append(word) instead of my_list = my_list + list(word), you should get closer to what you are looking for.
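As a minimal sketch of that difference (the sample word and the variable names `concatenated` and `appended` are made up for illustration, not taken from the question):

```python
# Hypothetical sample value, for illustration only.
word = "fox"

# list(word) splits the string into characters, so concatenating it
# adds one element per character.
concatenated = []
concatenated = concatenated + list(word)
print(concatenated)  # ['f', 'o', 'x']

# append() adds the string itself as a single element.
appended = []
appended.append(word)
print(appended)  # ['fox']
```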

Also note that capital letters sort before lowercase ones, so as written your list will start with "The".
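A small sketch of the ordering issue, assuming a made-up word list; lowercasing before sorting is one possible way around it:

```python
# Hypothetical sample data, for illustration only.
words = ["The", "quick", "brown", "fox"]

# The default sort compares character code points, so uppercase
# letters come before all lowercase letters.
default_order = sorted(words)
print(default_order)  # ['The', 'brown', 'fox', 'quick']

# Lowercasing each word first gives a plain alphabetical order.
lowered_order = sorted(w.lower() for w in words)
print(lowered_order)  # ['brown', 'fox', 'quick', 'the']
```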

Tyberius

I do not believe the for loop over text is necessary. Iterating over a string yields its characters one at a time, and I think that is what is causing the issue you described. A for loop is still useful for checking for duplicate values.
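To illustrate, here is a short sketch (with a made-up sample string) comparing a loop over the string itself with a loop over the result of split():

```python
# Hypothetical sample value, for illustration only.
text = "The quick brown fox"

# Looping over the string visits one character at a time.
chars = [item for item in text]
print(chars[:3])  # ['T', 'h', 'e']

# Looping over text.split() visits whole words.
words = [item for item in text.split()]
print(words)  # ['The', 'quick', 'brown', 'fox']
```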

I think you can sort your list alphabetically in the following way:

def get_unique_words(text):
    # converts all alphabetical characters to lower
    lower_text = text.lower()
    # splits string on space character 
    split_text = lower_text.split(' ')
    # sorts values in list 
    split_text.sort()
    # empty list to populate unique words
    results_list = []
    # iterate over the list
    for word in split_text:
        # check to see if value is already in results lists
        if word not in results_list:
            # append the word if it is unique
            results_list.append(word)
    # return the sorted list of unique words to the caller
    return results_list

text = "The quick brown fox jumps over the lazy dog!"

print(get_unique_words(text))

This produces the following list:

['brown', 'dog!', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']

As a next step, you probably also want to drop any non-alphabetical characters, such as the "!" in 'dog!'. (The loop above already skips duplicates.)

For the non-alphabetical characters, a regular expression is a good fit. Python's re module can be imported with:

import re

Here is a good post on how to remove non-alphabetical characters:

Python, remove all non-alphabet chars from string
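One way this could look, as a rough sketch rather than the approach from the linked post: the helper name `strip_non_letters` is hypothetical, and the pattern assumes only ASCII letters should be kept.

```python
import re

def strip_non_letters(word):
    # Hypothetical helper: delete every character that is not an
    # ASCII letter (assumption: accented letters are not needed).
    return re.sub(r"[^a-zA-Z]", "", word)

text = "The quick brown fox jumps over the lazy dog!"

# Lowercase, split into words, then clean each word.
cleaned = [strip_non_letters(w) for w in text.lower().split()]
print(cleaned)
```

A word made up entirely of punctuation would become an empty string here, so you may want to skip empty results before sorting.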

An alternative way to remove duplicates is to convert the list into a dictionary and back. Here is a post on how to do just that:

https://www.w3schools.com/python/trypython.asp?filename=demo_howto_remove_duplicates
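The dictionary approach can be sketched roughly as follows, assuming a made-up word list. It relies on dict keys being unique and, since Python 3.7, keeping insertion order:

```python
# Hypothetical sample data, for illustration only.
words = ["around", "the", "world", "around", "the", "world"]

# dict.fromkeys() builds a dict whose keys are the list items;
# repeated items collapse into a single key.
unique_words = list(dict.fromkeys(words))
print(unique_words)  # ['around', 'the', 'world']
```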

Joe Thor
  • Thanks very much! I get duplicates for "Around the world, around the world" and I am not allowed to use dictionaries. How would I do this without them? – Christopher May 05 '21 at 02:18
  • I added a for loop to my original answer to add the unique word to the results_list only if the value does not already exist. – Joe Thor May 05 '21 at 10:54