
I have a lot of words in a text file. The words are not separated by any delimiter, but we can tell them apart because each word begins with a capital letter. I want to extract all the words and store them in a list. My Python script:

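words = ''
with open("words.txt", 'r') as mess:
    for l in mess.read():          # iterate over the file character by character
        if l.isupper():
            words += ',' + l       # mark the start of a new word with a comma
        else:
            words += l
# split on the commas, strip the newlines, and drop the empty leading entry
words = [word.strip() for word in words.split(',') if word]
print(words)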

Output:

['Apple', 'Banana', 'Grape', 'Kiwi', 'Raspberry', 'Pineapple', 'Orange', 'Watermelon', 'Mango', 'Leechee', 'Coconut', 'Grapefruit', 'Blueberry', 'Pear', 'Passionfruit']
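For comparison, here is a possible variation on the same character-by-character loop, offered as a sketch rather than a required fix. It builds the list directly instead of using a comma as a sentinel, which would mis-split if the real text ever contained commas. The function name `split_on_capitals` is hypothetical, and unlike the script above it drops all whitespace, not just whitespace at the ends of words.

```python
def split_on_capitals(text):
    """Split text into words, starting a new word at each uppercase letter.

    Whitespace (including the newlines in words.txt) is skipped entirely.
    """
    words = []
    for ch in text:
        if ch.isspace():
            continue  # ignore line breaks and spaces between words
        if ch.isupper() or not words:
            words.append(ch)  # an uppercase letter starts a new word
        else:
            words[-1] += ch  # otherwise extend the current word
    return words


sample = "AppleBananaGrape\nKiwiRaspberry\nPear"
print(split_on_capitals(sample))
# ['Apple', 'Banana', 'Grape', 'Kiwi', 'Raspberry', 'Pear']
```

With the real file this would be called as `split_on_capitals(mess.read())` inside the `with` block.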

Inside words.txt (note that there are newlines, and this is only an example of the actual text):

AppleBananaGrapeKiwiRaspberry
PineappleOrangeWatermelonMangoLeecheeCoconutGrapefruit
BlueberryPear
Passionfruit

My code works fine, but I'm wondering whether Python has a built-in method that can split text with no delimiter, using only the capital letters. If not, can someone show me a more practical way?

Red
    Does this answer your question? [Split a string at uppercase letters](https://stackoverflow.com/questions/2277352/split-a-string-at-uppercase-letters) – optimist May 27 '20 at 20:32
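As far as I know, `str` has no built-in method for splitting on case changes, but `re.split` can split on a zero-width pattern. A minimal sketch, assuming Python 3.7 or later (the version in which `re.split` gained support for patterns that match an empty string): the lookahead `(?=[A-Z])` matches the empty position before each capital, so the text is cut there without losing any characters. The sample string is a shortened copy of `words.txt`.

```python
import re

text = "AppleBananaGrapeKiwiRaspberry\nPineappleOrange\nBlueberryPear\nPassionfruit"

# Remove line breaks first so they don't end up glued to the words
flat = "".join(text.split())

# Split at the empty position before every uppercase letter (Python 3.7+);
# the empty string produced before the first capital is filtered out.
words = [w for w in re.split(r"(?=[A-Z])", flat) if w]
print(words)
```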

2 Answers


Use regular expressions:

import re
test = 'HelloWorldExample'
r_capital = re.compile(r'[A-Z][a-z]*')
r_capital.findall(test) # ['Hello', 'World', 'Example']

Compiling the regular expression can speed up execution when you use it many times, e.g. when iterating over many lines of input. (Python's `re` module also caches recently compiled patterns, so the gain is usually modest.)
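A sketch of that many-lines case, with one compiled pattern reused for every line of the question's sample. `io.StringIO` stands in for the open file so the example runs on its own; with the real file you would loop over `open("words.txt")` instead.

```python
import io
import re

# Compile the pattern once and reuse it for every line
r_capital = re.compile(r"[A-Z][a-z]*")

# io.StringIO stands in for open("words.txt") so the sketch runs on its own
mess = io.StringIO(
    "AppleBananaGrapeKiwiRaspberry\n"
    "PineappleOrangeWatermelonMangoLeecheeCoconutGrapefruit\n"
    "BlueberryPear\n"
    "Passionfruit\n"
)

words = []
for line in mess:  # one line at a time, so a large file need not fit in memory
    words.extend(r_capital.findall(line))
print(words)
```

The trailing `\n` on each line is never matched by `[a-z]*`, so no stripping is needed.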

Jan Christoph Terasa
  • What do you mean? This is essentially only one line of code. Compiling the expression is optional, you can also call it like `re.findall(r'[A-Z][a-z]*', test)`. – Jan Christoph Terasa May 27 '20 at 20:44
  • I understand that this is not really "Python code", but string manipulation and searching are so much easier with regular expressions that basic regex usage is a good skill to have in your coder's toolbox. – Jan Christoph Terasa May 27 '20 at 20:52
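The uncompiled `re.findall` form from the comments can also be run on the whole file contents at once. In this sketch a shortened copy of `words.txt` is inlined as a string; the newlines need no special handling because `[a-z]*` stops at them.

```python
import re

# A shortened copy of words.txt, read as one string (newlines included)
text = "BlueberryPear\nPassionfruit\nKiwi"

# [a-z]* stops at "\n", so no word ever includes a line break
words = re.findall(r"[A-Z][a-z]*", text)
print(words)  # ['Blueberry', 'Pear', 'Passionfruit', 'Kiwi']
```

The pattern only covers ASCII letters; words containing digits, apostrophes or accented letters would need a broader character class.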

With f-strings (available since Python 3.6), you could use the following, where `yourfile` is the opened words.txt:

words = "".join([f" {s}" if s.isupper() else s for s in yourfile.read() if s.strip()]).split(" ")[1:]

This is the final version of my attempt, but the more I worked on it, the uglier it became.
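The idea behind the one-liner is to put a space in front of every capital letter and then split on spaces. A possibly more readable sketch of the same approach, with the steps spelled out (the function name `space_before_capitals` is hypothetical):

```python
def space_before_capitals(text):
    """Return text with a space inserted before each uppercase letter.

    Whitespace already in the text (such as newlines) is dropped first.
    """
    chars = (c for c in text if not c.isspace())
    return "".join(f" {c}" if c.isupper() else c for c in chars)


sample = "AppleBananaGrape\nKiwiRaspberry\nPear"
# str.split() with no argument ignores the leading space, so no [1:] slice is needed
words = space_before_capitals(sample).split()
print(words)
```

Using `split()` without an argument rather than `split(" ")` is what removes the need to slice off the empty first element.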

(sorry for messing around with deleting posts and making tons of mistakes)

Queuebee