How to capitalize some words in a text file?

Question

I have a text file that contains normal sentences. I was in a hurry while typing it, so I capitalized only the first letter of the first word of each sentence (as per English grammar).

Now I think it would be better if the first letter of each word were capitalized. Something like:

Each Word of This Sentence is Capitalized

Note that "of" and "is" are not capitalized in the sentence above: I want to skip words that have three or fewer letters.

What should I do?

"I want to escape the words which has equal to or less than 3 letters." - there are words with more than 3 characters that should not be capitalized in a title. — dj18, Jul 26 '12 at 17:31

Steven Rumbalski · Answer 1 · 2012-07-26T19:31:09.047

5

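# text_file is assumed to be an already-open file object.
for line in text_file:
    # Title-case words longer than three characters; leave shorter words unchanged.
    print(' '.join(word.title() if len(word) > 3 else word for word in line.split()))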

Edit: To avoid counting punctuation, replace `len` with the following function:

def letterlen(s):
    return sum(c.isalpha() for c in s)
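
A minimal sketch of how `letterlen` might be dropped into the loop above. The helper name `capitalize_long_words`, its `min_letters` parameter, and the sample sentence are illustrative assumptions, not part of the original answer.

```python
def letterlen(s):
    """Count only the alphabetic characters in s."""
    return sum(c.isalpha() for c in s)


def capitalize_long_words(line, min_letters=3):
    """Title-case words with more than min_letters letters (hypothetical helper)."""
    return ' '.join(
        word.title() if letterlen(word) > min_letters else word
        for word in line.split()
    )


# "end," has four characters but only three letters, so it is left alone.
print(capitalize_long_words("the end, of this sentence."))
```

With plain `len`, the word "end," would count as four characters and be capitalized; counting letters only leaves it unchanged.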

edited Jul 26 '12 at 19:31

answered Jul 26 '12 at 17:28

Steven Rumbalski


3

Doesn't account for punctuation in computing length of word. – martineau Jul 26 '12 at 17:52
@martineau. Edited to address your concern. – Steven Rumbalski Jul 26 '12 at 19:31
`word.title()` capitalises "can't" as "Can'T". `word.capitalize()`, which would capitalise only the first letter of `word`, may be used instead. – Artur Gaspar Aug 21 '12 at 21:14
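
The difference described in the last comment can be checked directly. This is only a small demonstration on the single word mentioned there:

```python
word = "can't"

# str.title() starts a new "word" after any non-letter, including the apostrophe.
titled = word.title()

# str.capitalize() upper-cases only the first character and lower-cases the rest.
capitalized = word.capitalize()

print(titled)       # Can'T
print(capitalized)  # Can't
```

Note that `capitalize()` also lower-cases everything after the first character, which may matter for words that are already in upper case.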

score 4 · Answer 2 · edited May 23 '17 at 11:58

4

Take a look at NLTK.

Tokenize the text and capitalize each word. Words such as 'if' and 'of' are called 'stop words'. If length is your only criterion, Steven's answer is a good way of doing it. If you want to look up stop words instead, there is a similar question on SO: How to remove stop words using nltk or python.
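
A rough sketch of the stop-word approach. To stay self-contained, it uses a tiny hand-written stop-word set (`STOP_WORDS`, chosen purely for illustration) and a plain `split()` instead of an NLTK tokenizer; a fuller list, such as the one in NLTK's stopwords corpus, could be substituted. The helper name `title_case` is also an assumption.

```python
# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "with"}


def title_case(line):
    """Capitalize every word except stop words (illustrative helper)."""
    return ' '.join(
        word if word.lower() in STOP_WORDS else word.capitalize()
        for word in line.split()
    )


print(title_case("each word of this sentence is capitalized with care"))
```

Unlike a pure length check, this leaves "with" in lower case even though it has four letters.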

edited May 23 '17 at 11:58

Community


answered Jul 26 '12 at 17:29

clwen


Artur Gaspar · Accepted Answer · 2012-08-21T22:53:54.613

3

You should split the words, and capitalise only those which are longer than three letters.

words.txt:

each word of this sentence is capitalized
some more words
an other line

-

import string


with open('words.txt') as file:
    # List to store the capitalised lines.
    lines = []
    for line in file:
        # Split words by spaces.
        words = line.split(' ')
        for i, word in enumerate(words):
            if len(word.strip(string.punctuation + string.whitespace)) > 3:
                # Capitalise and replace words longer than 3 (without punctuation).
                words[i] = word.capitalize()
        # Join the capitalised words with spaces.
        lines.append(' '.join(words))
    # Join the capitalised lines.
    capitalised = ''.join(lines)

# Optionally, write the capitalised words back to the file.
with open('words.txt', 'w') as file:
    file.write(capitalised)
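
The script relies on two details that are easy to miss: each line read from a file keeps its trailing newline, and `split(' ')` keeps that newline (and any repeated spaces), whereas `split()` with no argument discards them. A small sketch with a made-up sample line:

```python
line = "some  more words\n"  # note the double space and the trailing newline

# split(' ') splits on every single space, so spacing and the newline survive.
by_space = line.split(' ')
print(by_space)                       # ['some', '', 'more', 'words\n']
print(' '.join(by_space) == line)     # True: the round trip is lossless

# split() splits on runs of whitespace and drops leading/trailing whitespace.
by_whitespace = line.split()
print(by_whitespace)                  # ['some', 'more', 'words']
print(repr(' '.join(by_whitespace)))  # 'some more words'
```

Because the newline is preserved, the lines are joined with `''` rather than a line separator, and passing such a line to `print` adds a second newline after it.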

edited Aug 21 '12 at 22:53

answered Jul 26 '12 at 17:37

Artur Gaspar


1

Close, but what about punctuation increasing the letter count of a "word"? – martineau Jul 26 '12 at 17:40
Almost perfect except for embedded punctuation (i.e. "can't"). +1 anyway. – martineau Jul 26 '12 at 18:23
@ArturGaspar How do I prevent this script from writing/printing a blank line at the end? – Santosh Kumar Aug 21 '12 at 19:52
@Santosh Remove a blank line from the input file. – Artur Gaspar Aug 21 '12 at 21:00
@ArturGaspar My input file has no **blank line**, just a single line of lowercase words. – Santosh Kumar Aug 21 '12 at 22:31
@Santosh And the output still contains an extra blank line? Strange, I've tested it before replying to your comment and it worked fine. Can you test it writing to a file instead of printing it? – Artur Gaspar Aug 21 '12 at 22:33
OK! I apologize. I have modified this script a bit. [Here it is](http://pastebin.com/UpWPNmfK). Can you debug it? – Santosh Kumar Aug 21 '12 at 22:43
Sorry, my fault. Instead of `os.linesep.join(lines)` you should use `''.join(lines)`. – Artur Gaspar Aug 21 '12 at 22:53
This isn't working either. Did you see my modified script? I am passing an argument to the script. By the way, your script was working well before; it only started writing a blank line at the end after I modified it. That's why I was asking for help debugging. – Santosh Kumar Aug 21 '12 at 23:13
@Santosh [This one works.](http://pastebin.com/8JSTvhKx) The blank line in the end is being inserted by the `print` function. – Artur Gaspar Aug 21 '12 at 23:34
@ArturGaspar Did that really work? I am still getting a blank line at the end. – Santosh Kumar Aug 21 '12 at 23:51
@Santosh Try writing it to a file instead of printing it, then open the output file in a text editor and see if it is right. – Artur Gaspar Aug 21 '12 at 23:54
@ArturGaspar What do you mean by writing? Doing `python script.py input.txt > output.txt`? That way I still get a blank line at the end. – Santosh Kumar Aug 21 '12 at 23:59
@Santosh `f = open('outfile.txt', 'w'); f.write(capitalised); f.close()`. – Artur Gaspar Aug 22 '12 at 00:06
@ArturGaspar One more modification: if a third argument is given, I want it to be used as the output filename; if not, it should fall back to the default `output.txt`. – Santosh Kumar Aug 22 '12 at 00:20
@Santosh Read the documentation of the `argparse` module. If you have already solved the blank line problem, ask another question. – Artur Gaspar Aug 22 '12 at 00:55
@ArturGaspar Honestly, I don't like the way this script saves to a file; I was fine with running `python script.py input.txt > output.txt`. Can you add a few more lines to the script to remove the last line? Is it possible? – Santosh Kumar Aug 22 '12 at 01:23
@Santosh What causes the extra line at the end is the `print` function. `sys.stdout.write(capitalised)` should not print the extra blank line. – Artur Gaspar Aug 22 '12 at 06:25
@ArturGaspar You were wrong, I replaced `words = line.split(' ')` with `words = line.split()` and I got what I wanted. – Santosh Kumar Aug 22 '12 at 13:09
@SantoshKumar `words = line.split()` will not preserve spacing. – Artur Gaspar Nov 04 '12 at 18:36
@ArturGaspar You were right. `words = line.split()` does not preserve spacing. The problem is that `words = line.split(' ')` creates a blank line after every line written, and `sys.stdout.write()` **only** removes the newline from the last line. I can't use either of them. – Santosh Kumar Jan 23 '13 at 08:42
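
For the optional output filename requested in the comments above, one possible approach with `argparse` is a positional argument with `nargs='?'` and a default. The argument names `input` and `output` and the helper `parse_args` are assumptions; `output.txt` is the default mentioned in the comment.

```python
import argparse


def parse_args(argv=None):
    """Parse an input filename and an optional output filename."""
    parser = argparse.ArgumentParser(
        description="Capitalize words longer than three letters."
    )
    parser.add_argument("input", help="file to read")
    # nargs='?' makes the positional optional; default is used when it is omitted.
    parser.add_argument("output", nargs="?", default="output.txt",
                        help="file to write (default: output.txt)")
    return parser.parse_args(argv)


# Only the input file is given here, so the output name falls back to the default.
args = parse_args(["input.txt"])
print(args.input, args.output)
```

Calling `parse_args()` with no argument reads the real command line, so the script could be run as `python script.py input.txt` or `python script.py input.txt result.txt`.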

inspectorG4dget · Answer 4 · 2012-07-26T17:58:51.970

1

What you really want is something called a list of stop words. In the absence of this list, you can build one yourself and do this:

skipwords = set("of is".split())
punctuation = '.,<>{}][()\'"/\\?!@#$%^&*'  # and any other punctuation that you want to strip out
answer = ""

with open('filepath') as f:
    for line in f:
        for word in line.split():
            for p in punctuation:
                # You end up losing the punctuation in the output, but this is easy to fix if you really care about it.
                word = word.replace(p, '')
            if word not in skipwords:
                answer += word.title() + " "
            else:
                answer += word + " "

print(answer)  # or you can write it to a file continuously

edited Jul 26 '12 at 17:58

answered Jul 26 '12 at 17:50

inspectorG4dget


1

Good approach, but it needs to take punctuation into account (punctuation characters generally aren't considered letters in a word). – martineau Jul 26 '12 at 17:56
1

Your update addresses the punctuation issue, but is done in what I suspect is a less than optimal, brute-force way. – martineau Jul 26 '12 at 18:07
@martineau How would you optimize it? – inspectorG4dget Jul 26 '12 at 18:07
Well, for one thing you could create a punctuation set and use it to avoid the `for` loop which most words don't need. Second, the removal of the punctuation character could probably be done with a regex `re.sub()` or even a `str.translate()` unless the characters are unicode. – martineau Jul 26 '12 at 18:14
`re.sub` is a bit of overkill and could get overly complex if used incorrectly (fools rush in where angels fear to tread). But I do like the `str.translate` idea. – inspectorG4dget Jul 26 '12 at 18:17
@inspectorG4dget Hey! I didn't want to skip only **of** and **is**; that was just an example. I want to skip any word that has three or fewer letters. – Santosh Kumar Aug 21 '12 at 13:44
@Santosh: look at @Steven Rumbalski's answer. He uses `if len(word) > 3` to address that – inspectorG4dget Aug 21 '12 at 14:49
Using `word.capitalize()` instead of `word.title()` avoids the need to remove punctuation. – Artur Gaspar Aug 21 '12 at 21:14
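
martineau's `str.translate` suggestion could look roughly like this in Python 3, where `str.maketrans` builds a table that deletes characters. Using `string.punctuation` as the set of characters to strip is an assumption, as are the names `STRIP_PUNCTUATION` and `strip_punctuation`; the answer's own `punctuation` string could be passed instead.

```python
import string

# Translation table that maps every ASCII punctuation character to None (deleted).
STRIP_PUNCTUATION = str.maketrans('', '', string.punctuation)


def strip_punctuation(word):
    """Remove all punctuation from word in a single pass (illustrative helper)."""
    return word.translate(STRIP_PUNCTUATION)


print(strip_punctuation('"hello,"'))  # hello
print(strip_punctuation("can't"))     # cant
```

This replaces the inner `for p in punctuation` loop with one call per word, at the cost of also removing embedded apostrophes.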

score 0 · Answer 5 · answered Jul 26 '12 at 17:31

0

You could add all the words from the text file to a list:

words = []
with open('textdocument.txt') as f:
    for line in f:
        # Split each line into words before adding them to the list.
        words.extend(line.split())

Once you have all the words in a list, run a for loop that checks each word's length and, if it is greater than three, capitalizes the word's first letter:

new_list = []
for item in words:
    if len(item) > 3:
        # str.title() returns a new string, so append its result.
        new_list.append(item.title())
    else:
        # Words of three letters or fewer are added to the new list unchanged.
        new_list.append(item)

And see if that works?

answered Jul 26 '12 at 17:31

Aaron Tp


1

http://stackoverflow.com/questions/1549641/how-to-capitalize-the-first-letter-of-each-word-in-a-string-python If my method of capitalization didn't work, try the methods mentioned here.... – Aaron Tp Jul 26 '12 at 17:32
1

`for elm in f` will put each _line_ of the text file into the list, not each word. Your indentation on the last line is a little messed up. – martineau Jul 26 '12 at 17:36
1

Yeah, I didn't copy/paste the code; I wrote it in the form, which generally doesn't turn out well. – Aaron Tp Jul 26 '12 at 17:43
