
I have a text file that contains ordinary sentences. I was in a hurry while typing it, so I capitalized only the first letter of the first word of each sentence (as English grammar requires).

Now I would like the first letter of each word to be capitalized instead. Something like:

Each Word of This Sentence is Capitalized

Note that "of" and "is" are not capitalized in the sentence above: I want to skip any word that has 3 or fewer letters.

What should I do?

Santosh Kumar
  • "I want to escape the words which has equal to or less than 3 letters." - there are words with more than 3 characters that should not be capitalized in a title. – dj18 Jul 26 '12 at 17:31

5 Answers

# text_file is an already-open file object.
for line in text_file:
    print(' '.join(word.title() if len(word) > 3 else word for word in line.split()))

Edit: To avoid counting punctuation, replace len with the following function:

def letterlen(s):
    return sum(c.isalpha() for c in s)
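
For illustration, here is a minimal sketch of how the two snippets above might be combined. The helper name `capitalize_long_words` and its `min_letters` parameter are not from the answer; they are assumptions made for this example.

```python
def letterlen(s):
    """Count only the alphabetic characters in s."""
    return sum(c.isalpha() for c in s)


def capitalize_long_words(line, min_letters=3):
    """Title-case every word that has more than min_letters letters."""
    return ' '.join(
        word.title() if letterlen(word) > min_letters else word
        for word in line.split()
    )


print(capitalize_long_words("each word of this sentence is capitalized, yes."))
# Each Word of This Sentence is Capitalized, yes.
```

Because `letterlen` ignores the trailing period, "yes." counts as three letters and is left alone. Note that `line.split()` collapses runs of whitespace, so the original spacing is not preserved.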
Steven Rumbalski

Take a look at NLTK.

Tokenize the text into words and capitalize each one. Words such as 'if' and 'of' are called 'stop words'. If your criterion is solely the length, Steven's answer is a good way of doing so. If you want to look up stop words instead, there is a similar question on SO: How to remove stop words using nltk or python.
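
As a rough sketch of the stop-word idea that needs no NLTK installation, the example below uses a small hand-written set. The `STOP_WORDS` contents and the `title_case` function are assumptions for illustration only; a real stop-word list (for example the one shipped with NLTK) would be much longer. Always capitalizing the first word is also an assumption, not something the question asks for.

```python
# Hypothetical, hand-written stop-word list; a real list would be longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "on", "with", "from"}


def title_case(sentence, stop_words=STOP_WORDS):
    """Capitalize every word except stop words (the first word is always capitalized)."""
    result = []
    for index, word in enumerate(sentence.split()):
        if index == 0 or word.lower() not in stop_words:
            result.append(word.capitalize())
        else:
            result.append(word)
    return ' '.join(result)


print(title_case("a letter from the author of this book"))
# A Letter from the Author of This Book
```

This also covers the case raised in the comment on the question: "from" has four letters but stays lowercase because it is in the set.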

clwen

You should split the words, and capitalise only those which are longer than three letters.

words.txt:

each word of this sentence is capitalized
some more words
an other line

-

import string


with open('words.txt') as file:
    # List to store the capitalised lines.
    lines = []
    for line in file:
        # Split words by spaces.
        words = line.split(' ')
        for i, word in enumerate(words):
            if len(word.strip(string.punctuation + string.whitespace)) > 3:
                # Capitalise and replace words longer than 3 (without punctuation).
                words[i] = word.capitalize()
        # Join the capitalised words with spaces.
        lines.append(' '.join(words))
    # Join the capitalised lines.
    capitalised = ''.join(lines)

# Optionally, write the capitalised words back to the file.
with open('words.txt', 'w') as file:
    file.write(capitalised)
Artur Gaspar
  • Close, but what about punctuation increasing the letter count of a "word"? – martineau Jul 26 '12 at 17:40
  • Almost perfect except for embedded punctuation (i.e. "can't"). +1 anyway. – martineau Jul 26 '12 at 18:23
  • @ArturGaspar How do I prevent this script from writing/printing a blank line at the end? – Santosh Kumar Aug 21 '12 at 19:52
  • @Santosh Remove a blank line from the input file. – Artur Gaspar Aug 21 '12 at 21:00
  • @ArturGaspar My input file has no **blank line**, just a single line of small case words. – Santosh Kumar Aug 21 '12 at 22:31
  • @Santosh And the output still contains an extra blank line? Strange, I've tested it before replying to your comment and it worked fine. Can you test it writing to a file instead of printing it? – Artur Gaspar Aug 21 '12 at 22:33
  • OK! I apologize. I have modified this script a bit. [Here it is](http://pastebin.com/UpWPNmfK). Can you debug it? – Santosh Kumar Aug 21 '12 at 22:43
  • Sorry, my fault. Instead of `os.linesep.join(lines)` you should use `''.join(lines)`. – Artur Gaspar Aug 21 '12 at 22:53
  • This is not working either. Did you see my modified script? I am passing an argument to the script. By the way, your script was working well before too, but it started writing a blank line at the end after I modified it. That's why I was asking for debugging. – Santosh Kumar Aug 21 '12 at 23:13
  • @Santosh [This one works.](http://pastebin.com/8JSTvhKx) The blank line in the end is being inserted by the `print` function. – Artur Gaspar Aug 21 '12 at 23:34
  • @ArturGaspar Did that really work? I am still getting a blank line at the end. – Santosh Kumar Aug 21 '12 at 23:51
  • @Santosh Try writing it to a file instead of printing it, then open the output file in a text editor and see if it is right. – Artur Gaspar Aug 21 '12 at 23:54
  • @ArturGaspar What do you mean by writing? Doing `python script.py input.txt > output.txt`? This way I still get a blank line at last. – Santosh Kumar Aug 21 '12 at 23:59
  • @Santosh `f = open('outfile.txt', 'w'); f.write(capitalised); f.close()`. – Artur Gaspar Aug 22 '12 at 00:06
  • @ArturGaspar One more modification: if a third argument is given, I want the script to take it as the output filename; if not, it should fall back to the default `output.txt`. – Santosh Kumar Aug 22 '12 at 00:20
  • @Santosh Read the documentation of the `argparse` module. If you have already solved the blank line problem, ask another question. – Artur Gaspar Aug 22 '12 at 00:55
  • @ArturGaspar Truly, I'm not liking the way this script saves to a file; I was fine with doing `python script.py input.txt > output.txt`. Can you add a few more lines to the script to remove the last line? Is it possible? – Santosh Kumar Aug 22 '12 at 01:23
  • @Santosh What causes the extra line at the end is the `print` function. `sys.stdout.write(capitalised)` should not print the extra blank line. – Artur Gaspar Aug 22 '12 at 06:25
  • @ArturGaspar You were wrong, I replaced `words = line.split(' ')` with `words = line.split()` and I got what I wanted. – Santosh Kumar Aug 22 '12 at 13:09
  • @SantoshKumar `words = line.split()` will not preserve spacing. – Artur Gaspar Nov 04 '12 at 18:36
  • @ArturGaspar You were right. `words = line.split()` does not preserve spacing. The problem is `words = line.split(' ')` creates a blank new line after writing every line. And `sys.stdout.write()` **only** removes newline from the last line. I can't use any one of them. – Santosh Kumar Jan 23 '13 at 08:42
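
One possible way around the trade-off discussed in these comments is to leave the text unsplit and substitute words in place, so spaces and newlines are never touched. The sketch below is not from the answer: the `WORD` pattern (ASCII letters with optional embedded apostrophes) and the function names are assumptions for illustration.

```python
import re
import sys

# A "word" here is a run of ASCII letters, optionally with embedded apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def capitalize_match(match):
    """Capitalize the matched word if it has more than 3 letters."""
    word = match.group(0)
    letters = sum(c.isalpha() for c in word)
    return word.capitalize() if letters > 3 else word


def capitalize_text(text):
    """Replace words in place; whitespace and punctuation are left as they are."""
    return WORD.sub(capitalize_match, text)


sample = "each  word of this\nline can't be, (well) ok\n"
# sys.stdout.write adds no newline of its own, unlike print.
sys.stdout.write(capitalize_text(sample))
```

With this input the double space, the punctuation and both newlines come through unchanged, and "can't" becomes "Can't". Keep in mind that `str.capitalize()` lowercases the rest of the word, so an acronym such as "NASA" would become "Nasa".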

What you really want is something called a list of stop words. In the absence of this list, you can build one yourself and do this:

skipWords = set("of is".split())
punctuation = '.,<>{}][()\'"/\\?!@#$%^&*' # and any other punctuation that you want to strip out
answer = ""

with open('filepath') as f:
    for line in f:
        for word in line.split():
            for p in punctuation:
                # you end up losing the punctuation in the output. But this is easy to fix if you really care about it
                word = word.replace(p, '')
            if word not in skipWords:
                answer += word.title() + " "
            else:
                answer += word + " "
print(answer) # or you can write it to file continuously
inspectorG4dget
  • Good approach, but it needs to account for punctuation (which generally isn't considered part of a word). – martineau Jul 26 '12 at 17:56
  • Your update addresses the punctuation issue, but is done in what I suspect is a less than optimal, brute-force way. – martineau Jul 26 '12 at 18:07
  • @martineau How would you optimize it? – inspectorG4dget Jul 26 '12 at 18:07
  • Well, for one thing you could create a punctuation set and use it to avoid the `for` loop which most words don't need. Second, the removal of the punctuation character could probably be done with a regex `re.sub()` or even a `str.translate()` unless the characters are unicode. – martineau Jul 26 '12 at 18:14
  • `re.sub` is a bit of an overkill and could get overly complex if used incorrectly (fools rush in where angels fear to tread). But I do like the `str.translate` idea – inspectorG4dget Jul 26 '12 at 18:17
  • @inspectorG4dget Hey! I didn't want to skip only **of** and **is**; that was just an example. I want to skip any word that has 3 or fewer letters. – Santosh Kumar Aug 21 '12 at 13:44
  • @Santosh: look at @Steven Rumbalski's answer. He uses `if len(word) > 3` to address that – inspectorG4dget Aug 21 '12 at 14:49
  • Using `word.capitalize()` instead of `word.title()` avoids the need to remove punctuation. – Artur Gaspar Aug 21 '12 at 21:14
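
The `str.translate()` suggestion from the comments could look roughly like the sketch below. It assumes Python 3, where `str.maketrans` accepts a third argument listing characters to delete (Python 2's `str.translate` had a different signature). The names `STRIP_PUNCTUATION` and `strip_punctuation` are made up for this example.

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans('', '', string.punctuation)


def strip_punctuation(word):
    """Return word with all ASCII punctuation removed in a single pass."""
    return word.translate(STRIP_PUNCTUATION)


print(strip_punctuation('"hello,"'))  # hello
print(strip_punctuation("can't"))     # cant
```

This replaces the inner `for p in punctuation` loop with one call per word. As in the answer, the punctuation is lost from the output, including apostrophes inside words.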

You could add all the words from the text file to a list:

words = []
with open('textdocument.txt') as f:
    for line in f:
        words.extend(line.split())

And once you have all the words in a list, run a for loop that checks each word's length and, if it is greater than three, capitalizes the word's first letter:

new_list = []
for item in words:
    if len(item) > 3:
        # str.title() returns a new string, so append its result.
        new_list.append(item.title())
    else:
        # Words of three letters or fewer are added to the new list unchanged.
        new_list.append(item)

And see if that works?

Aaron Tp
  • http://stackoverflow.com/questions/1549641/how-to-capitalize-the-first-letter-of-each-word-in-a-string-python If my method of capitalization didn't work, try the methods mentioned here.... – Aaron Tp Jul 26 '12 at 17:32
  • `for elm in f` will put each _line_ of the text file into the list, not each word. Your indentation on the last line is a little messed up. – martineau Jul 26 '12 at 17:36
  • Yeah I didn't copy/paste the code I wrote it in the form, which generally doesn't turn out well. – Aaron Tp Jul 26 '12 at 17:43