I have a book in txt format. I would like to create two new text files: in the first, I would like to replace all occurrences of the string "Paul" with Paul_1, and in the second with Paul_2. I wrote this code:

with open("book.txt", 'r') as original, \
        open("book_1.txt", 'w') as mod1, \
        open("book_2.txt", 'w') as mod2:
    for line in original:
        words = line.split()
        for word in words:
            s="Paul"
            if(word == s):
                mod1.write(word + "_1 ")
                mod2.write(word + "_2 ")
            else:
                mod1.write(word + " ")
                mod2.write(word + " ")
        mod1.write("\n")
        mod2.write("\n")

The problem is that some occurrences of Paul are often skipped, so in the end the same document contains both Paul and Paul_1 (or both Paul and Paul_2). Where is the problem?

Camilla8
  • Is it possible the skipped ones are `Paul,` or `Paul.` and such? – bgse Mar 20 '18 at 17:50
  • @bgse yes, I've just noticed that it skipped strings like `Paul,` and `Paul'`. How can I solve that? – Camilla8 Mar 20 '18 at 17:52
  • You can use the `startswith()` method, remove the punctuation marks with a replace (using a regex), or compare against `word[:-1]`, i.e. the word without its last letter/symbol – shahaf Mar 20 '18 at 17:55
  • @Camilla8 `str.split()` by default splits your string using whitespace as a delimiter, and it isn't really suitable for your needs as you can only split by one delimiter if you specify one yourself. You might want to look at [re.split()](https://docs.python.org/3/library/re.html#re.split). – bgse Mar 20 '18 at 17:57
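
The behaviour discussed in these comments is easy to reproduce. The sketch below uses a made-up sample sentence (an assumption, not text from the book) to show that `str.split()` leaves punctuation attached to each token, so the `word == "Paul"` comparison from the question fails for tokens such as `Paul,` and `Paul.`:

```python
# Hypothetical sample line; any text with punctuation right after the name works.
line = "I met Paul, then Paul. Later Paul left."

words = line.split()  # splits on whitespace only, punctuation stays attached
print(words)

# Only a token that is exactly "Paul" passes the equality test.
matches = [word for word in words if word == "Paul"]
print(len(matches))
```

Of the three occurrences in the sample, only the one followed by a space is matched.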

1 Answer

This should help.

import re

with open("book.txt", 'r') as original, \
        open("book_1.txt", 'w') as mod1, \
        open("book_2.txt", 'w') as mod2:
    data = original.read()
    data_1 = re.sub(r"\bPaul\b", "Paul_1", data)   # Replace each whole-word "Paul" with "Paul_1"
    data_2 = re.sub(r"\bPaul\b", "Paul_2", data)   # Replace each whole-word "Paul" with "Paul_2"
    mod1.write(data_1 + "\n")   # "\n" is a real newline; r"\n" would write a backslash and an "n"
    mod2.write(data_2 + "\n")
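
A quick way to check what the `\b` word boundary does, as a minimal sketch on a made-up sample string (the sample text is an assumption, chosen to cover the cases raised in this thread):

```python
import re

# Hypothetical sample covering punctuation, a longer name, and a compound word.
sample = "Paul, Paula and PostPaul met Paul's friend. Paul!"

# \b matches only between a word character and a non-word character
# (or the start/end of the string), so "Paula" and "PostPaul" are left alone.
result = re.sub(r"\bPaul\b", "Paul_1", sample)
print(result)
# → Paul_1, Paula and PostPaul met Paul_1's friend. Paul_1!
```

Because `_` counts as a word character, an already replaced `Paul_1` is not matched again by the same pattern.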
Rakesh
  • What does the 'r' in the last 2 instructions (`r"\n"`) do? – Camilla8 Mar 20 '18 at 17:55
  • Should take into account edge-cases like `"Paula is a nice lady.".replace("Paul", "Paul_1")` though, given the question is concerning a book text, that isn't too far fetched. – bgse Mar 20 '18 at 18:00
  • @Rakesh your code has a problem if Paul is a substring of another word. For instance, if there is PostPaul, I get PostPaul_1, while my aim is to replace just Paul and not strings like PostPaul – Camilla8 Mar 22 '18 at 16:44
  • Oh ok. In that case you probably need regex. Let me try to make one tomorrow morning. – Rakesh Mar 22 '18 at 17:29
  • Updated snippet. – Rakesh Mar 23 '18 at 07:53
  • @Rakesh If I have a dynamic word, how should I write the regex? re.sub(r"\b+"word"+"\b", 'Paul_1', data) or re.sub(r"\b+"word"+"r\b", 'Paul_1', data)? – Camilla8 Mar 23 '18 at 11:00
  • you can use `str.format`. Ex: `re.sub(r"\b{0}\b".format(toChange), 'Paul_1', s)` – Rakesh Mar 23 '18 at 11:08
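
For the dynamic-word case asked about above, one possible sketch is to build the pattern from raw-string pieces and pass the word through `re.escape`, so that any regex metacharacters in it are treated literally. The helper name `replace_word` and the variable `to_change` are hypothetical, introduced only for this illustration:

```python
import re

def replace_word(text, word, replacement):
    """Replace whole-word occurrences of `word` in `text` (hypothetical helper)."""
    # re.escape makes characters such as "." or "+" in `word` match literally.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, replacement, text)

to_change = "Paul"  # stands in for the dynamic word
print(replace_word("Paul and PostPaul saw Paula.", to_change, "Paul_1"))
# → Paul_1 and PostPaul saw Paula.

# Why the r prefix matters on every piece of the pattern:
print("\b" == "\x08")  # True: without r, "\b" is a backspace character
print(len(r"\n"))      # 2: with r, "\n" is a backslash followed by "n"
```

This assumes the word starts and ends with a letter, digit, or underscore; otherwise `\b` may not sit where expected and a different boundary test would be needed.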