
I want to find all quoted statements in a text file. I wrote some code that finds the first quoted statement, but when I tried wrapping it in a while loop to go through the whole text and find all of them, it didn't work. Here is the code:

        quoteStart = fullText.index("\"")
        quoteEnd = fullText.index("\"", quoteStart + 1)
        quotedText = fullText[quoteStart:quoteEnd+1]
        print ("{}:{}".format(quoteStart, quoteEnd))
        print (quotedText)

Output:

250:338

"When we talk about the Hiroshima and Nagasaki bombing, we never talk about Shinkolobwe,"

How can I add a while loop to go through the whole text?

someone

2 Answers

0

I think your problem is that quoteStart = fullText.index("\"") always starts searching from the beginning of your text, so every pass of a loop finds the same first quote again.

Try the following:

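quoteEnd = -1  # so that the first search starts at index 0

while True:
    try:
        # look for the next opening quote after the previous closing quote
        quoteStart = fullText.index("\"", quoteEnd+1)
        # look for the matching closing quote
        quoteEnd = fullText.index("\"", quoteStart + 1)
    except ValueError:
        # index() raises ValueError when no further quote is found
        break

    quotedText = fullText[quoteStart:quoteEnd+1]
    print("{}:{}".format(quoteStart, quoteEnd))
    print(quotedText)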

Nano Byte
  • Can you explain why you used `quoteEnd = -1`, though? I don't really get it. – someone Aug 11 '20 at 09:58
  • I always start the loop by looking for the next quote from position `quoteEnd+1`, and I want the first search to start at position `0`, so I have to set `quoteEnd = -1` initially. – Nano Byte Aug 11 '20 at 10:19
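
For anyone who finds the `quoteEnd = -1` trick hard to follow, here is a minimal sketch of the same loop wrapped in a generator. It is not the code from the answer: the function name `find_quotes` and the sample string are made up for illustration, and it uses `str.find` (which returns `-1` instead of raising `ValueError`) in place of `str.index`.

```python
def find_quotes(text):
    """Yield (start, end, quoted) for each pair of double quotes in text."""
    end = -1  # the first search then starts at end + 1 == 0
    while True:
        start = text.find('"', end + 1)  # -1 when no opening quote is left
        if start == -1:
            return
        end = text.find('"', start + 1)  # -1 when the quote is never closed
        if end == -1:
            return
        yield start, end, text[start:end + 1]


# Hypothetical sample text, not taken from the question's file.
sample = 'He said "hi" and she said "bye".'
for start, end, quoted in find_quotes(sample):
    print("{}:{} {}".format(start, end, quoted))
```

Each search resumes one character past the previous closing quote, which is why the initial value has to be one less than the first index you want to look at.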
0

It is always good to provide a minimal working example. In this case, the question would be easier to answer if you had provided a sample of what's in fullText.

You don't need a while loop to do this. A regular expression would be a much simpler solution.

Let's assume fullText = '"When we talk about the Hiroshima and Nagasaki bombing, we never talk about Shinkolobwe," was what one said and "I agree." was what another said.'

You could use a regular expression like below.

import re

quotedText = re.findall(r'"([^"]*)"', fullText)

print(quotedText)

Result:

['When we talk about the Hiroshima and Nagasaki bombing, we never talk about Shinkolobwe,', 'I agree.']

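r'"([^"]*)"' is a raw string that represents a regular expression matching a double quote, then any number of characters other than a double quote, then a closing double quote. The parentheses form a capture group, which is why re.findall returns only the text between the quotes and not the quotes themselves.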

A good explanation is here.
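
If you also need the start and end positions, as printed in the question, one possible variation is `re.finditer`, which yields match objects instead of plain strings. This is only a sketch based on the assumed fullText above. The capture group is dropped here so that the quotes are included in the result, and `m.end() - 1` is used so that the end index points at the closing quote, as in the question's output.

```python
import re

# Same assumed sample text as above.
fullText = ('"When we talk about the Hiroshima and Nagasaki bombing, '
            'we never talk about Shinkolobwe," was what one said and '
            '"I agree." was what another said.')

# m.start() is the index of the opening quote; m.end() is one past the
# closing quote, so subtract 1 to get the index of the closing quote.
matches = [(m.start(), m.end() - 1, m.group(0))
           for m in re.finditer(r'"[^"]*"', fullText)]

for quoteStart, quoteEnd, quotedText in matches:
    print("{}:{}".format(quoteStart, quoteEnd))
    print(quotedText)
```

Note that neither pattern handles escaped quotes inside a quotation or typographic quotes (“ ”). Those would need a different pattern.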

PSK