Can't make my code identify specific strings

Question

I'm coding a small school project. I want to read a .txt file and extract the text between "pergunta[]" and a question mark, but I can't get my program to return it.

I have already tried what someone suggested here but it doesn't seem to work for me, as it does not retrieve the piece of string I want and apparently does not even enter the if statement.

("perguntas" means questions)

import pyttsx3

speak = pyttsx3.init()

running = True
perguntas = open(r"C:\Users\jeana\Desktop\perguntas.txt", "r")
texto = perguntas.read()

while running:
    if "pergunta5 " in texto:
        data = texto.split("pergunta5 ")[1].split("?")[0]
        print(data) #tried adding this line but it is never printed
        speak.say(data)
        speak.runAndWait()
        running = False
    print("um loop") #I added this just to know the code reaches this point
    running = False

I expected my code to find the question that is between "pergunta[]"(5 in this case, just to simplify) and "?" and text-to-speech it, but for some reason this code simply outputs something that sounds like a "p" and no error messages. I wonder if I'm missing something that is fundamental here...

The text file looks like this:

pergunta1 Quanto é dois mais dois? R: 4 - 2
pergunta2 Quanto é cinco menos 2? R: 3 - 2
pergunta3 Quanto é cinco menos 1? R: 4 - 2
pergunta4 A peppa pig é um? R: Porco - 3
pergunta5 Qual a cor do cavalo branco do napoleão? R: Branco - 3

edit: A simpler version of my code is

text = "a lot of text with some question1 yadayadayada? question2 dayadayadaya?"
if "question1" in text:
    data = text.split("question1")[1].split("?")[0]
    print(data)

and the output should go:

yadayadayada

Try `print`ing `data` before the `speak` statement to debug the issue. — Selcuk, Aug 19 '19 at 02:02
Why don’t you print the value of “data” as well to help you debugging. Oh that’s just what @Selcuk suggested too. — Wilf Rosenbaum, Aug 19 '19 at 02:02
There are a few different places a bug could occur between reading the file, selecting the text, and speaking the text. It would help if you made a [mre]. — wjandrea, Aug 19 '19 at 02:05
I am sure speak.say() can handle UTF-8 characters @JohnGordon — Jean Alves, Aug 19 '19 at 02:10
If the `print()` statement never shows up, then either your data file is not as shown, or something else is going on. How are you running the code? Do you just type `python myscript.py` at the command line, or do you run from an IDE, or some other way? — John Gordon, Aug 19 '19 at 02:13
If I run your code, but remove the `speak`-related stuff, it runs just fine and prints question 5. The issue appears to be with the text-to-speech library you're using. Alternatively, your text file may have some encoding or other issue that we're not privy to (I just pasted the content you provided here into a text file and ran with that and it was fine). — Grismar, Aug 19 '19 at 02:18
Is there a reason you have everything in a `while` loop? This effectively does nothing, since every code path ends with `running = False`, so why did you put that in? — Grismar, Aug 19 '19 at 02:21
I created the .txt file using Windows Notepad. I am just guessing it didn't add any special encoding? — Jean Alves, Aug 19 '19 at 02:23
About running it in a `while` loop, it's because this is going to be a working TTS that will run over and over, asking several questions to some kids who don't know how to read yet. It has no practical use in this specific piece of code. — Jean Alves, Aug 19 '19 at 02:25
Since you said in [your answer](https://stackoverflow.com/a/57549956/4518341) it was an encoding problem, not a problem with your code per se, I'm voting to close the question as "can no longer be reproduced". — wjandrea, Aug 19 '19 at 03:00

score 0 · Answer 1 · answered Aug 19 '19 at 02:14

Use the built-in `with` statement to read the file, and use a regular expression to split the text. In the pattern 'pergunta.', the "." matches any single character. Refer to the Python regex documentation for more detail.

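import re

# Read the whole file; the with block closes it automatically
with open('perguntas.txt', 'r') as f:
    content = f.read()

# '.' matches any single character, so this splits on pergunta1, pergunta2, ...
sp = re.split(r'pergunta.', content)
print(sp)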
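
To pull out only the question text for one number, rather than splitting the whole file, a capture group can do the extraction in one step. This is a minimal sketch, not the asker's code: `extract_question` is a hypothetical helper name, the sample text is copied from the question, and it assumes each question sits on a single line.

```python
import re

def extract_question(text, number):
    """Return the text between 'pergunta<number> ' and the next '?', or None."""
    # \s+ requires whitespace after the number, so pergunta1 does not match pergunta10
    # (.*?) is non-greedy, so it stops at the first question mark
    pattern = rf"pergunta{number}\s+(.*?)\?"
    match = re.search(pattern, text)
    return match.group(1) if match else None

sample = (
    "pergunta4 A peppa pig é um? R: Porco - 3\n"
    "pergunta5 Qual a cor do cavalo branco do napoleão? R: Branco - 3\n"
)
print(extract_question(sample, 5))
```

Returning `None` when nothing matches makes a missing question easy to detect before passing the result to the text-to-speech call.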

Sudhirln92

This doesn't explain why the current code is not working, which is the real question. – John Gordon Aug 19 '19 at 02:17

score 0 · Accepted Answer · answered Aug 19 '19 at 02:51

It turns out @JohnGordon helped me see what was wrong. The code itself has no problems; the .txt file was apparently saved with an encoding that the script did not read correctly. Simply pasting all the text into a string inside the code made it work just fine.
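
One way to keep reading from the file is to look at its first bytes and pick the encoding from the byte-order mark (BOM) before decoding. The sketch below rests on an assumption about the cause, since it is not confirmed which encoding Notepad used; `read_text_guessing_bom` is a hypothetical helper, the UTF-8 fallback is a guess, and UTF-32 is not handled.

```python
import codecs
import os
import tempfile

def read_text_guessing_bom(path, default="utf-8"):
    """Read a text file, choosing the encoding from its BOM if one is present."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")     # uses the BOM to pick the byte order
    return raw.decode(default)          # assumption: no BOM means UTF-8

# Demo: write one line of the sample file as UTF-16, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "perguntas.txt")
    with open(path, "w", encoding="utf-16") as f:
        f.write("pergunta5 Qual a cor do cavalo branco do napoleão? R: Branco - 3\n")
    texto = read_text_guessing_bom(path)

print("pergunta5 " in texto)
```

Printing `repr(texto[:20])` right after reading is also a quick way to spot stray characters that a plain `print` would hide.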

Jean Alves
