Lines in text file won't iterate through for loop Python

Question

I am trying to iterate through my questions and lines in my .txt file. Now this question may have been asked before, but I am really having trouble with this.

this is what I have right now:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

f = open("glad.txt", "r")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?",
    "Wat voor helm droeg een retiarius?",
]
for question in questions:
    print(f"Question: {question}")
    for _ in range(len(question)):
        for line in f:
            text = str(line.split("."))
            inputs = tokenizer.encode_plus(question,
                                           text,
                                           add_special_tokens=True,
                                           max_length=100,
                                           truncation=True,
                                           return_tensors="pt")
            input_ids = inputs["input_ids"].tolist()[0]

            text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
            answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

            answer_start = torch.argmax(
                answer_start_scores
            )  # Get the most likely beginning of answer with the argmax of the score
            answer_end = torch.argmax(
                answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score

            answer = tokenizer.convert_tokens_to_string(
                tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

            print(text)
            # if answer == '[CLS]':
            #   continue
            # elif answer == '':
            #   continue
            # else:
            #   print(f"Answer: {answer}")
            #   print(f"Answer start: {answer_start}")
            #   print(f"Answer end: {answer_end}")
            #   break

and this is the output:

> Question: Welke soorten gladiatoren waren er?
> ['Er waren vele soorten gladiatoren, maar het meest kwamen de thraex, de retiarius en de murmillo voor', ' De oudste soorten gladiatoren droegen de naam van een volk: de Samniet en de Galliër', '\n']
> ['Hun uitrusting bestond uit dezelfde wapens als die waarmee de Samnieten en Galliërs in hun oorlogen met de Romeinen gewoonlijk vochten', '\n']
> ['De Thraciër (thraex) verscheen vanaf de tweede eeuw voor Chr', ' Hij had een vrij klein  kromzwaard (sica), een klein rond (soms vierkant) schild, een helm en lage beenplaten', " De retiarius ('netvechter') had een groot net (rete) met een doorsnee van 3 m, een drietand en soms  ook een dolk", '\n']
> ['Hij had alleen bescherming om zijn linkerarm en -schouder', ' Vaak droeg hij ook een bronzen beschermingsplaat (galerus) van zijn nek tot linkerelleboog', ' Vaak vocht de retiarius tegen de secutor die om die reden ook wel contraretiarius werd genoemd', '\n']
> ['Hij had een langwerpig schild en een steekzwaard', ' Opvallend was zijn eivormige helm zonder rand en met een metalen kam, waarschijnlijk zo ontworpen om minder makkelijk in het net van de retiarius vast te haken', ' Een provocator (‘uitdager’) vocht doorgaans tegen een andere provocator', '\n']
> ['Hij viel zijn tegenstander uit een onverwachte hoek plotseling aan', ' Hij had een lang rechthoekig schild, een borstpantser, een beenplaat alleen over het linkerbeen, een helm en een kort zwaard', '']
> Question: Wat is een provocator?
> Question: Wat voor helm droeg een retiarius?

But the sentences are supposed to repeat in the other questions too.

Does anyone know what I am doing wrong here? It is probably something really easy, but I really don't seem the find the mistake.

score 1 · Answer 1 · answered Dec 19 '20 at 18:30

1

Your f is just an open file which is exhausted the first time through. I think you meant this:

f = list(open("glad.txt", "r"))

answered Dec 19 '20 at 18:30

quamrana

37,849
12
53
71

Now the lines endlessly keep iterating at one question – Liza Darwesh Dec 19 '20 at 18:34
1

Perhaps you don't need the second for loop: `for _ in range(len(question)):`? – quamrana Dec 19 '20 at 18:37

SJ11 · Accepted Answer · 2020-12-19T18:43:58.067

You would need to add f.seek(0) after your first parse through the file. This is because when you read the file once, the cursor is at the end of the file, after which for line in f does not read the file from the beginning again. Please refer to Tim and Nunser's answer here which explains it well.

Something like this:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

f = open("glad.txt", "r")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?",
    "Wat voor helm droeg een retiarius?",
]
for question in questions:
    print(f"Question: {question}")
    for _ in range(len(question)):
        for line in f:
            text = str(line.split("."))
            inputs = tokenizer.encode_plus(question,
                                           text,
                                           add_special_tokens=True,
                                           max_length=100,
                                           truncation=True,
                                           return_tensors="pt")
            input_ids = inputs["input_ids"].tolist()[0]

            text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
            answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

            answer_start = torch.argmax(
                answer_start_scores
            )  # Get the most likely beginning of answer with the argmax of the score
            answer_end = torch.argmax(
                answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score

            answer = tokenizer.convert_tokens_to_string(
                tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

            print(text)
        f.seek(0) # reset cursor to beginning of the file

Thank you! Both the answers work exactly the same as the answer above — Liza Darwesh, Dec 19 '20 at 18:54

Lines in text file won't iterate through for loop Python

2 Answers2