Print out first sentence of each paragraph in multiple txt documents

Question

I hope this question isn't a bother, since it is similar to ones that have already been asked. My problem is that the answers I found deal with HTML/XML parsed with BeautifulSoup, not with plain txt files. Another question has an answer for Java, but I have only worked with Python.

I have several text documents, and for each of them I want to get the first sentence of each paragraph.

I thought opening and reading the text with:

speech1_content = open("1789-04-30-George-Washington.txt", "r")

would be the first step, but I didn't find any answer that takes me further. The idea is to start from a paragraph like this in a txt file:

Among the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order, and received on the 14th day of the present month. On the one hand, I was summoned by my country, whose voice I can never hear but with veneration and love, from a retreat which I had chosen with the fondest predilection, and, in my flattering hopes, with an immutable decision, as the asylum of my declining years—a retreat which was rendered every day more necessary as well as more dear to me by the addition of habit to inclination, and of frequent interruptions in my health to the gradual waste committed on it by time.

and get only this as output:

Among the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order, and received on the 14th day of the present month.

Thanks a lot for your help.

What do you mean by "first line of paragraphs"? Do you mean every line that comes after a `\n`? What marks the first line of a paragraph? – Amir, Mar 16 '19 at 16:40

Felix · Accepted Answer · score 0 · 2019-03-16T16:53:25.687

This gives you a list of strings holding the first sentence of each line, assuming each paragraph sits on its own line:

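speech1_lines = speech1_content.readlines()
# Skip blank lines; keep the text up to the first period and put the period back.
speech1_first_sentences = [line.split('.')[0].strip() + '.' for line in speech1_lines if line.strip()]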

You can then either print the list

print(speech1_first_sentences)

Or iterate over it

for sentence in speech1_first_sentences:
    print(sentence)

edited Mar 16 '19 at 16:53

answered Mar 16 '19 at 16:42


How do I get it printed out? – Nele Mar 16 '19 at 16:51
I adapted the answer. – Felix Mar 16 '19 at 16:53
Do you know how I could do this for 50 txt files in one folder at once, rather than for each file individually? – Nele Mar 16 '19 at 17:16
You could use [globbing](https://stackoverflow.com/a/2186565/10484131). – Felix Mar 16 '19 at 17:38
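
For the case of many txt files in one folder, here is a minimal sketch of the globbing idea using `pathlib`. The function name `first_sentences_per_file`, the UTF-8 encoding, and the rule that a sentence ends at the first period are assumptions for illustration, not part of the original answer. It also assumes each paragraph sits on its own line.

```python
from pathlib import Path


def first_sentences_per_file(folder):
    """Map each .txt file name in `folder` to the first sentences of its lines.

    Hypothetical helper: assumes one paragraph per line and that a
    sentence ends at the first period.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        with open(path, encoding="utf-8") as f:
            # Keep only non-blank lines; each one is treated as a paragraph.
            lines = [line.strip() for line in f if line.strip()]
        # Text up to the first period, with the period put back.
        results[path.name] = [line.split(".")[0] + "." for line in lines]
    return results
```

Calling `first_sentences_per_file("speeches")` on a folder named `speeches` (a made-up name) returns a dictionary, so you can loop over `.items()` and print each file name followed by its sentences.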

score 0 · Answer 2 · answered Mar 16 '19 at 16:43

So I suppose you need to read a file until the first line-break ('\n').

In Python, we prefer opening a file with:

with open(filename) as f:
    lines = f.readlines()

Each element of lines runs up to a line break, so the first paragraph is simply the first element, lines[0]. The position of the first '.' can be obtained by calling the find method on that string. In your case:

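eos = lines[0].find('.')
# find() returns -1 when there is no period; fall back to the whole line then.
first_sentence = lines[0][:eos + 1] if eos != -1 else lines[0].strip()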

If you need a more sophisticated sentence finder, take a look at NLTK.
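
Short of a full NLP library, a regular expression gets somewhat further than splitting on '.'. The sketch below assumes paragraphs are separated by blank lines and that a sentence ends at '.', '!' or '?' followed by whitespace. The name `first_sentences` is made up for illustration. The heuristic will still cut too early at abbreviations such as "Mr. Smith", which is the kind of case a dedicated sentence tokenizer is meant to handle.

```python
import re

# Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def first_sentences(text):
    """Return the first sentence of each blank-line-separated paragraph."""
    result = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Join wrapped lines so a sentence may span several physical lines.
        flat = " ".join(paragraph.split())
        if flat:
            result.append(_SENTENCE_END.split(flat, maxsplit=1)[0])
    return result
```

To use it on a file, read the whole content with `f.read()` instead of `f.readlines()` and pass the resulting string to `first_sentences`.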
