How to find a paragraph number in text using Python?

Question

text =OUR elders are often heard reminiscing nostalgicallyabout those good old Portuguese days, the Portuguese and their famous loaves of bread. Those eaters of loaves might have vanished but the makers are still there.We still have amongst us the mixers, the moulders and those who bake the loaves.

Marriage gifts are meaningless without the sweet bread known as the bol, just as a party or a feast loses its charm without bread. Not enough can be said to show how important a baker can be for avillage. The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement. Cakes and bolin has are a must for Christmas as well as other festivals.

From this text I want to find the sentence "The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement." and get its line number and paragraph number.

For example: paragraph number = 2

Here is the code that I have tried:

     to_search="The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement."
     print(re.findall(r"(?:(?<!^\n)\n(?!^\n)|[^\n])*"+re.escape(to_search)+r"(?:(?<!^\n)\n(?!^\n)|[^\n])*", x, re.DOTALL|re.MULTILINE|re.IGNORECASE))

But this is not working. How can I find the paragraph number?

score 0 · Answer 1 · answered Sep 22 '22 at 08:26

How about this approach, assuming it's an exact match?

text = """OUR elders are often heard reminiscing nostalgicallyabout those good old Portuguese days, the Portuguese and their famous loaves of bread. Those eaters of loaves might have vanished but the makers are still there.We still have amongst us the mixers, the moulders and those who bake the loaves.

Marriage gifts are meaningless without the sweet bread known as the bol, just as a party or a feast loses its charm without bread. Not enough can be said to show how important a baker can be for avillage. The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement. Cakes and bolin has are a must for Christmas as well as other festivals."""

to_search = "The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement."

paragraphs = text.split("\n\n")

for i, paragraph in enumerate(paragraphs, start=1):
    if to_search in paragraph:
        print(f"Text found in paragraph number #{i}")
        break

answered Sep 22 '22 at 08:26

gvee

This code is not working for me, because my text comes from a txt file that I fetch with this code: `import urllib.request; response = urllib.request.urlopen(file_path); html = response.read(); text = html.decode('utf8')`. So I am not getting \n between the paragraphs. Is there any solution for that? – Roshni Hirani Sep 22 '22 at 13:48
What _are_ you getting for the paragraphs? – gvee Sep 23 '22 at 07:19
For any sentence I get the answer 1, as it considers the whole text as one paragraph. – Roshni Hirani Sep 23 '22 at 08:06
@RoshniHirani you're going to have to work out what character(s) indicate a paragraph. We can only help you with the information you provide. If the text in your initial question doesn't reflect reality, you need to update it to something that does. – gvee Sep 23 '22 at 09:26
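
One way to work out which characters separate the paragraphs, as the last comment suggests, is to print the `repr()` of the decoded text, which shows `\r`, `\n` and other invisible characters as escape sequences. This is only a minimal sketch: the `text` value below is a hypothetical sample standing in for the downloaded file contents, not the asker's actual data.

```python
from collections import Counter

# Hypothetical sample standing in for the decoded file contents;
# here the paragraphs happen to be separated by Windows-style "\r\n\r\n".
text = "First paragraph.\r\n\r\nSecond paragraph."

# repr() shows escape sequences instead of rendering them.
print(repr(text))

# Count the whitespace characters other than plain spaces found in the text.
separator_counts = Counter(ch for ch in text if ch.isspace() and ch != " ")
print(separator_counts)
```

If the counts show `\r` alongside `\n`, or no doubled newlines at all, a plain `text.split("\n\n")` will return the whole text as a single paragraph.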

score 0 · Accepted Answer · answered Sep 22 '22 at 08:29

Here is one approach. We can split the input text on two or more consecutive newlines to generate a list of all paragraphs. Then, use a list comprehension to check each paragraph for the target text.

import re

text = """OUR elders are often heard reminiscing nostalgicallyabout those good old Portuguese days, the Portuguese and their famous loaves of bread. Those eaters of loaves might have vanished but the makers are still there.We still have amongst us the mixers, the moulders and those who bake the loaves.

Marriage gifts are meaningless without the sweet bread known as the bol, just as a party or a feast loses its charm without bread. Not enough can be said to show how important a baker can be for avillage. The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement. Cakes and bolin has are a must for Christmas as well as other festivals."""
paragraphs = re.split(r'\n{2,}', text)
search = 'The lady of the house must prepare sandwiches on the occasion of her daughter’s engagement.'
indices = [ind + 1 for ind, x in enumerate(paragraphs) if re.search(re.escape(search), x)]
print(indices)  # [2]

answered Sep 22 '22 at 08:29

Tim Biegeleisen

This code is not working for me, because my text comes from a txt file that I fetch with this code: `import urllib.request; response = urllib.request.urlopen(file_path); html = response.read(); text = html.decode('utf8')`. So I am not getting \n between the paragraphs. Is there any solution for that? – Roshni Hirani Sep 22 '22 at 13:52
@RoshniHirani Yes, just read the entire text file into a Python string, and then use my answer. [See this SO question](https://stackoverflow.com/questions/8369219/how-to-read-a-text-file-into-a-string-variable-and-strip-newlines) for how to do that. – Tim Biegeleisen Sep 22 '22 at 13:54
Still not working. For any sentence I get the answer 1, as it considers the whole text as one paragraph. – Roshni Hirani Sep 23 '22 at 06:09
My answer itself is working. There must be some nuance with your input text. – Tim Biegeleisen Sep 23 '22 at 06:19
I am reading my file from Google Drive. Could reading the file from Google Drive cause a problem with the \n characters? – Roshni Hirani Sep 23 '22 at 08:16
On Windows, line endings are `\r\n`. Try using this split: `paragraphs = re.split(r'(?:\r?\n){2,}', text)` – Tim Biegeleisen Sep 23 '22 at 08:18
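
The line-ending suggestion above could be folded into a small helper along these lines. This is a sketch rather than either answerer's code: `find_paragraph_number` and the `sample` string are hypothetical, and it assumes paragraphs are separated by at least one blank line (possibly containing stray spaces or tabs).

```python
import re

def find_paragraph_number(text, to_search):
    """Return the 1-based number of the first paragraph containing to_search, or None."""
    # Treat "\r\n" (Windows) and lone "\r" as "\n" so the split sees one convention.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a newline, optional spaces/tabs, then at least one more newline.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", normalized.strip())
    for number, paragraph in enumerate(paragraphs, start=1):
        if to_search in paragraph:
            return number
    return None

# Hypothetical sample with Windows line endings and a whitespace-only "blank" line.
sample = "First paragraph.\r\n\r\nSecond paragraph, with the target sentence.\r\n \r\nThird."
print(find_paragraph_number(sample, "target sentence"))
```

If the file puts each paragraph on a single line with only one newline between paragraphs, this split will not separate them, and the separator has to be identified from the actual data first.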
