
I am not experienced with regex patterns, so please do not remove this post. I have tried to describe my problem precisely, and I am asking for help with a pattern that finds sentences according to the rules below.

Step1:

Find sentences that end with `.`, `!`, or `?`. Example sentence ending with `!`:

Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie!

-or- ending with `.`:

Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie.

-or- ending with `?`:

Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie?

Step2: Do NOT treat a dot as the end of a sentence when it follows a number at the beginning of a line:

0.

(ANY NUMBER + DOT)

5.

(ANY NUMBER + DOT)

156.

(ANY NUMBER + DOT) This applies only at the beginning of a line; everywhere else a number followed by a dot may end a sentence.

Step3: All languages of the world are allowed, except for Russian.

Step4: Add an exception for any links (URLs): ignore them completely.

Step5:

Detect a sentence boundary when one sentence ends with three dots, three exclamation marks, or three question marks and the next sentence begins with a capital letter. Examples: TEXT

Jestem w innym świecie... W świecie o innej kulturze, języku, tradycjach, architekturze, przyrodzie, kuchni, pogodzie.

TEXT

Jestem w innym świecie!!! W świecie o innej kulturze, języku, tradycjach, architekturze, przyrodzie, kuchni, pogodzie.

TEXT

Jestem w innym świecie??? W świecie o innej kulturze, języku, tradycjach, architekturze, przyrodzie, kuchni, pogodzie.

Nero
  • 3
    If you're not experienced, it's easier to start from the simpler tasks. You asked at least five different questions in one, some of them really complex. – choroba May 05 '19 at 20:27
  • A few years ago someone wrote a regular expression for me, but unfortunately I no longer have that pattern in my collection. Reportedly, a regular expression can sometimes work miracles, but writing one is beyond my abilities. – Nero May 05 '19 at 20:35
  • 1
    Questions that ask ["Give me a regex that does X"](//meta.stackoverflow.com/q/285733) with no attempt are off topic on Stack Overflow. – Patrick Artner May 05 '19 at 20:38
  • 2
    You can copy your data into an online regex tool (e.g. http://regex101.com ) and play around until you get a regex that matches what you need. – Patrick Artner May 05 '19 at 20:39
  • I've tested a lot of different regular expressions over a long time. They generated too many errors. I can keep trying endlessly and waste a lot of time, but nothing good will come of it. I asked a few people, but it is just as complicated for them as it is for me, so they declined to help further. Currently, I can count the sentences in simple text. The remaining problem is numbers followed by a dot and links (URLs), which are incorrectly identified as sentences. – Nero May 05 '19 at 20:51
  • How can I explain this when I cannot color the text here to show where the first sentence and each subsequent one should be identified? "31. Jemu wydawało się, jakby to ona była bez skazy. Ale miała jedną wadę - szukała wad w sobie... Przez 24/7. 63 lat temu była kiedyś asteroida." There should be 4 sentences. With regex enabled and "Count Matches" selected, the tool reports 1 match found (wrong!). – Nero May 06 '19 at 08:15
  • Here the sentences are colored. See: https://postimg.cc/34VzkHsF – Nero May 06 '19 at 08:26

1 Answer

^(?!\d+\.).*[.!?]$

This fulfills most of the requirements.

Matches:

Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie!

Doesn't match:

156. abc.
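
As a rough illustration of how this pattern behaves, here is a minimal Python sketch. The helper name `matching_lines` and the sample text are made up for the demonstration; the pattern itself is the one above, compiled with `re.MULTILINE` so that `^` and `$` apply to each line.

```python
import re

# The pattern from this answer; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^(?!\d+\.).*[.!?]$", re.MULTILINE)

def matching_lines(text):
    """Return every whole line accepted by the pattern (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]

sample = (
    "Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie!\n"
    "156. abc.\n"
    "https://postimg.cc/34VzkHsF"
)

for line in matching_lines(sample):
    print(line)
```

Only the first line of `sample` is printed: the second starts with a number and a dot, and the bare URL does not end in `.`, `!`, or `?`. Note that the pattern accepts or rejects whole lines, so it does not split a line into individual sentences.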

Steps 3 and 4 are not covered by this regex.
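
For Step 3, one possible approximation is a separate check in code rather than part of the sentence pattern. The sketch below is in Python and rests on a loud assumption: it treats "Russian" as "contains a letter from the basic Cyrillic Unicode block", which also flags Ukrainian, Bulgarian, Serbian, and other Cyrillic-script text. The names `CYRILLIC_RE` and `contains_cyrillic` are hypothetical.

```python
import re

# Assumption: Russian text is approximated by the basic Cyrillic block
# (U+0400..U+04FF). This is a script check, not real language detection.
CYRILLIC_RE = re.compile(r"[\u0400-\u04FF]")

def contains_cyrillic(line):
    """Return True if the line holds at least one Cyrillic character."""
    return CYRILLIC_RE.search(line) is not None

lines = [
    "Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie!",
    "Привет, мир!",
]

# Keep only the lines that would go on to sentence matching.
allowed = [line for line in lines if not contains_cyrillic(line)]
print(allowed)
```

Polish letters such as `ę` and `ś` lie outside that block, so the Polish example passes the filter while the Cyrillic line is dropped.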

MakotoE
  • Fantastic, thank you. The regex ignores URLs (very good). The regex finds the sentences (very good). What still remains is the numbering of some sentences (only at the beginning of a line). Screenshot of the text: https://postimg.cc/TywFXf2z – Nero May 05 '19 at 23:32
  • @Nero I'm not sure what you mean. Do you want to match numbered sentences, or not match numbered sentences? Or did you want to match the sentence but not the number and dot? – MakotoE May 06 '19 at 00:25
  • It should be: Count Match Sentences: 4. This is a special numbering at the beginning of a new line, my own convention, which marks the separation between some texts in articles, prose, and essays without adding unnecessary empty lines or other separators. The numbering is not everywhere, only in many places in the text, but it has to stay. In this particular case (only at the beginning of a line), the number and the text after it should count as one sentence. EXAMPLE: 31. Jemu wydawało się, jakby to ona była bez skazy. Ale miała jedną wadę - szukała wad w sobie... Przez 24/7. 63 lat temu była kiedyś asteroida. – Nero May 06 '19 at 01:38
  • @Nero Sorry I still do not understand you. But maybe this works? `[^\d\.\s].*[.!?]$` – MakotoE May 06 '19 at 04:18
  • @Nero Ah okay, your screenshot in the other comment helps a lot. This will match the selections in that screenshot: `(^\d+\..*?|.*?)(\.\.\.|[\.?!])`. Basically, a sentence ends with a ".?!" OR a sentence begins a line with a number + "." and ends with ".?!" – MakotoE May 06 '19 at 21:56
  • @Nero This works better: `(^\d+\..*?|[^\s].*?)(\.\.\.|[\.?!])` to ignore spaces before the selection. – MakotoE May 06 '19 at 22:12
  • The sentence ends with a dot, exclamation mark or question mark. Yes, now it's almost okay, but you've forgotten that, as stated at the beginning of my question, it has to ignore characters in URLs. See the new screenshot: https://postimg.cc/hzzVcNh2 – Nero May 07 '19 at 09:42
  • @Nero Ok, how about this: `(\S+\.(com|net|org|edu|gov)(\/\S+)?)|((^\d+\..*?|[^\s].*?)(\.\.\.|[\.?!]))` I copied it from https://stackoverflow.com/questions/1141848/regex-to-match-url. It searches for URLs first, then matches sentences. – MakotoE May 08 '19 at 00:09
  • I've tested it. Unfortunately, this pattern is incorrect because it matches the links as sentences. – Nero May 08 '19 at 02:43
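
Since putting the URL alternative inside the pattern still counts links as matches, a different approach that may work is to mask URLs first and count sentences afterwards. The Python sketch below is only an illustration under stated assumptions: `URL_RE`, the `"URL"` placeholder, and `count_sentences` are hypothetical names, and the URL pattern recognizes only links that start with `http://`, `https://`, or `www.`. The sentence pattern is the one from the comments above with one change of mine: the terminator is widened to `[.!?]+` so that runs such as `...`, `!!!`, or `???` close a single sentence.

```python
import re

# Hypothetical URL pattern: a scheme or "www." prefix followed by
# non-space characters; trailing punctuation is left outside the link.
URL_RE = re.compile(r"(?:https?://|www\.)\S+?(?=[.!?,;:]*(?:\s|$))")

# Sentence pattern from the comments, with the terminator widened to
# [.!?]+ (my adjustment) so "!!!" or "???" end exactly one sentence.
SENTENCE_RE = re.compile(r"(?:^\d+\..*?|\S.*?)[.!?]+", re.MULTILINE)

def count_sentences(text):
    """Mask URLs with a placeholder, then count sentence matches."""
    masked = URL_RE.sub("URL", text)
    return sum(1 for _ in SENTENCE_RE.finditer(masked))

example = (
    "31. Jemu wydawało się, jakby to ona była bez skazy. "
    "Ale miała jedną wadę - szukała wad w sobie... "
    "Przez 24/7. 63 lat temu była kiedyś asteroida."
)
print(count_sentences(example))
```

On the numbered example from the comments this counts 4 sentences, and a line such as "Here the sentences are colored. See: https://postimg.cc/34VzkHsF" counts as 1 because the link is masked before matching. Edge cases remain, for example a numbered line with no closing punctuation, or links written without a scheme.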