If I want to find all periods that ARE at the end of paragraphs, I could do `\.($|\n)`. But how can I negate that and say "a period followed by any character that ISN'T one of these", given that metacharacters don't work inside character classes, which stops me from using negated character classes?

- @Braj My thoughts exactly. `$` means three different things in Java, {C#,PHP,Python}, and {JS,Ruby} :) – zx81 Jun 28 '14 at 21:06
- You’re going to need double-newlines for paragraph terminators, so something like `\n{2,}` where you have `\n` is probably best. However, a lone newline itself is often insufficient to indicate a paragraph separator (for example, in markdown or email), and it may be insufficient, too (as in HTML). – tchrist Jun 29 '14 at 14:39
- @zx81 More than 3, actually. – tchrist Jun 29 '14 at 15:12
4 Answers
What's in a `$`? It depends!
The answer very much depends on which language and regex engine you're using. You see,
- In Java, the `$` asserts that we are positioned at the end of the string or before any carriage return or newline at the end of the string. So you'd be safe with a `\.(?!$)`
- In PCRE, C# and Python, the `$` asserts that we are positioned at the end of the string or before any newline at the end of the string. So you could use a `\.(?!$|\r)`
- In JavaScript and Ruby, the `$` asserts that we are positioned at the end of the string. So you'd need to go the full Monty with a `\.(?!$|[\r\n])`.
Therefore, for a multi-engine solution, the safest would be:
\.(?!$|[\r\n])
But in the right context, the other two options are perfectly acceptable.
Explanation
- `\.` matches the literal period
- The negative lookahead `(?!$|[\r\n])` asserts that what follows is neither the "end of the string" nor a carriage return nor a newline.
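To make the multi-engine pattern concrete, here is a minimal Python sketch. The sample text and the name `NOT_AT_EOL` are made up for illustration. In Python's `re`, `$` without `re.MULTILINE` only matches at the end of the string or just before a final newline, so for interior line breaks it is the `[\r\n]` class that does the work.

```python
import re

# Period NOT followed by end-of-string, carriage return, or newline.
NOT_AT_EOL = re.compile(r"\.(?!$|[\r\n])")

# Hypothetical sample: Unix and Windows line endings mixed on purpose.
text = "One. Two.\nThree. Four.\r\nFive."

# Offsets of the periods that sit in the middle of a line.
positions = [m.start() for m in NOT_AT_EOL.finditer(text)]
print(positions)
```

Only the periods after "One" and "Three" should be reported. The ones before `\n`, before `\r\n`, and at the very end are skipped.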

- There’s more than one good reason UTS #18 strongly recommends that `\R` be part of a regex language. See my answer. – tchrist Jun 29 '14 at 15:10
- I was going with the sort of standard "end of string" match. – temporary_user_name Jun 29 '14 at 16:31
- `I was going with the sort of standard "end of string"` Yes, that's exactly what I understood. The point of this answer is that you say it differently in different languages, and it gives you options in these various languages. For instance, in JS, the answer you picked will not work! On a Windows file, it **will** match the period at the end of a string. – zx81 Jun 29 '14 at 22:27
Use a Negative Lookahead to do this.
\.(?!\n|$)
Explanation:
\. '.'
(?! look ahead to see if there is not:
\n '\n' (newline)
| OR
$ before an optional \n, and the end of the string
) end of look-ahead
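One caveat worth checking for yourself: this pattern only knows about `\n`. A small Python sketch (the sample strings and variable names are hypothetical) shows what happens when the input uses Windows-style `\r\n` line endings:

```python
import re

# The pattern from this answer: a period not followed by \n or end of string.
LF_ONLY = re.compile(r"\.(?!\n|$)")

unix_text = "First line.\nSecond line."
windows_text = "First line.\r\nSecond line."  # hypothetical CRLF input

# With LF line endings, no period is reported.
unix_hits = [m.start() for m in LF_ONLY.finditer(unix_text)]

# With CRLF, the first period is followed by \r, so it IS reported.
windows_hits = [m.start() for m in LF_ONLY.finditer(windows_text)]

print(unix_hits, windows_hits)
```

If CRLF input is possible, adding `\r` to the lookahead, as in `\.(?!$|[\r\n])`, avoids the false positive.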

- Just use `(?x)` and write the thing with comments right from the get-go the way it should be done. – tchrist Jun 29 '14 at 14:41
The most useful longhand version of the negatively looked ahead EOL check after the period winds up making your entire pattern something like this:
(?x: # enable comments
\. # a literal dot character
(?! # look ahead for not the following{
\R ? # optional EOL grapheme cluster
\z # at the true end of string
) # } end look ahead
)
That assumes you don’t want it to match “interstitially” (that is, before any line-terminator grapheme), which would be the simpler:
(?=\R)
Some argument can be made for that `\R?` being made into a `\R*` instead, in case you should happen to have multiple line-terminators at the end of a record, like several newlines in a row. That way 0, 1, 2, or however many EOL graphemes are allowed before the end of the string.
On the other hand, it may well be the case that a paragraph must be at least two EOL graphemes, not just one alone. For example, this is true in markup here and in other files with “blank-line separated” types of paragraphs. So no EOLs are ok, and two or more are too, but not just one of them.
For such text, you would need `\R{2,}`, but the whole bit would be optionalized, yielding in that case:
(?x: # enable comments
\. # a literal dot character
(?! # look ahead for NOT the following {
(?:
\R {2,} # two or more EOL grapheme clusters
) ? # optionally
\z # at the true end of string
) # } end negated look ahead
)
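For flavors without `\R` and `\z`, the same idea can still be sketched. Below is a rough Python version, offered as an illustration under stated assumptions rather than a drop-in translation: `EOL` is my own stand-in for `\R`, `PERIOD_NOT_AT_END` and `hits` are hypothetical names, and Python's `\Z` plays the role of `\z`.

```python
import re

# Assumed stand-in for \R: CRLF as one unit, else a single vertical-space
# character. The (?!\n) guard keeps a lone \r from splitting a CRLF.
EOL = r"(?:\r\n|\r(?!\n)|[\n\x0b\x0c\x85\u2028\u2029])"

PERIOD_NOT_AT_END = re.compile(
    rf"""
    \.                       # a literal dot character
    (?!                      # look ahead for NOT the following
        (?: {EOL}{{2,}} )?   # optionally, two or more EOLs
        \Z                   # at the true end of string
    )                        # end negated look ahead
    """,
    re.VERBOSE,
)

def hits(text):
    """Return offsets of periods that are not at the end of the record."""
    return [m.start() for m in PERIOD_NOT_AT_END.finditer(text)]

print(hits("A. B."))
print(hits("End.\n\n"))
print(hits("End.\r\n"))
```

Under this reading, a period followed by a single trailing EOL still counts as "not at the end", because one EOL alone is not a paragraph terminator.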
If you don’t have `\R` from UTS #18 (Unicode Regular Expressions: Line Boundaries) in your regex flavor, then you will have to write it out the hard way, which is the rather annoying:
(?x: # We are emulating \R per UTS#18
(?> # Prohibit backtrack within subpattern
\r \n # Match a CRLF without backtracking
# or else any code point with the
# vertical space character property
# \p{VertSpace}, here enumerated in full
| [\x0A-\x0D\x85\x{2028}\x{2029}]
)
)
You need the no-backtracking bit to avoid something like `\R{2}` being allowed to match a single CRLF, and it isn’t allowed to do that.
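The backtracking problem is easy to reproduce. Here is a small Python sketch; the names `NAIVE_R` and `GUARDED_R` are mine, not part of any library. Python's `re` only gained atomic groups `(?>...)` in version 3.11, so this sketch swaps in a lookahead guard on the bare `\r`, which is a different technique aimed at the same effect.

```python
import re

VSPACE = r"[\n\x0b\x0c\r\x85\u2028\u2029]"

# Naive emulation: once \r\n cannot be repeated, the engine backtracks
# and consumes \r and \n as two separate line terminators.
NAIVE_R = rf"(?:\r\n|{VSPACE})"

# Guarded emulation: a \r that is followed by \n may only match as CRLF.
GUARDED_R = r"(?:\r\n|\r(?!\n)|[\n\x0b\x0c\x85\u2028\u2029])"

single_crlf = "\r\n"

# The naive version wrongly treats one CRLF as two EOLs.
naive_match = re.fullmatch(rf"{NAIVE_R}{{2}}", single_crlf)

# The guarded version rejects one CRLF but accepts two.
guarded_match = re.fullmatch(rf"{GUARDED_R}{{2}}", single_crlf)
double_match = re.fullmatch(rf"{GUARDED_R}{{2}}", "\r\n\r\n")

print(bool(naive_match), bool(guarded_match), bool(double_match))
```

On Python 3.11 or later, wrapping the naive alternation in `(?>...)` should give the same protection and stays closer to the pattern above.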
One final thing to consider is whether you want to allow for optional horizontal whitespace to intervene between the period and the EOL. I rather imagine that you do, but without a tighter formal specification in the OP, it’s impossible to say so definitely.

- Thanks for drawing me to your answer. :) I didn't want to mention `\R` as I feared my answer was already a bit complex (and indeed the OP picked a solution that **would** match periods at the end of a string on JS in Windows...) But your discussion is absolutely masterful, as other writings of yours I have been chanced upon. From now on I might have to seek them out. :) In fact by pure coincidence, just this morning I added one of your answers to the `Answers I loved reading` section of my profile! +1, sorry I can't +5 :) – zx81 Jun 29 '14 at 22:32
You should use a negative lookahead.
\.(?!$|\n)
More on this: http://www.regular-expressions.info/lookaround.html
