I'm trying to get the first sentence inside a `<p>` tag. I consider a sentence to end at the first "final" dot, i.e. where the text goes "dot, space, uppercase letter", so that abbreviations are skipped.
echo "<p>this will def. fail. So. Sad.</p>" | sed -r -e "s/<p>(([^\.]*\. [^A-Z])*[^\.]*\.) [A-Z]/\1/g"
The expected result is `this will def. fail.`, which I try to capture with `\1`.
It works on regex101, but when used with sed in my terminal it returns `this will def. fail.o. Sad.</p>`.
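One likely explanation (a reading of the pattern, not something confirmed in the thread): `s///` replaces only the text the regex matched, which here is `<p>this will def. fail. S`, so the leftover `o. Sad.</p>` stays in the output. A minimal sketch, assuming GNU sed (for `-r`) and a whole `<p>...</p>` on one line: appending `.*` makes the match run to the end of the line, so the entire line is replaced by `\1`. The helper name `first_sentence` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first sentence of a one-line <p>...</p> string.
# Assumes GNU sed (-r) and a sentence end of "dot, space, uppercase letter".
first_sentence() {
  # The trailing .* consumes the rest of the line, so the whole line
  # (not just "<p>this will def. fail. S") is replaced by group \1.
  # LC_ALL=C keeps [A-Z] an ASCII uppercase range regardless of locale.
  printf '%s\n' "$1" |
    LC_ALL=C sed -r -e 's/<p>(([^.]*\. [^A-Z])*[^.]*\.) [A-Z].*/\1/'
}

first_sentence "<p>this will def. fail. So. Sad.</p>"
# prints: this will def. fail.
```

The `g` flag from the original command is dropped here because, once `.*` swallows the rest of the line, there is nothing left for a second match.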
this will def. fail. So. Sad." | sed -r -e "s/^([^.]*[.]).*$/\\1/g"`. Or `sed -r -e "s/^([^.]*[.])( .*$|$)/\\1/g"` if there must be a space after the first dot.
– Wiktor Stribiżew Dec 11 '15 at 13:16
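For comparison, the two commands from the comment can be run side by side. This sketch assumes GNU sed and feeds the plain sentence (without the `<p>` tag) through `printf`; that input step is an assumption, since the start of the comment is cut off. Both patterns stop at the first dot, so with this input `def.` is treated as the end of the sentence.

```shell
#!/bin/sh
# Input as in the comment (no <p> tag); assumes GNU sed for -r.
input='this will def. fail. So. Sad.'

# Variant 1: keep everything up to and including the first dot.
printf '%s\n' "$input" | sed -r -e 's/^([^.]*[.]).*$/\1/g'
# prints: this will def.

# Variant 2: same, but only when the first dot is followed by a space
# or ends the line; otherwise nothing matches and the line is unchanged.
printf '%s\n' "$input" | sed -r -e 's/^([^.]*[.])( .*$|$)/\1/g'
# prints: this will def.
```

Single quotes are used here, so `\1` reaches sed directly; inside double quotes, as in the comment, it has to be written `\\1`.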