I want to find digits followed by "f", "ff", "f." or "ff." to standardize the spelling following given conventions/rules.

I have already tried some regular expressions, but unfortunately I did not find a universal expression that catches all of the cases above (f, ff, f., ff.).

In plain words it seems easy:

  • find digits
  • followed by an optional whitespace
  • then followed by f, ff, f. or ff.
  • only whitespace or NOT word boundaries are allowed before and after the expression

The beginning of the regex is quite easy, but I can’t figure out how to handle the different "f"-cases and the NOT boundaries following.


My best guess so far is:

(?<=\b)(\d+(\h|\b)?f{1,2})\.?

but then strings followed by a word character are still found.


When I extend the regex to:

(?<=\b)(\d+(\h|\b)?f{1,2})\.?(\W)

the number of "false finds" decreases, but it is still not the solution.


I prepared lines for testing. The lines containing a plus "+" should be found, while the ones with a minus "-" should not be found.

00f aaa +
00f. aaa +
00ff aaa +
00ff. aaa +
00 f aaa + 
00 f. aaa +
00 ff aaa +
00 ff. aaa +
+ aaa 00f aaa +
+ aaa 00f. aaa +
+ aaa 00ff aaa +
+ aaa 00ff. aaa +
+ aaa 00 f aaa + 
+ aaa 00 f. aaa +
+ aaa 00 ff aaa +
+ aaa 00 ff. aaa +
+ aaa 00f
+ aaa 00f.
+ aaa 00ff
+ aaa 00ff.
+ aaa 00 f 
+ aaa 00 f.
+ aaa 00 ff
+ aaa 00 ff.

00 faaa -
00 f.aaa -
00 ffaaa -
00 ff.aaa -
00af aaa - 
00af. aaa -
00aff aaa -
00aff. aaa -
- aaa 00 faaa -
- aaa 00 f.aaa -
- aaa 00 ffaaa -
- aaa 00 ff.aaa -
- aaa 00af aaa - 
- aaa 00af. aaa -
- aaa 00aff aaa -
- aaa 00aff. aaa -
- aaa00f
- aaa00f.
- aaa00ff
- aaa00ff.
- aaa 00af 
- aaa 00af.
- aaa 00aff
- aaa 00aff.

00faaa -
00f.aaa -
00ffaaa -
00ff.aaa -
00af aaa - 
00af. aaa -
00aff aaa -
00aff. aaa -
- aaa00 faaa -
- aaa00 f.aaa -
- aaa00 ffaaa -
- aaa00 ff.aaa -
- aaa00af aaa - 
- aaa00af. aaa -
- aaa00aff aaa -
- aaa00aff. aaa -
- aaa00af 
- aaa00af.
- aaa00aff
- aaa00aff.
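
The +/- convention above lends itself to an automated check. The following is a minimal sketch, not part of the original question: the helper name `failing_lines` and the four-line sample are assumptions for illustration, and Python's `re` module stands in for whatever engine is actually used (note that `re` does not support `\h`). It reports every line where a candidate pattern disagrees with the marker.

```python
import re

def failing_lines(pattern, lines):
    """Return the lines where the search result disagrees with the +/- marker."""
    regex = re.compile(pattern)
    failures = []
    for line in lines:
        if not line.strip():
            continue                      # skip blank separator lines
        expected = "+" in line            # "+" lines should match, "-" lines should not
        found = regex.search(line) is not None
        if found != expected:
            failures.append(line)
    return failures

# A few of the test lines from above.
sample = ["00f aaa +", "+ aaa 00 ff.", "00 faaa -", "- aaa00f"]

# Naive pattern: digits, optional whitespace, one or two "f", no boundary checks.
print(failing_lines(r"\d+\s?f{1,2}", sample))
# ['00 faaa -', '- aaa00f']
```

The naive pattern finds every "+" line but also the two "-" lines, which is exactly the problem described: the conditions before and after the expression are missing.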

Further, the aim is to group the digits and the "f"-cases in such a way that they can be used in a replacement expression to standardize the spelling to one of these cases:

  • 123 ff. (with whitespace, with dot)
  • 123 ff (with whitespace, without dot)
  • 123ff. (without whitespace, with dot)
  • 123ff (without whitespace, without dot)
Trenton McKinney
typopaul

2 Answers

I suggest

\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))

See the regex demo

Details

  • \b - word boundary
  • (\d+) - Group 1: 1+ digits
  • (\s?) - Group 2: an optional whitespace
  • (f{1,2}) - Group 3: 1 or 2 fs
  • (?:(\.)\B|\b(?!\.)) - either of the two:
    • (\.)\B - a . captured in Group 4 if not followed with a word char
    • | - or
    • \b(?!\.) - a word boundary not followed with a dot.
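
To see how the groups are filled, here is a small sketch that runs the pattern through Python's `re` module. That choice is an assumption; the question does not name the target engine, and only the `\s` variant is used because `re` has no `\h`. The four sample strings are taken from the question's test lines, with the +/- markers dropped.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))")

for text in ["00 ff. aaa", "00f aaa", "00 f.aaa", "aaa00ff"]:
    m = PATTERN.search(text)
    # groups() lists Group 1..4; a group that did not take part is None
    print(repr(text), "->", m.groups() if m else None)

# '00 ff. aaa' -> ('00', ' ', 'ff', '.')
# '00f aaa' -> ('00', '', 'f', None)
# '00 f.aaa' -> None
# 'aaa00ff' -> None
```

Group 4 only holds the dot when the first alternative matched; with the second alternative it stays unset.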

Then, replacing is easy with:

  • 123 ff.: $1 $3.
  • 123 ff : $1 $3
  • 123ff. : $1$3.
  • 123ff : $1$3

If the whitespace and dot are not necessary in replacement patterns, remove the groupings and adjust the IDs in the replacement backreferences.
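
As a rough sketch of the replacement step, assuming Python's `re` module: `re.sub` writes backreferences as `\1` and `\3` rather than `$1` and `$3`. The `TEMPLATES` keys and the `normalize` helper are hypothetical names chosen for this example.

```python
import re

# Pattern from this answer; Group 1 = digits, Group 3 = "f" or "ff".
PATTERN = re.compile(r"\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))")

# The four target spellings, written with Python-style backreferences.
TEMPLATES = {
    "space_dot": r"\1 \3.",   # 123 ff.
    "space": r"\1 \3",        # 123 ff
    "dot": r"\1\3.",          # 123ff.
    "plain": r"\1\3",         # 123ff
}

def normalize(text, style="space_dot"):
    """Rewrite every match in text to the chosen spelling."""
    return PATTERN.sub(TEMPLATES[style], text)

print(normalize("aaa 00f aaa"))                 # aaa 00 f. aaa
print(normalize("see 12ff. and 7 f", "plain"))  # see 12ff and 7f
```

Because an optional trailing dot is part of the match, the templates that add a dot do not produce a doubled one.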

Wiktor Stribiżew
  • Hi, whilst looking at my own solution and checking yours, e.g. 00 f.!aaa aaa + would still match, but this doesn't adhere to requirements, since only whitespace/boundary at the end is allowed...? – vs97 Aug 20 '19 at 21:49
  • @vs97 No idea why you think so. `ff.` is followed with a non-word boundary and it *does* meet the requirements. – Wiktor Stribiżew Aug 20 '19 at 22:35
  • @WiktorStribiżew - Interestingly, running [your regexp](https://regex101.com/r/4T8EZI/3) via Adobe InDesign's _Find/Change_ feature yields the exact same matches as [your regexp with the flavor set to ECMAScript](https://regex101.com/r/4T8EZI/4), i.e. several wanted matches are not found. – RobC Aug 21 '19 at 08:35
  • @RobC I wrote: *`(\h?)` - Group 2: an optional horizontal whitespace **(use `\s` if any whitespace is allowed here)***. [My `\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))` regex works](https://regex101.com/r/4T8EZI/5). Edited to get rid of `\h`. – Wiktor Stribiżew Aug 21 '19 at 08:37
  • @WiktorStribiżew - Yes using `\s` instead does rectify the issue. – RobC Aug 21 '19 at 08:42

What about something like this?

\b\d+\s?(?:ff|f)+\.?(?=\s)

\b          start with word boundary
\d+         match one or more digits
\s?         match optional whitespace
(?:ff|f)+   non-capturing group matching ff or f, repeated one or more times
\.?         match optional dot (basically checking for ff. or ff or f. or f)
(?=\s)      match if followed by whitespace, without making the whitespace part of the match

With groups, same expression looks like:

\b(\d+)\s?((?:ff|f)+\.?)(?=\s)

Replacement can be achieved via different combinations of the $1 and $2 groups.
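
A minimal sketch of one such combination, assuming Python's `re` module (which writes the groups as `\1` and `\2`); the helper name `with_space` is hypothetical. It inserts a single space between the digits and the "f" part and keeps the dot as written.

```python
import re

# Grouped pattern from this answer: Group 1 = digits, Group 2 = f/ff plus optional dot.
GROUPED = re.compile(r"\b(\d+)\s?((?:ff|f)+\.?)(?=\s)")

def with_space(text):
    """Put exactly one space between the digits and the f part."""
    return GROUPED.sub(r"\1 \2", text)

print(with_space("aaa 00ff. aaa"))  # aaa 00 ff. aaa
print(with_space("aaa 00f"))        # aaa 00f
```

The second call leaves the text unchanged: the lookahead `(?=\s)` needs a whitespace character after the expression, so a match at the very end of the input without trailing whitespace is not found.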

vs97