C# Regex - Match certain char followed by number/identifier

Question

I'm having trouble with a regex that doesn't seem to have been asked about here before. I need to replace the character a when it is followed by optional whitespace and then, necessarily, by a number (the number itself must not be replaced).

I have this Regex: [aA]\s.(?<=\d)* and this is the result:

Using (?<=\d)* I wanted to match, but not capture, the number immediately following the character (with or without the space in between), but obviously it doesn't work, not least because \d does not cover the identifiers. An identifier can be any sequence of digits, or of letters and digits, with no fixed length and no fixed order of letters and digits. Examples: A54N3, Z4G78, 8454, 4AZ7, 7, A1, 1A. The combinations always vary.

I want to match ONLY the a before the number 8 (or any other number, or an identifier like N574A) and replace that character with art, leaving the number/identifier as it is, so the result should be: agricoltura n 6 sensi dell'art8 or agricoltura n 6 sensi dell'artN574A. If the target string is agricoltura n 6 sensi dell'a8 or agricoltura n 6 sensi dell'aN574A (i.e. without whitespace), the result should be the same: agricoltura n 6 sensi dell'art8 or agricoltura n 6 sensi dell'artN574A.

So the general rule is: match [aA], followed by an optional space, which must then be followed by a number or an identifier that must not be captured.

Is it possible to do such a thing? What could be the solution? Thank you so much!

UPDATE

Using the \\b([aA])\\s*([A-Za-z]*\\d[\\dA-Za-z]*)\\b pattern seems to replace the correct values; here is the demo
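The doubled backslashes above are how the pattern looks inside a regular (non-verbatim) C# string literal. A minimal sketch showing that it is the same pattern as the verbatim `@"..."` form; the class and member names (`PatternLiterals`, `Escaped`, `Verbatim`, `Rewrite`) are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the same regex written as two kinds of C# string literal.
public static class PatternLiterals
{
    // Regular literal: each regex backslash must be doubled ("\\b" -> \b).
    public const string Escaped = "\\b([aA])\\s*([A-Za-z]*\\d[\\dA-Za-z]*)\\b";

    // Verbatim literal: backslashes are taken as-is, so the regex reads unchanged.
    public const string Verbatim = @"\b([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b";

    // True when both literals denote exactly the same pattern string.
    public static bool AreIdentical() => Escaped == Verbatim;

    // Applies the pattern: keep a/A ($1), append "rt", keep the identifier ($2).
    public static string Rewrite(string input) =>
        Regex.Replace(input, Escaped, "$1rt$2");
}
```

Writing a single `\b` inside a regular literal would produce a backspace character (U+0008) instead of a word boundary, so the regex would look for that character and the pattern would silently fail to match.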

Capture what you need to keep and use `${x}` where `x` is the Group ID (1-based) — Wiktor Stribiżew, Nov 06 '19 at 10:49
Sorry @WiktorStribiżew, I'm sure it's my fault, but honestly I don't see how that post relates to my question — Matteo Pietro Peru, Nov 06 '19 at 11:02
What you need to keep is wrapped with a capturing group, the rest is just matched. Well, I have no idea how else we can help you, since your rules are too vague. You say it must be followed by a number and then you say it can be an identifier. You might try `Regex.Replace(text, @"([aA])\s*([A-Z]*\d[\dA-Z]*\b)", "$1rt$2")` (see [**demo**](http://regexstorm.net/tester?p=%28%5baA%5d%29%5cs*%28%5bA-Z%5d*%5cd%5b%5cdA-Z%5d*%5cb%29&i=agricoltura+n+6+sensi+dell%27a+8%0d%0aagricoltura+n+6+sensi+dell%27a8%0d%0aagricoltura+n+6+sensi+dell%27a+N574A&r=%241rt%242)), but you should make the rules more precise. — Wiktor Stribiżew, Nov 06 '19 at 11:13
As you suggested, I edited the question to point out right away that the part that must be present but excluded from the match can be either a number or an identifier. So, the rule is: match [aA] followed by an optional space, which must then be followed by a number or an identifier that must not be captured. — Matteo Pietro Peru, Nov 06 '19 at 11:31
Ok, so, `.(?<=\W)*` was meant to match an identifier? It is the same as if you tried to match any char with `.`. And the number/ID can be captured, since you can always restore it in the result with a backreference. See the solution in my previous comment, does it help in any way? The problem with the question is still that you did not reveal the identifier pattern requirements. — Wiktor Stribiżew, Nov 06 '19 at 11:32
To make the question on topic, please explain why you used `.(?<=\W)*` and what the identifier pattern must be. Please also fix the title as I closed the post because you asked to replace something followed by something else, which is done with capturing groups/backreferences, or with lookarounds. — Wiktor Stribiżew, Nov 06 '19 at 11:38
I actually noticed that there were a few errors in the application, and I made the necessary changes (I hope). By the way, your pattern `([aA])\s*([A-Za-z]*\d[\dA-Za-z]*\b)` seems to work. I'm still running other tests to confirm it — Matteo Pietro Peru, Nov 06 '19 at 12:16
So, that means an identifier is an alphanumeric string that contains at least 1 digit. — Wiktor Stribiżew, Nov 06 '19 at 12:22
If you confirm it and add to the question, I think the question will be answerable. — Wiktor Stribiżew, Nov 06 '19 at 12:55
@WiktorStribiżew, yes, I confirm your pattern works; I just had to add `\b`, as in `(\b[aA])`, so that it does not match when the character is the last letter of a word followed by a space — Matteo Pietro Peru, Nov 06 '19 at 14:01
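The comments mention lookarounds as the alternative to capturing groups. Below is a sketch of such a variant, assuming the identifier rule settled on above (an alphanumeric word containing at least one digit), in which the number/identifier is only checked by a lookahead and never captured, as the question originally asked. `LookaheadRewriter` is a hypothetical name, and this is not the pattern proposed in the thread:

```csharp
using System.Text.RegularExpressions;

// Hypothetical lookahead variant: the identifier is asserted, not captured.
public static class LookaheadRewriter
{
    // \b([aA])  : a standalone a/A, kept in group 1 to preserve its case
    // \s*       : optional whitespace, consumed and therefore removed
    // (?=...)   : must be followed by an alphanumeric word with at least one digit;
    //             the lookahead consumes no characters, so the identifier stays in place
    private static readonly Regex Pattern =
        new Regex(@"\b([aA])\s*(?=[A-Za-z]*\d[\dA-Za-z]*\b)");

    public static string Rewrite(string input) => Pattern.Replace(input, "$1rt");
}
```

Because the lookahead consumes nothing, the replacement only needs `$1rt`; whether this reads better than the capturing version is largely a matter of taste.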

score 1 · Accepted Answer · answered Nov 06 '19 at 15:01


You may use

\b([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b

Replace with $1rt$2. See the regex demo
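A minimal C# sketch applying that pattern and replacement through `Regex.Replace`, using the example strings from the question; the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern and replacement string.
public static class ArticleRewriter
{
    // Verbatim literal, so the regex is written exactly as in the answer.
    private static readonly Regex Pattern =
        new Regex(@"\b([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b");

    // $1 = the original a/A, "rt" = literal text, $2 = the number/identifier.
    // Whitespace matched by \s* is not in any group, so it is dropped.
    public static string Rewrite(string input) => Pattern.Replace(input, "$1rt$2");
}
```

For example, `ArticleRewriter.Rewrite("agricoltura n 6 sensi dell'aN574A")` should yield `agricoltura n 6 sensi dell'artN574A`, and the same output is produced when there is a space between a and N574A.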

Details

\b - a word boundary
([aA]) - Group 1 (referred to with $1 from the replacement pattern): a or A
\s* - 0 or more whitespace characters
([A-Za-z]*\d[\dA-Za-z]*) - Group 2 (referred to with $2 from the replacement pattern): an alphanumeric whole word that contains at least one digit:
- [A-Za-z]* - zero or more ASCII letters
- \d - a digit
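- [\dA-Za-z]* - 0 or more digits or ASCII letters (by default, \d in .NET matches any Unicode decimal digit; to match ASCII digits only, replace \d with 0-9 here and the standalone \d with [0-9], or pass the RegexOptions.ECMAScript flag to the Regex constructor)
\b - a word boundary.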
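The Unicode behavior of `\d` is easy to overlook. A sketch, with hypothetical names, comparing the default pattern, the `0-9` rewrite and the `RegexOptions.ECMAScript` option on a made-up input that uses an Arabic-Indic digit (U+0663):

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of three ways to write the digit part of the pattern.
public static class DigitClassDemo
{
    // Default .NET behavior: \d matches any Unicode decimal digit (category Nd).
    private static readonly Regex UnicodeDigits =
        new Regex(@"\b([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b");

    // ASCII-only variant: every \d replaced with 0-9 / [0-9].
    private static readonly Regex AsciiDigits =
        new Regex(@"\b([aA])\s*([A-Za-z]*[0-9][0-9A-Za-z]*)\b");

    // Same pattern as the default one, but ECMAScript mode restricts \d to [0-9].
    private static readonly Regex EcmaDigits =
        new Regex(@"\b([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b", RegexOptions.ECMAScript);

    public static string WithUnicode(string s) => UnicodeDigits.Replace(s, "$1rt$2");
    public static string WithAscii(string s) => AsciiDigits.Replace(s, "$1rt$2");
    public static string WithEcma(string s) => EcmaDigits.Replace(s, "$1rt$2");
}
```

With the input `"dell'a \u0663"`, only `WithUnicode` should rewrite it (to `"dell'art\u0663"`); the other two leave it unchanged, while all three still turn `"dell'a 8"` into `"dell'art8"`.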

Wiktor Stribiżew

This pattern works well! In my case, the word boundary works even better inside the parentheses, especially the first one: `(\\b[aA])\\.\\s*([A-Za-z]*\\d[\\dA-Za-z]*\\b)` – Matteo Pietro Peru Nov 06 '19 at 15:14

@MatteoPietroPeru It is common (best) practice to keep `\b` outside of capturing groups for better performance. – Wiktor Stribiżew Nov 06 '19 at 15:15
Correction to my previous comment: the word boundary also works correctly outside the parentheses! – Matteo Pietro Peru Nov 06 '19 at 15:32
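A sketch, with hypothetical names and a made-up input (`sostanza 5`), of why the leading `\b` matters (the case described in the question author's comment about a word ending in a) and of the fact that moving it inside the group does not change the replacement. The author's `\.` variant is left out here, and the performance difference mentioned above is not measured:

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of word-boundary placements.
public static class BoundaryDemo
{
    // Word boundary outside the group, as in the answer.
    private const string Outside = @"\b([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b";

    // Word boundary inside the groups, as in the question author's comment.
    private const string Inside = @"(\b[aA])\s*([A-Za-z]*\d[\dA-Za-z]*\b)";

    // No leading boundary: the a may be the last letter of a longer word.
    private const string NoLeading = @"([aA])\s*([A-Za-z]*\d[\dA-Za-z]*)\b";

    private static string Apply(string pattern, string input) =>
        Regex.Replace(input, pattern, "$1rt$2");

    public static string ApplyOutside(string s) => Apply(Outside, s);
    public static string ApplyInside(string s) => Apply(Inside, s);
    public static string ApplyNoLeading(string s) => Apply(NoLeading, s);
}
```

On `"sostanza 5 dell'a 8"`, both boundary placements should give `"sostanza 5 dell'art8"`, whereas the pattern without the leading `\b` also rewrites the final a of `sostanza`, giving `"sostanzart5 dell'art8"`. Since `\b` is zero-width, its position relative to the parentheses does not change what the groups capture.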
