I have a regex that I use to validate strings. Now I want to reuse the same regex to replace every character it considers invalid with a fixed character (let's say x).

My regex to match these strings is: '#^[\pL\'\’\d][\pL\.\-\ \'\/\,\’\d]*$#iu'. It allows the first character to be a letter of any language, a digit, or one of a few special characters. All following characters follow almost the same rule, with a few more special characters allowed.

This is what I do (nothing special).

    preg_replace($regex, 'x', $string);

I tried negating the regex, once with a negative lookahead and once with negated character classes: '(?![\pL\'\’\d][\pL\.\-\ \'\/\,\’\d]*)' and '[^\pL\'\’\d][^\pL\.\-\ \'\/\,\’\d]*'.

I've also tried splitting the string into the first character and the rest, and splitting the regex in two accordingly:

    $validationRegex1 = '[^\pL\'\’\d]';
    $validationRegex2 = '[^\pL\.\-\ \'\/\,\’\d]*';
    $fixedStr1 = (string) preg_replace($validationRegex1, 'x', $firstChar)
        . (string) preg_replace($validationRegex2, 'x', $theRest);

But this did not seem to work either.

I've experimented a bit with this online tool: https://www.functions-online.com/preg_replace.html

Does anyone know what I am overlooking?


Examples of strings and their expected results

'-' should become 'x'.
'Random-morestuff' stays 'Random-morestuff'
'Random%morestuff' should become 'Randomxmorestuff'
'Rândôm' stays 'Rândôm'
Totumus Maximus
  • Can you provide some example string inputs and the desired result vs the actual result? – SierraKomodo Jun 04 '21 at 07:54
  • @SierraKomodo provided. – Totumus Maximus Jun 04 '21 at 07:58
  • If the first character needs to get replaced - does the _next_ character then become the new first, that needs to have the exact same logic applied to it again? Or is this special treatment strictly tied to position/index 0? – CBroe Jun 04 '21 at 07:59
  • Aren't you overcomplicating things? Aren't you really looking to replace special characters within "words"? Then lookarounds might be more suited. – Jan Jun 04 '21 at 08:05
  • This one's a bit beyond me, but I can recommend a different web app for testing and experimenting with regex, as it provides a breakdown of what your regex code is checking for and matches against: https://regex101.com/ - It might help. – SierraKomodo Jun 04 '21 at 08:10
  • If the first character is replaced by an x, the next character is still the next character (index 1). Only the first character (index 0) needs the special treatment, @CBroe. Am I overcomplicating things? Probably. Yes, I am looking to replace special characters within words, but the first character needs special treatment. – Totumus Maximus Jun 04 '21 at 08:12
  • @SierraKomodo I've been using regex101.com to try and figure this out, but I have probably overlooked something because I couldn't make sense of it. – Totumus Maximus Jun 04 '21 at 08:13

2 Answers

Just an idea but if I got you right, you could use

    (?(DEFINE)
        (?<first>[\pL\d'’])
        (?<other>[-\ \pL\d.'/,’])
    )
    \b(?&first)(?&other)+\b(*SKIP)(*FAIL)|.

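Replace every match with x. You do not have to escape everything in a character class, so I dropped the unnecessary escapes. The pattern is laid out in free-spacing mode, so it needs the x modifier (plus u for UTF-8 input).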
See a demo on regex101.com.


A bit more explanation: The (?(DEFINE)...) construct lets you define subroutines that can be called later in the pattern; here it is just syntactic sugar (maybe a bit of showing off, really). Since you stated that different characters are allowed depending on their position, I called the two groups first and other. The \b marks a word boundary, that is, a boundary between \w (usually [a-zA-Z0-9_]) and \W (not \w). All of these "words" are allowed, so the (*SKIP)(*FAIL) mechanism makes the engine "forget" what has been matched so far, and the right side of the alternation (|) then matches any other single character. See how (*SKIP)(*FAIL) works here on SO.
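For reference, here is a minimal sketch of how the pattern above could be wired into preg_replace(). It assumes `~` as the delimiter and the `x` and `u` modifiers; the constant WORD_PATTERN and the function replaceInvalidWords() are hypothetical names used only for illustration.

```php
<?php
// Sketch only: WORD_PATTERN and replaceInvalidWords() are hypothetical names.
// A nowdoc keeps the quotes and backslashes in the pattern literal.
// Delimiter: ~   Modifiers: x (free-spacing layout), u (UTF-8 input).
const WORD_PATTERN = <<<'REGEX'
~(?(DEFINE)
    (?<first>[\pL\d'’])
    (?<other>[-\ \pL\d.'/,’])
)
\b(?&first)(?&other)+\b(*SKIP)(*FAIL)|.~xu
REGEX;

function replaceInvalidWords(string $input): string
{
    // preg_replace() returns null on error (e.g. malformed UTF-8 input),
    // in which case this sketch simply hands back the input unchanged.
    return preg_replace(WORD_PATTERN, 'x', $input) ?? $input;
}

echo replaceInvalidWords('Random%morestuff'), PHP_EOL; // Randomxmorestuff
```

One caveat with the pattern as written: `(?&other)+` requires at least one character after the first, so a one-character input such as 'a' is replaced as well. Changing `+` to `*` would let single-character words through.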

Jan
  • Incredible! This does look a lot like the solution I need. Can you elaborate a little bit more what this all means so I can try to make sense of this? – Totumus Maximus Jun 04 '21 at 08:17
  • @TotumusMaximus: Added some explanation. – Jan Jun 04 '21 at 08:27
  • Ok, thank you, +1. I am convinced this works now. But before I accept this as an answer. You said there is no need to escape everything. But my phpstorm begs to differ. Just escaping the special characters inbetween the brackets satisfies the compiler but still gives me the problem about 'preg_replace(): Delimiter must not be alphanumeric or backslash'. – Totumus Maximus Jun 04 '21 at 09:03
  • @TotumusMaximus: Well, I have never used phpstorm, really. PyCharm (also from JetBrains) does the same but believe it or not, not everything must be escaped within square brackets. About the delimiters: you could e.g. use `~` on both sides. – Jan Jun 04 '21 at 17:08
  • Ryszard Czech provided the last piece of the puzzle and wrote your answer in php. @Jan. Thanks again for your help. – Totumus Maximus Jun 07 '21 at 10:30

Use

    $fixedStr1 = preg_replace('/[\p{L}\'\’\d][\p{L}\.\ \'\/\,\’\d-]*(*SKIP)(*FAIL)|./u', 'x', $input_string);

See regex proof.

The left branch matches each run of valid characters and discards it with (*SKIP)(*FAIL); the right branch (.) then matches every remaining character, and those are the ones replaced by x.
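To check this against the examples from the question, a small sketch along these lines could be used. It is a variation, not the exact pattern above: it assumes `~` as the delimiter (so `/` needs no escaping) and drops the escapes that are not needed inside a character class. The function name replaceInvalid() is hypothetical.

```php
<?php
// Sketch only: replaceInvalid() is a hypothetical helper name.
function replaceInvalid(string $input): string
{
    // ~ as delimiter, so / needs no escaping; u treats the input as UTF-8.
    // Left branch: a run of valid characters, discarded via (*SKIP)(*FAIL).
    // Right branch: any other single character, which gets replaced.
    $pattern = '~[\pL\d\'’][\pL\d. \'/,’-]*(*SKIP)(*FAIL)|.~u';

    // preg_replace() returns null on failure, e.g. for malformed UTF-8;
    // this sketch then returns the input unchanged.
    return preg_replace($pattern, 'x', $input) ?? $input;
}

// The four example strings from the question.
$examples = ['-', 'Random-morestuff', 'Random%morestuff', 'Rândôm'];
foreach ($examples as $example) {
    echo $example, ' => ', replaceInvalid($example), PHP_EOL;
}
```

Note that the stricter first-character rule applies at the start of every valid run, not only at index 0: in 'a%-b' the '-' following the replaced '%' is replaced too, giving 'axxb'.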

Ryszard Czech
  • This definitely deserves a +1 too since it rewrites Jan's answer to the php variant I was looking for. Thank you @RyszardCzech – Totumus Maximus Jun 07 '21 at 10:29