
my string may be like this:

@ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?

In fact, it is a dirty CSV string containing the names of jpg images.

I need to remove any non-alphanumeric characters from both ends of the string,
then, inside the resulting string, remove the same characters except commas and dots,
then replace any repeated commas or dots with a single one.

so the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg

I first tried to remove all white space, anywhere in the string:

$str = str_replace(" ", "", $str);  

Then I tried various combinations of trim functions, but that is tedious and takes a lot of code.

An additional problem: repeated commas and dots may come in runs of any length, for example .. or ,,,,

Is there a way to solve this using regex?

provance
  • Is this helpful: https://stackoverflow.com/questions/659025/how-to-remove-non-alphanumeric-characters – SelVazi Jan 17 '23 at 10:16
  • Once you have removed the spaces, the regular expression `(\w+\.\w+)` should be enough to extract all the file names using preg_match_all. You can then use implode to join those results with a comma between them. – CBroe Jan 17 '23 at 10:16
  • @CBroe - interesting, thanks, I will try. But I suppose duplicate commas and dots are still a problem – provance Jan 17 '23 at 10:20
  • Can you try this: $result = preg_replace("/[^A-Za-z0-9,.]/", '', $str); – SelVazi Jan 17 '23 at 10:22
  • @SelVazi - it works except for the last comma, but I can remove that with `rtrim`. It does not remove duplicate commas and dots, though – provance Jan 17 '23 at 10:28
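
The preg_match_all plus implode idea from the comments can be sketched as follows. This is only a sketch under assumptions: the function name `extractFileNames` is hypothetical, the pattern uses `[A-Za-z0-9]` instead of `\w` so that underscores are not treated as part of a name, and repeated dots are collapsed first because `dolor..jpg` would otherwise not match at all.

```php
<?php

// Hypothetical helper (not from the thread): extract "name.ext" tokens and re-join them.
function extractFileNames(string $str): string
{
    // Drop all whitespace so that "ip sum.jpg" becomes "ipsum.jpg".
    $str = preg_replace('/\s+/', '', $str);
    // Collapse repeated dots first; "dolor..jpg" would not match the pattern below.
    $str = preg_replace('/\.{2,}/', '.', $str);
    // Collect every run of alphanumerics, a single dot, and another run of alphanumerics.
    preg_match_all('/[A-Za-z0-9]+\.[A-Za-z0-9]+/', $str, $matches);
    // Join the extracted names with single commas.
    return implode(',', $matches[0]);
}

echo extractFileNames('@ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?'); // lorem.jpg,ipsum.jpg,dolor.jpg
```

Note that this extracts names rather than deleting unwanted characters, so a stray symbol inside a name (for example `lo-rem.jpg`) would split the name instead of being removed.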

3 Answers


Here are the steps, modeled on your own wording:

Step 1

  • "remove any non-alphanum chars from both sides of the string"

  • translated: remove leading and trailing consecutive [^a-zA-Z0-9] characters

  • regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1

Step 2

  • "inside the resulting string - remove the same - except commas and dots"
  • translated: remove any [^a-zA-Z0-9.,]
  • regex: replace [^a-zA-Z0-9.,] with empty string

Step 3

  • "remove duplicates commas and dots - if any - replace them with single ones"
  • translated: replace consecutive [,.] as a single instance
  • regex: replace (\.{2,}) with .
  • regex: replace (,{2,}) with ,
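
As a possible variation on Step 3, the two replacements could be merged into one pattern with a backreference. This is a minimal sketch, not part of the steps above; the function name `collapseSeparators` is hypothetical.

```php
<?php

// Hypothetical helper: collapse runs of the same comma or dot into one character.
function collapseSeparators(string $str): string
{
    // ([.,]) captures one separator; \1+ matches one or more repeats of that same character.
    return preg_replace('/([.,])\1+/', '$1', $str);
}

echo collapseSeparators('lorem.jpg,,,ipsum.jpg,dolor..jpg'); // lorem.jpg,ipsum.jpg,dolor.jpg
```

Because the backreference only matches repeats of the same character, a mixed run such as `.,` is left untouched, which matches the behaviour of the two separate replacements.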

PHP Demo:

https://onlinephp.io/c/512e1

<?php

$subject = " @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";

$firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
$secondStep = preg_replace('/[^a-zA-Z0-9.,]/', '', $firstStep);
$thirdStepA = preg_replace('/\.{2,}/', '.', $secondStep);
$thirdStepB = preg_replace('/,{2,}/', ',', $thirdStepA);

echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg
Diego D

Can you try this:

$string = ' @ *lorem.jpg,,,,  ip sum.jpg,dolor .jpg,-/ ?';
// keep only alphanumerics, commas and dots
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);

// collapse repeated commas and repeated dots into single ones
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);

// remove commas, semicolons, dots and spaces from the end
$result = preg_replace("/[ ,;.]*$/", '', $result);
SelVazi

Look at

https://www.php.net/manual/en/function.preg-replace.php

It replaces anything in a string that matches a pattern. \s matches whitespace characters, but take care with NBSP (non-breaking space), which \s may miss; \h matches it.

See Example 4 on that page; adapted here so that single spaces are removed too:

$str = preg_replace('/\s+/', '', $str);

It will be something like that.
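
To illustrate the NBSP caveat, here is a minimal sketch that assumes the input is UTF-8. The function name `stripAllWhitespace` is hypothetical; the `u` modifier and the `\h` escape are standard PCRE features available in PHP.

```php
<?php

// Hypothetical helper: remove all whitespace, including non-breaking spaces, from UTF-8 input.
function stripAllWhitespace(string $str): string
{
    // The u modifier makes PCRE read the subject as UTF-8, so \h can match U+00A0 as one character.
    // preg_replace returns null on invalid UTF-8; in that case the input is returned unchanged.
    return preg_replace('/[\s\h]+/u', '', $str) ?? $str;
}

echo stripAllWhitespace("ip\u{00A0}sum .jpg"); // ipsum.jpg
```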

ThomasL