
What regex in PHP would strip all characters from any input string except digits and the first encountered decimal point, and truncate the result to at most two digits after that decimal point (the input may contain more than one decimal point)?
The input string should be interpreted from left to right, so a string such as $34.455r.4r45,45.45 should yield 34.45.
Also, "any input string" means that strings with no dots, strings with no digits, and strings with any other combination of characters are all possible inputs.

For example:

Input string                                Desired output
*$234.345                                   => 234.34
(9.9)                                       => 9.9
$34.455r.4r45,45.45                         => 34.45
2023-05-29 03:40:11Z, License: CC BY-SA 4.0 => 202305290340114.0
9.95,6.432,0.3                              => 9.95
po2iaw5e.ro1i7im8jjks;fl32;i.u12ma          => 25.17
oias25.spkks                                => 25.
4545                                        => 4545
oi.as2.5.6spkks                             => .25
oi.as..spkks                                => .

I tried this expression, but it doesn't work:

```php
preg_replace('/^\d+(\.\d{0,2})?$/', '', $input_string);
```

I'm looking for a solution that is a single regular expression and replacement string to be used in one call to preg_replace().
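A quick check shows why the attempt above fails, in line with user3783243's explanation in the comments. The pattern is anchored with `^` and `$`, so it matches only an input that is already a plain number with at most two decimals, and then deletes that entire input. Every other input comes back unchanged. A minimal sketch:

```php
<?php
// The attempted pattern from the question. Because of ^ and $, it can only
// match a whole input that is already a number such as 4545 or 12.34.
$pattern = '/^\d+(\.\d{0,2})?$/';

// A clean number is matched in full and replaced with '' (i.e. deleted).
var_dump(preg_replace($pattern, '', '4545'));      // string(0) ""

// Anything else does not match, so it is returned unchanged.
var_dump(preg_replace($pattern, '', '*$234.345')); // string(9) "*$234.345"
```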

Edited by Makyen; asked by Jimski.
  • Might be easier to `preg_match('/\d+(\.\d{0,2})?/', $input_string, $matches)`, e.g. https://3v4l.org/S5ZEf – Nick May 25 '23 at 02:54
  • Your regex said to find a number at the start of a string, then possibly a `.` with 0 to 2 digits after it, then the end of the string. `[^\d.]` would answer the question asked. Possibly `number_format(preg_replace('/[^\d.]/', '', '$234.345'), 2);`, but you would need to fix the rounding if you want the exact number. – user3783243 May 25 '23 at 09:26
  • Please stop adding remarks into your question that don’t help clarifying it. Instead of stating that your question is not a duplicate, precisely explain why the proposed solutions don’t solve the issue, **and limit your edit to that**. – blackgreen May 27 '23 at 03:35
  • @blackgreen Why did you delete my explanation? How can I explain the problem if you are deleting my explanations? – Jimski May 29 '23 at 02:44
  • You are supposed to edit the explanation, and **only** the explanation **in the question**, not in comments. – blackgreen May 29 '23 at 02:45
  • Your question has two parts, both of which are duplicates. Your other part is a duplicate of: https://stackoverflow.com/questions/9944001/delete-digits-after-two-decimal-points-without-rounding-the-value – Abdul Aziz Barkat May 29 '23 at 03:20
  • This question is [being discussed on Meta Stack Overflow](https://meta.stackoverflow.com/q/424882/3773011). – Makyen May 29 '23 at 04:33
  • Is that? https://regex101.com/r/R3tf24/1 – Augusto Vasques May 29 '23 at 04:35
  • It seems I forgot to include at least one case. What should the output be for "oias25.spkks"? Effectively, this is asking what should happen to a trailing "." when there isn't a digit to follow it. So, just knowing what the output should be for "25." as the input would provide the same information. – Makyen May 29 '23 at 06:00
  • @Jimski This sounds like an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), especially when you try to find numbers which are written across non-digit characters (like letters?). What is your use case? Wouldn't it be easier to focus on the actual problem you have? Maybe there is a much simpler solution to the problem you are actually having. – Progman May 29 '23 at 08:03
  • @Jimski Is it an option to use multiple `preg_replace()` lines, like removing every non-digit and non-dot character first and then dealing with the multiple dots and the two decimal digits after the first dot? – Progman May 29 '23 at 08:14
  • @Progman Anything that can be performed by two preg_replace calls can also be performed by a single one, according to the rules of a Turing machine. – Jimski May 29 '23 at 09:26
  • @Abdul Aziz Barkat Those are not separate parts but multiple constraints. If you artificially start separating constraints, as you are clearly doing, then you can extract digits, periods, non-numeric characters, reading a string from left to right, and call it a question in four parts; however, such an approach is logically defective. – Jimski May 29 '23 at 09:30
  • @Makyen I have included your string "oias25.spkks" as an example; however, the question clearly states: "strip all characters from any input string except digits and the first encountered decimal point", so the decimal point should be included in the output even if nothing follows it. – Jimski May 29 '23 at 09:38
  • @Jimski breaking the problem down into simpler problems is logically defective? Clearly it is difficult for you to solve this problem because you are trying to write one regex to do it all. Here's how I would approach it: 1) Replace all non-numeric (numbers plus dot) characters with an empty string 2) Write a regex that captures numbers and then optionally a dot with up to 2 digits. This approach is 1) simple to understand 2) easy to find references for by searching 3) not that computationally expensive. I don't know why you'd find this approach logically defective. – Abdul Aziz Barkat May 29 '23 at 09:48
  • @Abdul Aziz Barkat Breaking the problem into simpler parts for the sake of analysis is not defective, but breaking the question into parts and stating that they represent duplicate questions is defective, because you can break this question into dozens of parts and combinations of parts and state that they are duplicates of something. For example, there is the phrase "strip all characters"; according to your approach, any question that asks to strip all characters would represent a duplicate. – Jimski May 29 '23 at 10:00
  • @Progman No, "any input string" is stipulated in the question which means that a string with no dots is permitted as an input. However I will add your comment to the body of the question. – Jimski May 29 '23 at 10:05
  • @Jimski Can the replacement string be changed or must it be the empty string `''` and you can only change the "regex" part of the `preg_replace()` call? – Progman May 29 '23 at 10:27
  • @Progman I'm not sure if I understand your question. If the empty string is changed to something else, which is non-numeric, then the output will be non-numeric. What do you suggest to replace the empty string with? – Jimski May 29 '23 at 10:32
  • @Jimski I have found a solution, but it requires that both the "regex" argument **and** the "replacement" argument of the `preg_replace()` call are changed. If you can only change/provide the "regex" argument in your application/system, this solution would be useless. That's why I asked whether the "replacement" parameter of `preg_replace()` can be changed. – Progman May 29 '23 at 10:35
  • @Progman Any regex that produces the desired output would represent a valid answer to this question. My example was just an attempt to solve it and not a constraint. – Jimski May 29 '23 at 10:41
  • Problem XY. Even though the issue can be solved with regular expressions, there will either be a solution impacted by high time complexity due to the use of complicated lookahead/lookbehind, or a solution impacted by high code complexity due to the use of multiple conditional regexes. Perhaps the simplest solution would be to implement a pushdown automaton. – Augusto Vasques May 29 '23 at 10:57
  • @Augusto Vasques You can't judge the answer before it has been rendered. So you have no way of knowing its complexity or making a statement that some other answer would have been better as you have no point of reference, other than some unfounded speculation. – Jimski May 29 '23 at 11:08
  • Don't judge other users' features based on your restrictions. – Augusto Vasques May 29 '23 at 11:47
  • @Jimski I added a couple of additional cases which are consistent with the words you have in the question and your insistence on them. Admittedly, I, personally, would have wanted the "oi.as..spkks" case to result in "", rather than the "." which your wording insists upon. – Makyen May 29 '23 at 14:40
  • I also added the constraint you've mentioned in comments that you want a *single* `preg_replace()` operation, even if that's not the most efficient method of doing this (or the best way to code this, as multiple steps would likely be simpler and easier to maintain). While the constraint may sound artificial, I have encountered situations in the real world where you can specify only the regex and replacement string, but not the operation. However, for most of those that I've encountered you could specify a sequence of such operations, but I could see a situation where that's not the case. – Makyen May 29 '23 at 14:41
  • I suspect that you should also specify the version of PHP you're using, as the capabilities of the PHP regular expression engine have likely evolved over time, but it's not unreasonable for answers to need to specify a minimum PHP version under which they are operable, given that you don't specify it in your question. – Makyen May 29 '23 at 14:45
  • @Makyen After looking at the output from "oi.as..spkks" I agree that the "" would probably be more universal for application by other potential users of this solution. – Jimski May 29 '23 at 19:06
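Several comments (Progman's and Abdul Aziz Barkat's) suggest splitting the work into more than one step instead of a single `preg_replace()` call. A minimal sketch of that idea follows; it does not meet the question's single-call constraint, and `extract_number()` is a hypothetical helper name. Later dots are removed with `str_replace()` rather than a second regex, because a pattern such as `\d*(\.\d{0,2})?` applied to the stripped text would stop at the second dot in inputs like `oi.as2.5.6spkks` (giving `.2` instead of `.25`).

```php
<?php
// Hypothetical helper sketching the multi-step approach from the comments.
function extract_number(string $input): string
{
    // Step 1: drop every character that is neither a digit nor a dot.
    $kept = preg_replace('/[^\d.]+/', '', $input);

    // Step 2: locate the first dot; if there is none, the digits are the result.
    $dot = strpos($kept, '.');
    if ($dot === false) {
        return $kept;
    }

    // Step 3: remove any later dots, then keep at most two digits after the first one.
    $fraction = str_replace('.', '', substr($kept, $dot + 1));
    return substr($kept, 0, $dot) . '.' . substr($fraction, 0, 2);
}

echo extract_number('oi.as2.5.6spkks'), "\n"; // .25
```

For the ten inputs in the question's table, this produces the same outputs as listed there.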

1 Answer

You can use the following preg_replace() call to replace the input with the number you are looking for:

```php
preg_replace('~[^\d.]+|(\.)\D*(\d?)\D*(\d?).*~', '$1$2$3', $input_string);
```

The regex replacement works as follows:

  • The first part, [^\d.]+, is trivial: it matches each run of characters that are neither digits nor dots and replaces it with '$1$2$3'. Since none of the capture groups belong to that part of the regex, they are all empty, so every such run is replaced with an empty string.

  • The second part (\.)\D*(\d?)\D*(\d?).* is a little bit more complicated. It has three capture groups:

    1. (\.) - A literal . character (the first decimal point in the input).
    2. (\d?) - An optional digit.
    3. (\d?) - Another optional digit.

    The \D* parts between these capture groups skip any non-digit characters, including further dots. The .* at the end consumes the rest of the input once the first dot has been found, so this replacement happens only once, and everything from the first dot onward is replaced with that dot followed by at most two digits.
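To see the two alternatives at work, the individual matches can be listed with `preg_match_all()`. A minimal sketch; `trace_matches()` is a hypothetical helper, not part of the answer. With `PREG_SET_ORDER`, groups that did not take part in a match may be missing from the match array, hence the `?? ''` defaults.

```php
<?php
// Hypothetical helper: list each match of the answer's pattern together with
// the text that the replacement '$1$2$3' substitutes for it.
function trace_matches(string $input): array
{
    $regex = '~[^\d.]+|(\.)\D*(\d?)\D*(\d?).*~';
    preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
    $steps = [];
    foreach ($matches as $m) {
        // When only the first alternative matched, groups 1-3 did not
        // participate, so treat them as empty strings.
        $steps[] = [$m[0], ($m[1] ?? '') . ($m[2] ?? '') . ($m[3] ?? '')];
    }
    return $steps;
}

foreach (trace_matches('po2iaw5e.ro1i7im8jjks;fl32;i.u12ma') as [$matched, $replacedBy]) {
    echo json_encode($matched), ' => ', json_encode($replacedBy), "\n";
}
```

This prints `"po" => ""`, `"iaw" => ""`, `"e" => ""` and `".ro1i7im8jjks;fl32;i.u12ma" => ".17"`. The digits 2 and 5 are never matched, so they are left in place, which yields `25.17`.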

Check the following test PHP script:

```php
<?php
$cases = array();
$cases['*$234.345'] = '234.34';
$cases['(9.9)'] = '9.9';
$cases['$34.455r.4r45,45.45'] = '34.45';
$cases['2023-05-29 03:40:11Z, License: CC BY-SA 4.0'] = '202305290340114.0';
$cases['9.95,6.432,0.3'] = '9.95';
$cases['po2iaw5e.ro1i7im8jjks;fl32;i.u12ma'] = '25.17';
$cases['oias25.spkks'] = '25.';
$cases['4545'] = '4545';
$cases['oi.as2.5.6spkks'] = '.25';
$cases['oi.as..spkks'] = '.';

$regex = '~[^\d.]+|(\.)\D*(\d?)\D*(\d?).*~';
$replacement = '$1$2$3';
foreach ($cases as $input => $expected)
{
    $result = preg_replace($regex, $replacement, $input);
    // Strict comparison: with ==, PHP compares numeric strings as numbers,
    // so a wrong result such as '25' would still "equal" the expected '25.'.
    $correct = $result === $expected;
    echo "Input: ".$input.", result: ".$result.", as expected: ".($correct ? 'true' : 'FALSE')."\n";
    if (!$correct)
    {
        echo "\tShould be: ".$expected."\n";
    }
}
```

This will generate the following output:

```
Input: *$234.345, result: 234.34, as expected: true
Input: (9.9), result: 9.9, as expected: true
Input: $34.455r.4r45,45.45, result: 34.45, as expected: true
Input: 2023-05-29 03:40:11Z, License: CC BY-SA 4.0, result: 202305290340114.0, as expected: true
Input: 9.95,6.432,0.3, result: 9.95, as expected: true
Input: po2iaw5e.ro1i7im8jjks;fl32;i.u12ma, result: 25.17, as expected: true
Input: oias25.spkks, result: 25., as expected: true
Input: 4545, result: 4545, as expected: true
Input: oi.as2.5.6spkks, result: .25, as expected: true
Input: oi.as..spkks, result: ., as expected: true
```
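One case the test script does not cover: in PCRE, `.` does not match a newline unless the `s` modifier is set, so the trailing `.*` stops at the first line break and a later decimal point can survive. (The `\D*` parts are unaffected, since `\D` already matches newlines.) A minimal check, assuming inputs may span several lines, comparing the answer's pattern with and without `s`:

```php
<?php
// Input with a newline after the first decimal point (hypothetical example).
$input = "1.234\n5.678";

// Without `s`, .* stops at "\n"; the newline is then deleted by the first
// alternative and the second dot is matched again as if it were the first.
$withoutS = preg_replace('~[^\d.]+|(\.)\D*(\d?)\D*(\d?).*~', '$1$2$3', $input);

// With `s`, .* also consumes newlines, so everything after the first dot goes.
$withS = preg_replace('~[^\d.]+|(\.)\D*(\d?)\D*(\d?).*~s', '$1$2$3', $input);

echo $withoutS, "\n"; // 1.235.67
echo $withS, "\n";    // 1.23
```

Adding `s` does not change the results for the ten cases in the question, since none of them contains a newline.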
Answered by Progman.
  • @prog I recommend one or more non-digit/non-dot characters at the beginning and using `\D` later on: `~[^\d.]+|(\.)\D*(\d?)\D*(\d?).*~` – mickmackusa May 30 '23 at 05:43
  • @mickmackusa using `+` at the beginning makes sense. Also, I used to have `[^\d.]*` to exclude the dot as well, but then I noticed that it is "allowed" as well, so I changed it to `[^\d]*`, without simplifying it to `\D*`. I have adjusted the regex. – Progman May 30 '23 at 08:05