Perl: Request for improvement of my regex (match only positive/negative integers/decimals and commas)

Question

It's hard to describe what I want to do, so I'll show it with an example.

My string:

my $string = q(
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9
);

My match pattern:

use Data::Dumper;

my @match = $string =~ /(([\d-\.]+[, ]+)+[\d-\.]+)/sg;
print Dumper \@match;

My results:

$VAR1 = [
          '-0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764',
          '0.618, ',
          '0.236, 0.382, 0.500, 0.618, 0.764, 1.000',
          '0.764, ',
          '25,27,30, 32',
          '30, ',
          '0.236, 0.382, 0.5, 0.764',
          '0.5, ',
          '0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1',
          '0.0764, '
        ];

I don't know why or how the elements at index 1 (value '0.618, '), 3 (value '0.764, '), 5, 7 and 9 are added by my regex, but I don't need them.

The result I would like to achieve:

print Dumper \@match;
$VAR1 = [
          '-0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764',
          '0.236, 0.382, 0.500, 0.618, 0.764, 1.000',
          '25,27,30, 32',
          '0.236, 0.382, 0.5, 0.764',
          '0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1'
        ];

Please base your answer on my regex. The only recurring characters that identify the values are "=" or "= " (before the pattern) and "," (in the middle of the pattern).

I don't know... :D Probably to avoid matching strings without a ',' in the line. I don't need two groups. — Abdul Ahmed, Apr 27 '18 at 08:50
See [How do I determine whether a scalar is a number/whole/integer/float?](https://perldoc.perl.org/perlfaq4.html#How-do-I-determine-whether-a-scalar-is-a-number%2fwhole%2finteger%2ffloat%3f) and [Regexp::Common::number](https://metacpan.org/pod/Regexp::Common::number). — Sinan Ünür, Apr 27 '18 at 08:56
Based on your data (I'm guessing you don't want the commas saved), I would do `my @match = $string =~ /[\d.-]+/sg;`. However, you'll need to remove the unwanted lines from $string before you start (which you seem to have done but not shown). — hoffmeister, Apr 27 '18 at 09:33
I need the commas to create an array of arrays in the next step: ` foreach (@match) { $_ =~ s/ //g; my @dupa = split /,/, $_; push @aoa, [@dupa]; }` — Abdul Ahmed, Apr 27 '18 at 10:00
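A minimal sketch of that array-of-arrays step, assuming the five strings are already in @match (they are hard-coded here for illustration). Splitting on a comma together with any whitespace around it removes the need for the separate s/ //g pass:

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed input: the five strings produced by the corrected regex
my @match = (
    '-0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764',
    '0.236, 0.382, 0.500, 0.618, 0.764, 1.000',
    '25,27,30, 32',
    '0.236, 0.382, 0.5, 0.764',
    '0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1',
);

# Split each string ($_) on a comma plus any surrounding whitespace, and
# wrap the resulting list in [ ... ] so each string becomes one inner array
my @aoa = map { [ split /\s*,\s*/ ] } @match;

print Dumper \@aoa;
```

Using map with an anonymous array builds the whole structure in one statement, which is equivalent to the foreach/push loop in the comment.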

score 1 · Answer 1 · answered Apr 27 '18 at 08:53


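You have two parenthesized groups, one inside the other. With /g in list context, each match returns one element per capture group, so the inner group yields every second result (and because it is repeated, it holds only its last iteration, e.g. '0.618, '). You should use a non-capturing group, (?:...), for the inner grouping.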
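To see this concretely, here is a small sketch that runs the question's pattern and a version with a non-capturing inner group over two of the question's lines (the string is shortened for illustration, and [\d.-] is used instead of [\d-\.] to avoid Perl's "False [] range" warning):

```perl
use strict;
use warnings;
use Data::Dumper;

# Two lines from the question's data, used as a shortened example
my $string = "min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764\n"
           . "rsi_confirm= 25,27,30, 32\n";

# Original shape: outer and inner capture groups, so two elements per match.
# The repeated inner group keeps only the text of its last iteration.
my @two_groups = $string =~ /(([\d.-]+[, ]+)+[\d.-]+)/g;

# Inner group made non-capturing: one element per match
my @one_group = $string =~ /((?:[\d.-]+[, ]+)+[\d.-]+)/g;

print Dumper \@two_groups, \@one_group;
```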


jjmerelo


score 1 · Accepted Answer · answered Apr 27 '18 at 08:54


Rather than using capture groups, you want to use clustering to group those parts of your regex together. Clustering is done with (?:whatever) rather than (whatever), so your code would become:

my @match = $string =~ /(?:(?:[\d.-]+[, ]+)+[\d.-]+)/sg;
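The question also says each list follows "=" or "= " and that its values are separated by ",". A variant sketch (not part of the original answer) that uses both facts: it starts at the "=", skips any spaces, and captures only the list, requiring at least one comma. With exactly one capture group, list-context /g returns one element per matching line:

```perl
use strict;
use warnings;
use Data::Dumper;

my $string = q(
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9
);

# "=" and optional spaces, then a number followed by one or more
# ", number" parts. [ ]* (rather than \s*) keeps each match on one line,
# and the single capture group means only the list itself is returned.
my @match = $string =~ /=[ ]*([\d.-]+(?:[ ]*,[ ]*[\d.-]+)+)/g;

print Dumper \@match;
```

Lines with a single value, such as interval = 14, have no comma and are skipped, which matches the result the question asks for.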


Chris Turner


It's not matching the values on the first line in my Perl script: `$VAR1 = [ '0.236, 0.382, 0.500, 0.618, 0.764, 1.000', '25,27,30, 32', '0.236, 0.382, 0.5, 0.764', '0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1' ]; ` – Abdul Ahmed Apr 27 '18 at 09:38
Are you sure the string is as you have it in the question? I get all 5 blocks of numbers when I use that regex. The only real difference should be that the inner capture group won't appear in `@match`. – Chris Turner Apr 27 '18 at 09:59
Okay. I had ` if ($string =~ /(?:(?:[\d\.-]+[, ]+)+[\-\d-\.]+)/sg)` before the match. That was the reason. Thanks for the answer. – Abdul Ahmed Apr 27 '18 at 10:07
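Why the if caused this: m//g in scalar context (as in an if condition) records where it stopped in pos($string), and the next //g match on the same string resumes from that position, so the first block is skipped. A minimal sketch reproducing it on a shortened two-line string (assumed for illustration). Using a test without /g avoids it; resetting with pos($string) = undef before the list-context match would also work:

```perl
use strict;
use warnings;

# Shortened two-line example (assumed for illustration)
my $string = "min_entry = -0.236, 0, 0.236\nrsi_confirm= 25,27,30, 32\n";
my $re     = qr/(?:[\d.-]+[, ]+)+[\d.-]+/;

# Scalar-context //g matches the first block and sets pos($string)
my @skipped;
if ( $string =~ /$re/g ) {
    # List-context //g resumes from pos(), so the first block is missing
    @skipped = $string =~ /$re/g;
}

# A test without /g does not move pos(), so nothing is skipped
my @all;
if ( $string =~ $re ) {
    @all = $string =~ /$re/g;
}

print "skipped: @skipped\n";
print "all: ", join( ' | ', @all ), "\n";
```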

Borodin · Answer 3 · 2018-04-27T16:24:56.990

At a guess, this string is the contents of a file that you have read in its entirety to make things "easier". Unfortunately, that means you must explicitly cater for newline characters, which complicates things a lot.

Here's an example of what I would do using the DATA file handle. Building @aoa is reduced to a single statement. Of course, you may open a file and use the handle from that instead.

Mistakes in your code have caused lines with only a single number (and no comma) to be ignored. It's possible that you need that behaviour, but I have "fixed" it here.

use strict;
use warnings 'all';

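# Only "key = values" lines carry data. Capture the text after the "="
# (so digits in key names such as "slope3" are ignored), copy it with "$1",
# and wrap that line's numbers in [ ... ] to get one array per line
my @aoa = map { /=\s*(.*)/ ? [ "$1" =~ /-?\d+(?:\.\d+)?/g ] : () } <DATA>;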

use Data::Dumper;
print Dumper \@aoa;

__DATA__
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9

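If the data is already in a scalar, as with $string in the question, the same line-by-line approach still works: Perl can open a filehandle on a reference to a string, so the regex never has to deal with embedded newlines. A sketch, assuming a shortened copy of the question's string:

```perl
use strict;
use warnings;
use Data::Dumper;

# The whole file already read into one scalar (shortened example)
my $string = q(
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
#jakis komentarz
rsi_confirm= 25,27,30, 32
interval = 14
);

# Open an in-memory filehandle on the string and read it line by line,
# exactly as if it were a file
open my $fh, '<', \$string or die "Cannot open string: $!";

my @aoa;
while ( my $line = <$fh> ) {
    # Only "key = values" lines carry data; take the text after the "="
    next unless $line =~ /=\s*(.*)/;
    my $values = $1;

    # One inner array per line: every integer or decimal, optionally negative
    push @aoa, [ $values =~ /-?\d+(?:\.\d+)?/g ];
}
close $fh;

print Dumper \@aoa;
```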

I also suspect that even this is not the best solution to your problem, as you are discarding all the data labels, so you have no idea which list of numbers belongs to which category except by position.

This alternative builds a hash of arrays so that each list keeps its label.

use strict;
use warnings 'all';

my %data;

while ( <DATA> ) {
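     # Skip comments, section headers and blank lines: they contain no "="
     next unless /=/;
     # The first token is the key; the remaining tokens are its values
     my ($key, @values) = /[-\w.]+/g;
     $data{$key} = \@values;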
}

use Data::Dumper;

print Dumper \%data;


__DATA__
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9

output

$VAR1 = {
          'high' => [
                      '40'
                    ],
          'interval' => [
                          '14'
                        ],
          'slope3' => [
                        '0.236',
                        '0.382',
                        '0.5',
                        '0.764'
                      ],
          'persistence' => [
                             '9'
                           ],
          'low' => [
                     '40'
                   ],
          'min_tp' => [
                        '0.0125',
                        '0.0236',
                        '0.0382',
                        '0.05',
                        '0.0764',
                        '0.1'
                      ],
          'min_entry' => [
                           '-0.236',
                           '0',
                           '0.236',
                           '0.382',
                           '0.500',
                           '0.618',
                           '0.764'
                         ],
          'max_entry' => [
                           '0.236',
                           '0.382',
                           '0.500',
                           '0.618',
                           '0.764',
                           '1.000'
                         ],
          'rsi_confirm' => [
                             '25',
                             '27',
                             '30',
                             '32'
                           ]
        };

This is the best I can do for you without understanding the full problem.
