The answers to two important questions will affect whether you even need to use a regular expression to match the various number formats, or if you can do something much simpler:
- Are you certain that your lines contain numbers only or do they also contain other data (or possibly some lines have no numbers at all and only other data)?
- Are you certain that all numbers are separated from each other and/or other data by at least one space? If not, how are they separated? (For example, the output of portsnap fetch runs lots of numbers together like this: 3690....3700...., with dots and no spaces at all between them.)
If your lines contain only numbers and no other data, and numbers are separated by spaces, then you do not even need to check if the results are numbers, but only split the line apart:
my @numbers = split /\s+/;
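One detail worth knowing about this step: `split /\s+/` returns an empty first field when the line begins with whitespace, whereas the special pattern `split ' '` skips leading whitespace. A minimal sketch, where the sample line and the helper name `split_numbers` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split a line of whitespace-separated numbers.
sub split_numbers {
    my ($line) = @_;
    # split ' ' skips leading whitespace; split /\s+/ would return
    # an empty first element for a line such as "  3 4.5".
    return split ' ', $line;
}

my @numbers = split_numbers("  3 4.5 -7e2\n");
print scalar(@numbers), " fields: @numbers\n";   # 3 fields: 3 4.5 -7e2
```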
If you are not sure that your lines contain only numbers, but you are sure that at least one space separates each number from other numbers or other data, then the next line of code is quite a good way of extracting numbers: it lets Perl itself decide what counts as a number instead of spelling out every legal format in a regular expression. (This assumes that you do not want to convert other data values to NaN.) Afterwards, @numbers holds the numbers recognized in the current line of input.
my @numbers = grep { 1*$_ eq $_ } m/(\S*\d\S*)/g;
# we could do simply a split, but this is more efficient because when
# non-numeric data is present, it will only perform the number
# validation on data pieces that actually do contain at least one digit
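To make the behavior of that filter concrete, here is a small sketch wrapped in a hypothetical helper, `extract_numbers`, with a made-up input line. The test keeps a token only if multiplying it by 1 and turning the result back into a string reproduces the token exactly, so tokens such as 1.50 or 1e5 (valid numbers, but not in the form Perl prints them) are dropped along with the non-numeric data. Under `use warnings`, non-numeric tokens would also trigger "isn't numeric" warnings, which the sketch silences locally.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner above.
sub extract_numbers {
    my ($line) = @_;
    # Tokens such as "10.0.0.1" are not numeric, so multiplying them
    # would warn; turn that warning category off in this scope only.
    no warnings 'numeric';
    return grep { 1 * $_ eq $_ } $line =~ m/(\S*\d\S*)/g;
}

my $line = 'host7 10.0.0.1 200 2206 -3.5 1.50 1e5';
my @numbers = extract_numbers($line);
print "@numbers\n";   # 200 2206 -3.5
```

If values like 1.50 or 1e5 must be kept as well, the equality test would have to be replaced by a different check.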
You can determine whether at least one number was present by checking the truth value of the expression @numbers > 0 (or simply @numbers), and whether exactly four were present by using the condition @numbers == 4, etc.
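These checks work because an array used in boolean or numeric context yields its element count. A brief sketch, with a hypothetical helper `describe_count` introduced only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: describe how many numbers a line yielded.
sub describe_count {
    my @numbers = @_;
    return 'no numbers'   unless @numbers;       # an empty array is false
    return 'exactly four' if @numbers == 4;      # numeric context gives the count
    return scalar(@numbers) . ' numbers';
}

print describe_count(200, 2206), "\n";   # 2 numbers
print describe_count(), "\n";            # no numbers
```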
If your numbers are bumped up against each other, for instance 5.17e+7-4.0e-1, then you will have a more difficult time. That is the only case in which you will need complicated regular expressions.
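For that harder case, one possible sketch is a pattern that describes a number explicitly: an optional sign, digits with an optional fraction, and an optional exponent. The pattern and the helper name `extract_adjacent` are illustrative assumptions, not a complete grammar; they assume decimal notation only (no hex, no thousands separators) and treat a sign that directly follows a number as the start of the next one.

```perl
use strict;
use warnings;

# Hypothetical pattern: sign, mantissa, optional exponent.
my $number_re = qr/
    [+-]?                      # optional sign
    (?: \d+ (?: \.\d* )?       # 12, 12., 12.5
      | \.\d+ )                # .5
    (?: [eE] [+-]? \d+ )?      # optional exponent
/x;

sub extract_adjacent {
    my ($text) = @_;
    # In list context, a global match returns every captured number.
    return $text =~ /($number_re)/g;
}

my @numbers = extract_adjacent('5.17e+7-4.0e-1');
print "@numbers\n";   # 5.17e+7 -4.0e-1
```

Input in which dots act purely as separators, such as the portsnap output mentioned earlier, would confuse this pattern and needs its own handling.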
Note: Updated code to be even faster/better.
Note 2: There is a problem with the most up-voted answer, caused by a subtlety of how map works when it stores the value undef. This can be illustrated by running that program on the first line of a data file such as an HTTP log. The output looks correct, but the array actually has many empty elements, and one would not find the first number stored in $numbers[0] as expected. In fact, this is the full output:
$ head -1 http | perl prog1.pl
Use of uninitialized value $numbers[0] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[1] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[2] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[3] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[4] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[5] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[6] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[7] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[10] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[11] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[12] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[13] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[14] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[15] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[16] in join or string at prog1.pl line 8, <> line 1.
        200 2206
(Note that the indentation of these numbers shows how many empty array elements are present in @numbers and have been joined together by spaces before the actual numbers when the array has been converted to a string.)
However, my solution produces the proper results both visually and in the actual array contents, i.e., $numbers[0], $numbers[1], etc., are actually the first and second numbers contained in the line of the data file.
while (<>) {
    my @numbers = m/(\S*\d\S*)/g;
    @numbers = grep { $_ eq 1*$_ } @numbers;
    print "@numbers\n";
}
$ head -1 http | perl prog2.pl
200 2206
Also, because it uses a slow library function, the other solution runs about 50% slower. The output of the two programs was otherwise identical when I ran them on 10,000 lines of data.