Perl uninitialized value when using alternation in regex

Question

I have a for loop with an if statement that looks like this:

   for (my $i=0; $i < $size; $i++) {
       if ($array[$i] =~ m/_(B|P|BC|PM)/) {
           #Remove from @array
           splice(@array, $i, 1);
           next;
       }
       #Get rid of numbers at the end
       $array[$i] =~ s/_[0-9]+//;
   }

I am getting an error saying that says "Use of uninitialized value within @array in pattern match...." on the line with the if statement.

When I remove the alternation from the regex on that line, the error goes away. If I comment out the whole if statement, the regex under the comment "#Get rid of numbers at the end" does not produce any errors.

I've printed out all values of @array and everything looks fine. I've tried no parentheses and brackets instead of the parentheses in the expression with no change. Any ideas what could be causing this?

By "remove alternation" I mean that I change the regex to just "$array[$I] =~ m/_B/" — N Lindsay, Jul 24 '17 at 20:19
just decrementing $size isn't enough; you will still be skipping processing the element after one you remove — ysth, Jul 24 '17 at 20:27

score 2 · Accepted Answer · answered Jul 24 '17 at 20:21

Here is a simple demonstration of the same problem.

1: @array = (1,2);
2: $size = 2;
3: for ($i=0; $i<$size; $i++) {
4:    if ($array[$i] == 1) {
5:        splice @array, $i, 1;
6:    }
7: }

So what happens when you execute this code? At line 5, you remove the first element of the array, so the array becomes (2). At the end of the first for-loop iteration, you increment $i (from 0 to 1), compare it to $size (which is still 2), and decide to continue the loop.

Then you are at line 4 again. You are performing an operation on $array[1]. But @array only has one element, $array[1] is not defined, and Perl gives you a warning.

It is important to be careful if you are a modifying a data structure at the same time that you are iterating through it.

--

Consider this alternate, Perlish approach to the first part of your problem:

@array = grep { !m/_(B|P|BC|PM)/ } @array

That is, identify all the elements of @array that satisfy some condition (here, the condition is not matching the pattern), and then update @array so that it only holds those good elements. zdim had another good approach.

zdim · Answer 2 · 2017-07-25T19:43:36.650

2

Removing elements from an array is in principle expensive, even though splice optimizations help. Thanks to ysth for comments. More to the point, working correctly through those indices requires a lot of care, as exposed and dissected in mob's answer. Here is another way

my @new_array = 
    map { 
        s/_[0-9]+//;        #/ cleanup from the last statement in loop
        $_                  # return this element, not return of s/../../
    }
    grep { defined && !/_(B|P|BC|PM)/ }  # remove elements
    @array;

First grep makes sure to skip undef elements, then filters what you need. Its output list is passed as input to map, which makes the change from the last line of your loop to each element.

If you don't care about the old array just assign to @array instead of making a @new_array.

Starting from 5.14.0 we can use the non-destructive /r modifier in the substituion, which returns the changed string and leaves the original unchanged. This is a perfect use case for it

@array = map { s/_[0-9]+//r } grep { defined && !/_(B|P|BC|PM)/ } @array;

where the original array is overwritten.

This processes the data twice. A more efficient version is to loop over the array and push (copy) the elements that are to be kept, suitably changed, into the new array.

edited Jul 25 '17 at 19:43

answered Jul 24 '17 at 20:21

zdim

64,580
5
52
81

removing elements from an array isn't all that expensive. and its expense is relative to the number of elements remaining, not to the size of those elements. – ysth Jul 24 '17 at 20:29
also, since 5.14 you can do `... = map s/_[0-9]+//r, grep ...` – ysth Jul 24 '17 at 20:32
@ysth Well, sure. I am thinking of an arbitrary array, where one may end up removing many, many elements from the middle. I don't know of what optimizations may be done (specially by `splice`), but I am dreading data copy (full array allocations) possibly done as many times as the order of the array size. Which decreases as we go, and this is only the worst case, but I still dont' like the complexity. – zdim Jul 24 '17 at 20:33
@ysth Ah, of course! I just used it in another answer ... thank you, adding :) – zdim Jul 24 '17 at 20:34
push/pop/shift/unshift/splice are all well optimized to do as little as possible in moving the pointers to the elements around, and they don't touch the actual data that remains at all – ysth Jul 24 '17 at 20:53
@ysth OK, so `splice` does that as well, with middle elements? (That's some magic :). Then it's not that expensive as you say. I know that there's quite a bit of extra room at the back, and (I think) about quarter/half as much at the front of the array, for `unshift` and `push`, and of course removal at ends needn't touch the data. I didn't know how far `splice` optimizations go. Adjusted text – zdim Jul 24 '17 at 21:01
@ysth In my mind this may imply that an array is something like a sorted list of pointers (as a C-style list data-structure), to existing scalars. (I think I know, from ikegami's old comment, that creating an array involves construction of scalars for each value.) So then remaking an array means merely copying those pointers. Is something like this the case? If so, it upends most of what I thought of how perl works. I hope I am not bothering you with this. – zdim Jul 24 '17 at 22:09
Yes, it is just an array of pointers with a length and a starting offset – ysth Jul 24 '17 at 22:29
@ysth Thank you very much. This also directly affects my comments (to you) on that other post, which I'll need to reconsider now. – zdim Jul 24 '17 at 22:58

Perl uninitialized value when using alternation in regex

2 Answers2