How to use backreference matches for simple math operations?

Question

I have a file that looks like this:

~ cat dump.txt
  <ItemSpec id="46301" Day="1" Week="244251"/>
  <ItemSpec id="46302" Day="2" Week="244252"/>
  <ItemSpec id="46303" Day="3" Week="244253"/>
  <ItemSpec id="46304" Day="4" Week="244254"/>
  <ItemSpec id="46305" Day="5" Week="244255|244256|244257|244255|244256|244257|244255|244256|244257|244255|244256|244257"/>
  ...

I want to add 100,000 to every number in the Week attribute, so that the output looks like this:

~ <simple shell code> dump.txt
  <ItemSpec id="46301" Day="1" Week="344251"/>
  <ItemSpec id="46302" Day="2" Week="344252"/>
  <ItemSpec id="46303" Day="3" Week="344253"/>
  <ItemSpec id="46304" Day="4" Week="344254"/>
  <ItemSpec id="46305" Day="5" Week="344255|344256|344257|344255|344256|344257|344255|344256|344257|344255|344256|344257"/>
  ...

I don't know whether there is a simple way to treat a backreference as a number in a math operation. Here is my unsuccessful attempt:

~ awk '{print gensub(/([0-9]{6})/,"\\1+100000","g",$0)}' dump.txt
  <ItemSpec id="46301" Day="1" Week="244251+100000"/>
  <ItemSpec id="46302" Day="2" Week="244252+100000"/>
  <ItemSpec id="46303" Day="3" Week="244253+100000"/>
  <ItemSpec id="46304" Day="4" Week="244254+100000"/>
  <ItemSpec id="46305" Day="5" Week="244255+100000|244256+100000|244257+100000|244255+100000|244256+100000|244257+100000|244255+100000|244256+100000|244257+100000|244255+100000|244256+100000|244257+100000"/>


  ...

Any idea would be helpful, thank you!

score 2 · Answer 1 · edited May 23 '17 at 12:24

This looks like XML. Parsing XML as plain text is a bad idea - regular expressions are for regular languages, and XML isn't.

So parse as XML instead:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

sub increment_week {
   my ( $twig, $itemspec ) = @_; 
   my @values = split /\|/, $itemspec -> att ('Week');
   $_ += 100000 for @values;   # numeric addition on each Week value
   $itemspec -> set_att('Week', (join '|', @values ));

}

my $twig = XML::Twig -> new ( keep_atts_order => 1,
                              pretty_print => 'indented',
                              twig_handlers => { 'ItemSpec' => \&increment_week } );
   $twig -> parsefile ('your_file.xml'); 
   $twig -> print;

This means you'll handle the whole thing as XML, and won't get tripped up by variations that are still valid XML (XML lets you wrap lines, alter attribute ordering etc. without altering semantics).

Of course, if it isn't valid XML, this won't work - but writing 'almost XML' like that is a really filthy thing to do. (Almost as filthy as regexing it to 'fix' it)

Actually, `dump.txt` is just some sample text I picked; what I really want is a brief shell snippet for doing math on regex matches. Apologies for not making that clear, and thanks for your answer, it's really helpful. — hedleyyan, Nov 24 '16 at 01:45
That's why a representative sample of your data becomes important. — Sobrique, Nov 24 '16 at 13:36
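
On the asker's original awk attempt: `gensub` is a GNU awk extension whose replacement argument is a plain string, so `\\1+100000` is emitted literally rather than computed. A rough sketch of doing the arithmetic in awk itself follows, looping with the standard `match` and `substr` functions. The function name `awk_add_100000` is hypothetical, and the rule "only numbers exactly six digits long" is an assumption taken from the sample data:

```shell
# awk_add_100000 is a hypothetical helper name used only for this sketch.
awk_add_100000() {
    awk '{
        out = ""; rest = $0
        # Walk through the line one run of digits at a time.
        while (match(rest, /[0-9]+/)) {
            num = substr(rest, RSTART, RLENGTH)
            # Assumption: only six-digit numbers are Week values.
            if (RLENGTH == 6) num = num + 100000
            out = out substr(rest, 1, RSTART - 1) num
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}

printf '%s\n' '<ItemSpec id="46301" Day="1" Week="244251"/>' | awk_add_100000
# prints: <ItemSpec id="46301" Day="1" Week="344251"/>
```

Because the function forwards its arguments to awk, `awk_add_100000 dump.txt` should work the same way on a file.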

score 0 · Accepted Answer · edited May 23 '17 at 12:30

Found a similar question, "Math operations in regex", and that solved the problem:

perl -pe 's/(\d{6})/$1+100000/eg' dump.txt

answered Nov 23 '16 at 09:15

hedleyyan

ssr1012 · Answer 3 · 2016-11-23T09:24:28.737

Maybe you can try this:

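my $line = $_;   # current input line
my $i = 100000;  # amount to add
# Within each Week="..." attribute, add $i to every number (the inner s///e does the math)
$line=~s#\s+Week="([^"]*)"# my $weeks=$&; $weeks=~s/\b(\d+)\b/($1+$i)/ge; ($weeks);#esg;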

answered Nov 23 '16 at 09:17

