
I have a file that looks like this:

~ cat dump.txt
  <ItemSpec id="46301" Day="1" Week="244251"/>
  <ItemSpec id="46302" Day="2" Week="244252"/>
  <ItemSpec id="46303" Day="3" Week="244253"/>
  <ItemSpec id="46304" Day="4" Week="244254"/>
  <ItemSpec id="46305" Day="5" Week="244255|244256|244257|244255|244256|244257|244255|244256|244257|244255|244256|244257"/>
  ...

I want to add 100,000 to every number in the Week attribute, so that the output looks like this:

~ <simple shell code> dump.txt
  <ItemSpec id="46301" Day="1" Week="344251"/>
  <ItemSpec id="46302" Day="2" Week="344252"/>
  <ItemSpec id="46303" Day="3" Week="344253"/>
  <ItemSpec id="46304" Day="4" Week="344254"/>
  <ItemSpec id="46305" Day="5" Week="344255|344256|344257|344255|344256|344257|344255|344256|344257|344255|344256|344257"/>
  ...

I don't know whether there is a simple way to treat a backreference as a number in a math operation. My failed attempt is below; it just inserts the text "+100000" after each match instead of doing the addition:

~ awk '{print gensub(/([0-9]{6})/,"\\1+100000","g",$0)}' dump.txt
  <ItemSpec id="46301" Day="1" Week="244251+100000"/>
  <ItemSpec id="46302" Day="2" Week="244252+100000"/>
  <ItemSpec id="46303" Day="3" Week="244253+100000"/>
  <ItemSpec id="46304" Day="4" Week="244254+100000"/>
  <ItemSpec id="46305" Day="5" Week="244255+100000|244256+100000|244257+100000|244255+100000|244256+100000|244257+100000|244255+100000|244256+100000|244257+100000|244255+100000|244256+100000|244257+100000"/>


  ...

Any idea would be helpful, thank you!

hedleyyan

3 Answers


This looks like XML. Parsing XML as plain text is a bad idea - regular expressions are for regular languages, and XML isn't.

So parse as XML instead:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

sub increment_week {
   my ( $twig, $itemspec ) = @_; 
   my @values = split /\|/, $itemspec -> att ('Week');
   # Add 100,000 numerically to each value (".=" would only append text).
   $_ += 100000 for @values;
   $itemspec -> set_att('Week', (join '|', @values ));

}

my $twig = XML::Twig -> new ( keep_atts_order => 1,
                              pretty_print => 'indented',
                              twig_handlers => { 'ItemSpec' => \&increment_week } );
   $twig -> parsefile ('your_file.xml'); 
   $twig -> print;

This means you'll handle the whole thing as XML, and won't get tripped up by valid XML variations (XML lets you wrap lines, reorder attributes, etc. without changing the semantics).

Of course, if it isn't valid XML, this won't work - but writing 'almost XML' like that is a really filthy thing to do. (Almost as filthy as regexing it to 'fix' it)

Sobrique
  • Actually, `dump.txt` is just some sample text I picked; what I really want is a brief shell snippet for doing math on regex matches. Apologies for not being clear, and thanks for your answer, it's really helpful. – hedleyyan Nov 24 '16 at 01:45
  • That's why a representative sample of your data becomes important. – Sobrique Nov 24 '16 at 13:36

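I found a similar question, "Math operations in regex", and it solved the problem. The `/e` modifier makes Perl evaluate the replacement as an expression, so `$1+100000` is computed rather than inserted literally: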

perl -pe 's/(\d{6})/$1+100000/eg' dump.txt
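
For anyone who would rather stay with awk, as in the original attempt: `gensub` can only splice the matched text back in, so the arithmetic has to happen outside the replacement string. Below is a minimal sketch that uses a `match()`/`substr()` loop instead. The function name `add_to_six_digit_numbers` and the `inc` variable are made up for illustration. It assumes every run of six digits on a line should be incremented, and it spells the pattern out as six `[0-9]` classes so that it does not depend on `{6}` interval support in the awk at hand.

```shell
# Hypothetical helper: add 100000 to every run of six digits in the input.
add_to_six_digit_numbers() {
  awk -v inc=100000 '
  {
    out = ""      # part of the line already processed
    rest = $0     # part of the line still to scan
    while (match(rest, /[0-9][0-9][0-9][0-9][0-9][0-9]/)) {
      # Copy the text before the match, then the match plus inc.
      out = out substr(rest, 1, RSTART - 1) (substr(rest, RSTART, RLENGTH) + inc)
      # Continue scanning after the matched digits.
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}
```

It can be called as `add_to_six_digit_numbers dump.txt`, or fed from a pipe. Like the Perl one-liner, it would treat the first six digits of a longer number as a match, so it is only safe when the other numbers in the file are shorter than six digits.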
hedleyyan

Maybe you can try this, which only touches numbers inside the Week attribute:

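my $line = $_; my $i = 100000;
# Outer s///e grabs the whole Week="..." attribute; inner s///e adds $i to each number in it.
$line =~ s#\s+Week="([^"]*)"# my $weeks = $&; $weeks =~ s/\b(\d+)\b/($1+$i)/ge; ($weeks); #esg;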
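
The same idea (restricting the arithmetic to the Week attribute) can also be sketched as a shell function around awk. This is only an illustration under some assumptions: the name `add_to_week_attr` and the `inc` variable are hypothetical, each line is assumed to hold at most one `Week="..."` attribute, and its value is assumed to be a `|`-separated list of numbers.

```shell
# Hypothetical helper: add 100000 to each number in the Week="..." attribute only.
add_to_week_attr() {
  awk -v inc=100000 '
  {
    # Lines without a Week attribute pass through unchanged.
    if (match($0, /Week="[^"]*"/)) {
      head = substr($0, 1, RSTART + 5)            # up to and including Week="
      vals = substr($0, RSTART + 6, RLENGTH - 7)  # text between the quotes
      tail = substr($0, RSTART + RLENGTH - 1)     # closing quote onward
      n = split(vals, parts, "|")
      vals = ""
      sep = ""
      for (i = 1; i <= n; i++) {
        vals = vals sep (parts[i] + inc)
        sep = "|"
      }
      $0 = head vals tail
    }
    print
  }' "$@"
}
```

Called as `add_to_week_attr dump.txt`, it leaves other attributes alone even when they happen to contain six-digit numbers. It is still text processing rather than XML parsing, so it would not cope with an attribute wrapped across lines.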
ssr1012