3

I need to search for a key and append the value to every key:value pair in a Unix file

Input file data:

1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|10A:ccard_no|20:cust_name|30:trans_amt|40:addr

My desired Output:

account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr

Basically, I need the value of 10 or 10A appended to every key:value pair and split into new lines. To be clear, this won't always be the second field.

I am new to sed, awk and perl. I started with extracting the value using awk:

awk -v FS="|" -v key="59" '$2 == key {print $2}' target.txt
ikegami
  • 367,544
  • 15
  • 269
  • 518
nag
  • 75
  • 5
  • 3
    `awk` with a field-separator of `'|'` and then `split()` on `':'` would go a long way to getting you started, [GNU Awk User's Guide](https://www.gnu.org/software/gawk/manual/html_node/index.html#SEC_Contents) You will find `split()` under [9.1.3 String-Manipulation Functions](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#String-Functions) – David C. Rankin Jun 09 '21 at 06:28
  • well, I am pretty new to string functions. But was trying with various perl and AWK options. – nag Jun 09 '21 at 06:30
  • 3
    @nag, sure, please do add your attempted code in your question, which is highly encouraged on SO. There is nothing wrong or right about your efforts, we all are here to learn from each other. – RavinderSingh13 Jun 09 '21 at 06:31
  • 1
    @RavinderSingh13, thank you, I started with extracting the value using awk: awk -v FS="|" -v key="59" '$2 == key {print $2}' target.txt but my key can be present in any location – nag Jun 09 '21 at 06:50

6 Answers6

3
# Looks for 10 or 10A
perl -F'\|' -lane'my ($id) = map /^10A?:(.*)/s, @F; print "$id|$_" for @F'
# Looks for 10 or 10<non-digit><maybe more>
perl -F'\|' -lane'my ($id) = map /^10(?:\D[^:]*)?:(.*)/s, @F; print "$id|$_" for @F'
  • -n executes the program for each line of input.
  • -l removes LF on read and adds it on print.
  • -a splits the line on | (specified by -F) into @F.
  • The first statement extracts what follows : in the field with id 10 or 10-plus-something.
  • The second statement prints a line for each field.

Specifying file to process to Perl one-liner

ikegami
  • 367,544
  • 15
  • 269
  • 518
  • 1
    @Sundeep, Thanks, was just about to test. Apparently, `-F` takes a regex pattern, so it needs to be escaped for both the shell and the regex engine. The issue with the new version has also been fixed. – ikegami Jun 09 '21 at 07:21
  • 1
    @nag, Re "*the position of the key:value pair for 10 or 10A can vary in the record. Not always at $2.*", I have adjusted my answer accordingly. – ikegami Jun 09 '21 at 07:23
  • @Sundeep, It didn't work according to the additional info provided by the OP. – ikegami Jun 09 '21 at 07:24
  • @Sundeep, The OP wasn't didn't specify whether `10B`, etc should be accepted or not. I assumed the answer is either yes or `10B` will never happen. – ikegami Jun 09 '21 at 07:26
  • @Sundeep Re "*or just `/10A?:([^|]+)/` instead of map*", No, that could match `10A:` inside of a value, or a key of `110A`, etc – ikegami Jun 09 '21 at 07:28
  • @Sundeep, Still would fail by matching `110A` – ikegami Jun 09 '21 at 07:30
  • Yeah realized that.. changed it to `/(?<![^|])10A?:([^|]+)/` – Sundeep Jun 09 '21 at 07:32
  • 1
    @Sundeep, Yeah, you could use that. Or `/(?:^|\|)10A?:([^|]*)/`. But at this point, I believe `map` becomes clearer :) – ikegami Jun 09 '21 at 07:34
3

If you are still stuck on where to get started, you will use a field-separator and output-field-separator (FS and OFS) set equal to '|' that will split each record into fields at each '|'. Your fields are available as $1, $2, ... $NF. You care about getting, e.g. account_no from field two ($2) so you split() field two with the separator ':' saving the split fields in an array (a used below). You want the second part from field two which will be in the 2nd array element a[2] to use as the new field-1 in output.

The rest is just looping over each field and outputting a[2] a separator and then the current field. You can do that with:

awk  'BEGIN{FS=OFS="|"} {split ($2,a,":"); for(i=1;i<=NF;i++) print a[2],$i}' file

Example Use/Output

With your example input in file, the result would be:

account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr

Which appears to be what you are after. Let me know if you have further questions.

"10" or "10A" at Unknown Field

You can handle the fields containing "10" and "10A" in any order. You just add a loop to loop over the fields and determine which holds "10" or "10A" and save the 2nd element from the array resulting from split() from that field. The rest is the same, e.g.

awk  '
    BEGIN { FS=OFS="|" } 
    {   for (i=1;i<=NF;i++){ 
            split ($i,a,":")
            if (a[1]=="10"||a[1]=="10A"){ 
                key=a[2]
                break
            }
        }
        for (i=1;i<=NF;i++)
            print key, $i
    }
' file1

Example Input

1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|20:cust_name|30:trans_amt|10A:ccard_no|40:addr

Example Use/Output

awk  '
>     BEGIN { FS=OFS="|" }
>     {   for (i=1;i<=NF;i++){
>             split ($i,a,":")
>             if (a[1]=="10"||a[1]=="10A"){
>                 key=a[2]
>                 break
>             }
>         }
>         for (i=1;i<=NF;i++)
>             print key, $i
>     }
> ' file1
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|10A:ccard_no
ccard_no|40:addr

Which picks up the proper new field 1 for output from the 4th field containing "10A" for the second line above.

Let em know if this is what you needed.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • thank you for the solution. However, the position of the key:value pair for 10 or 10A can vary in the record. Not always at $2. – nag Jun 09 '21 at 07:02
  • Very well explained answer – anubhava Jun 09 '21 at 07:03
  • 1
    @nag the position for `10` or `10A` in the answer can very. They will simply appear in the output in the order of the fields where it appears in the input -- did you need something else? (in other words, I didn't do anything special to put them in that position). You can provide another variable (as your `key="match"` and locate the fields that contains the key and use that as your new field one -- completely up to you. That takes a tiny addition or two. – David C. Rankin Jun 09 '21 at 07:04
  • My input file data can sometimes be as : 1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr 1A:trans_ref_id|20:cust_name|30:trans_amt|10A:ccard_no|40:addr – nag Jun 09 '21 at 07:06
  • In that case are you saying you want `cust_name|10A:ccard_no` before `cust_name|30:trans_amt` in the output? – David C. Rankin Jun 09 '21 at 07:10
  • @DavidC.Rankin, the output should always be the same as ccard_no|20:cust_name. – nag Jun 09 '21 at 07:20
  • 2
    @nag -- thank you for making that clear. I have added another answer based on that requirement at the bottom of my original answer -- which should be what you needed. – David C. Rankin Jun 09 '21 at 07:38
3

I need the value of 10 or 10A appended to every key:value pair

Going by these requirements, you may try this awk:

awk '
BEGIN{FS=OFS="|"}
match($0, /\|10A?:[^|]+/) {
   s = substr($0, RSTART, RLENGTH)
   sub(/.*:/, "", s)
}
{
   for (i=1; i<=NF; ++i)
      print s, $i
}' file

account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
anubhava
  • 761,203
  • 64
  • 569
  • 643
2

EDIT: To find 10 OR 10A values in anywhere in line and then print as per that try following then.

awk '
BEGIN{
  FS=OFS="|"
}
match($0,/(10|10A):[^|]*/){
  split(substr($0,RSTART,RLENGTH),arr,":")
}
{
  for(i=1;i<=NF;i++){
    print arr[2],$i
  }
}'  Input_file

Explanation: Adding detailed explanation for above.

awk '                        ##Starting awk program from here.
BEGIN{                       ##Starting BEGIN section of this program.
  FS=OFS="|"                 ##Setting FS and OFS to | here.
}
match($0,/(10|10A):[^|]*/){  ##using match function to match either 10: till | OR 10A: till | here.
  split(substr($0,RSTART,RLENGTH),arr,":") ##Splitting matched sub string into array arr with delmiter of : here.
}
{
  for(i=1;i<=NF;i++){        ##Running for loop for each field for each line.
    print arr[2],$i          ##Printing 2nd element of ar, along with current field.
  }
}'  Input_file               ##Mentioning Input_file name here.


With your shown samples, please try following.

awk '
BEGIN{
  FS=OFS="|"
}
{
  split($2,arr,":")
  print arr[2],$1
  for(i=2;i<=NF;i++){
    print arr[2],$i
  }
}
' Input_file
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
1

Perl script implementation

use strict;
use warnings;
use feature 'say';

my $fname = shift || die "run as 'script.pl input_file key0 key1 ... key#'";

open my $fh, '<', $fname || die $!;

while( <$fh> ) {
    chomp;
    my %data = split(/[:\|]/, $_);
    for my $key (@ARGV) {
        if( $data{$key} ) {
            say "$data{$key}|$_" for split(/\|/,$_);
        }
    }
}

close $fh;

Run as script.pl input_file 10 10A

Output

account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Polar Bear
  • 6,762
  • 1
  • 5
  • 12
1

Here's an alternate perl solution:

perl -pe '($id) = /(?<![^|])10A?:([^|]+)/; s/([^|]+)[|\n]/$id|$1\n/g'
  • ($id) = /(?<![^|])10A?:([^|]+)/ this will capture the string after 10: or 10A: and save in $id variable. First such match in the line will be captured.
  • s/([^|]+)[|\n]/$id|$1\n/g every field is then prefixed with value in $id and | character
Sundeep
  • 23,246
  • 2
  • 28
  • 103