-3

I have text data at the command line that is broken into "records", each starting with a Record line that has the same value (always 1). In each record, each line is a separate key and value (no, this isn't JSON, unfortunately). A key is sometimes repeated within a record, and sometimes a key name is part of a longer key. For example:

Record = 1
  Apple = 1
  Ball = 2
  Car = 3
    RedApple = 4
    Ball = 5
  Dog = 6
  Elf = 7
  Fudge = 8
Record = 1
  Apple = 2
  Ball = 4
  Car = 6
    RedApple = 8
    Ball = 10
  Dog = 12
  Elf = 14
  Fudge = 16
Record = 1
  Apple = 3
  Ball = 6
  Car = 9
    RedApple = 12
    Ball = 15
  Dog = 18
  Elf = 21
  Fudge = 24

Is there a quick way, for each record, to get the lines for a set of keys, returning only the first result per key?

Ex: For each record get keys {Apple, Ball, Dog}

would match the following lines:

Record = 1
  Apple = 1
  Ball = 2
  Dog = 6
Record = 1
  Apple = 2
  Ball = 4
  Dog = 12
...

Basically, the rule is after matching a line with "Record", get the next unique lines with " Apple ", " Ball ", and " Dog " (spacing indicating exact key match) and spit those lines out.

I can write something in perl and it wouldn't be too complex. I don't know awk, so don't know if it's better for something like this.

angusc
  • *"I can write something in perl and it wouldn't be too complex"* So what do you need our help for? – Borodin Jun 27 '17 at 19:16
  • Hoping to learn a better way than a multi-line perl script – angusc Jun 27 '17 at 19:21
  • You should post your Perl and describe the problems you have with it. There is nothing wrong with a multi-line Perl program. – Borodin Jun 27 '17 at 19:28
  • I get that, but I don't need help with a perl script, I know I can make that work. I was seeing if I could learn a better way, like a single command line. – angusc Jun 27 '17 at 19:44
  • Then you should put that in your question. As it stands it's far from clear what you're asking. And there's nothing "better" about a single command line. It would help you to get better answers if you showed good faith and published the Perl you have written, although it's sounding like your question belongs on *Code Review*. – Borodin Jun 27 '17 at 19:48

3 Answers

2

"Is there a quick way, for each record, to get the lines for a set of keys, returning only the first result per key?"

I don't believe that's actually what you want. I believe you actually want the items labeled Apple, Ball and Dog at the second level, meaning both

Record = 1
  Apple = 1
  Ball = 2
  Car = 3
    RedApple = 4
    Ball = 5
  Dog = 6
  Elf = 7
  Fudge = 8

and

Record = 1
  Apple = 1
  Car = 3
    RedApple = 4
    Ball = 5
  Ball = 2
  Dog = 6
  Elf = 7
  Fudge = 8

should produce

Record = 1
  Apple = 1
  Ball = 2
  Dog = 6

If so, you could use

perl -ne'print if /^(?:\S|[ ]{2}(?:Apple|Ball|Dog)[ ]=)/'

or

grep -P '^(?:\S|[ ]{2}(?:Apple|Ball|Dog)[ ]=)'

Output:

Record = 1
  Apple = 1
  Ball = 2
  Dog = 6
Record = 1
  Apple = 2
  Ball = 4
  Dog = 12
Record = 1
  Apple = 3
  Ball = 6
  Dog = 18

See Specifying file to process to Perl one-liner for usage.
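Where `grep -P` isn't available, the same pattern can probably be written as a POSIX extended regex. This is only a sketch: the function name `second_level_keys` is made up for illustration, and it assumes second-level keys are indented by exactly two spaces, as in the sample. It is run here on the reordered record from above, where the nested Ball comes before the second-level one.

```shell
# Hypothetical helper: print record headers plus the chosen two-space-indented keys.
second_level_keys() {
    # [^[:space:]] stands in for \S (any line that starts at column 0).
    # "  (Apple|Ball|Dog) =" requires exactly two leading spaces and a space
    # before "=", so RedApple and the four-space Ball are skipped.
    grep -E '^([^[:space:]]|  (Apple|Ball|Dog) =)' "$@"
}

# Reordered sample: the nested Ball appears before the second-level Ball.
printf '%s\n' \
    'Record = 1' \
    '  Apple = 1' \
    '  Car = 3' \
    '    RedApple = 4' \
    '    Ball = 5' \
    '  Ball = 2' \
    '  Dog = 6' \
    '  Elf = 7' \
    '  Fudge = 8' | second_level_keys
```

With a file, it would be called as `second_level_keys file`.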

ikegami
  • Tried this regex, returned empty result set. Thanks – angusc Jun 27 '17 at 19:46
  • There was an error with the `grep` version. (I had tested it, found the problem and fixed it, but the fix didn't make it to my answer.) My answer has been fixed. – ikegami Jun 27 '17 at 20:06
0

If this isn't all you need:

$ grep -E '^(Record|  (Apple|Ball|Car))' file
Record = 1
  Apple = 1
  Ball = 2
  Car = 3
Record = 1
  Apple = 2
  Ball = 4
  Car = 6
Record = 1
  Apple = 3
  Ball = 6
  Car = 9

then edit your question to show a more truly representative example. Right now you've accepted an answer that's also based on guessing at your needs and may be more complicated than necessary (while this one may be too simple).

Ed Morton
  • Thought I was being clear, but obviously not. Unfortunately this doesn't work as there is an unknown amount of leading spaces, so I can't use a single space to exclude second matches for lines. – angusc Jun 28 '17 at 08:56
  • Again, if you'd like help to come up with the best possible solution then edit your question to show a more truly representative example. From what you've told us so far I really think your currently accepted answer is more complicated than necessary. – Ed Morton Jun 28 '17 at 11:46
-1

awk to the rescue!

$ awk '/^Record/ {h=$0; a["Apple"]=a["Dog"]=a["Ball"]=0}
       $1 in a   {if(h) {print h; h=""}
                  if(!a[$1]++) print}' file

Record = 1
  Apple = 1
  Ball = 2
  Dog = 6
Record = 1
  Apple = 2
  Ball = 4
  Dog = 12
Record = 1
  Apple = 3
  Ball = 6
  Dog = 18

Explanation: on each Record line, save the header and reset the counts. For lines whose first field is one of the required keys, print the header once, then print the line only on the first appearance of that key.
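The same logic, spread out with comments, might look like the sketch below. The wrapper name `first_per_key` and the longer variable names are only for illustration; the behaviour is meant to match the one-liner above.

```shell
# Hypothetical wrapper around the awk program, so it can be reused on any input.
first_per_key() {
    awk '
        # New record: remember the header and reset the per-key counters.
        /^Record/ {
            header = $0
            seen["Apple"] = seen["Ball"] = seen["Dog"] = 0
        }
        # $1 is the key name; default field splitting drops the leading blanks.
        $1 in seen {
            if (header != "") { print header; header = "" }  # header once per record
            if (!seen[$1]++) print                            # first hit per key only
        }
    ' "$@"
}

printf '%s\n' \
    'Record = 1' \
    '  Apple = 1' \
    '  Ball = 2' \
    '  Car = 3' \
    '    RedApple = 4' \
    '    Ball = 5' \
    '  Dog = 6' | first_per_key
```

Because default field splitting ignores indentation, a nested Ball would be the one printed if it came before the second-level Ball in a record.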

If you want to extract the second-level items only, you need to include the leading spaces as part of the key (to determine the hierarchy). This is one alternative:

$ awk -F' *= *' '/^Record/ {h=$0; a["  Apple"]=a["  Dog"]=a["  Ball"]=0}
                 $1 in a   {if(h) {print h; h=""}
                            if(!a[$1]++) print}' file

karakfa
  • Awesome! Thanks, never learned much awk, but I can make some sense of that, basically tracking which values have been found already. – angusc Jun 27 '17 at 19:54
  • if that's the case, need to set FS to non space (perhaps "=" sign) and use the required number of spaces as part of the key. – karakfa Jun 27 '17 at 20:03
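If the amount of indentation isn't known in advance, one possible variation is to take the indent from the first line after each Record line and accept only keys at that depth. This is a rough sketch under several assumptions: the first line after Record is a direct child, indentation uses spaces only, and keys are separated from "=" by a space. The function name `keys_at_child_indent` is invented here. Unlike the scripts above, it always prints the Record line.

```shell
# Hypothetical helper: first Apple/Ball/Dog line at the direct-child indent of each record.
keys_at_child_indent() {
    awk '
        BEGIN { want["Apple"] = want["Ball"] = want["Dog"] = 1 }
        # New record: print the header, forget the indent and the keys seen so far.
        /^Record/ { print; indent = -1; split("", seen); next }
        {
            match($0, /^ */)                    # RLENGTH = number of leading spaces
            if (indent < 0) indent = RLENGTH    # assumption: this line is a direct child
            if (RLENGTH == indent && ($1 in want) && !seen[$1]++) print
        }
    ' "$@"
}

# Three-space indent here, with the nested Ball placed before the direct-child Ball.
printf '%s\n' \
    'Record = 1' \
    '   Apple = 1' \
    '   Car = 3' \
    '      RedApple = 4' \
    '      Ball = 5' \
    '   Ball = 2' \
    '   Dog = 6' | keys_at_child_indent
```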