Unix - filename and string result on same line

Question

I need to search a directory that has hundreds or thousands of files, each containing XML with one or more instances of a specific string (begin/end tag with data). I can get all the instances of the string by doing

grep -ho '<mytagname>..............<\/mytagname>' /home/xyzzy/mydata/*.XML > /home/mydata/tagvalues.txt

then a few sed commands to strip off the tags, so I wind up with a file just containing a list of values:

  value001
  value002
  value003

(etc.)

Ideally, though, I'd like each line of the file to also include the filename so I can import it into a database for analysis.

So my result would look something like this:

fileAAA value001
fileAAA value002
fileAAA value003
fileBBB value004

The exact formatting is flexible: it could use spaces or another separator, and it could even still include the begin/end tags.

The closest I've been able to get is with grep -o, which gives:

fileAAA:value001
value002
value003
fileBBB:value004
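A sketch, not from the original thread, assuming GNU or BSD grep: combining -o with -H makes grep print the "filename:" prefix on every matched line, and a short sed then turns filename:<mytagname>value</mytagname> into "filename value". The helper name tag_values is hypothetical, and the [^<]* pattern assumes values never contain "<" (the question used fourteen dots to match fixed-length values instead).

```shell
# Sketch assuming GNU or BSD grep: -o prints each match on its own line and
# -H forces the "filename:" prefix onto every one of those lines.
# tag_values is a hypothetical helper name; [^<]* assumes values contain no '<'.
tag_values() {
    grep -Ho '<mytagname>[^<]*</mytagname>' "$@" |
        # turn "file:<mytagname>value</mytagname>" into "file value"
        sed -e 's/:<mytagname>/ /' -e 's/<\/mytagname>$//'
}
```

Usage would be along the lines of tag_values /home/xyzzy/mydata/*.XML > /home/mydata/tagvalues.txt. A filename containing ":<mytagname>" would confuse the sed step, which seems unlikely in practice.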

A Perl one-liner seems ideal, but I'm new enough to Perl that I have no idea where to begin.

Miller · Answer 1 · 2014-03-31T22:11:59.397

This could be done with a one-liner like so:

perl -lne 'print "$ARGV $1" if /<mytagname>(.*?)<\/mytagname>/' /home/xyzzy/mydata/*.XML

However, I'd strongly recommend using an actual XML parser such as XML::Twig or XML::LibXML:

use strict;
use warnings;

use XML::LibXML;

# Parse each XML file in the directory
for my $file (</home/xyzzy/mydata/*.XML>) {
    my $doc = XML::LibXML->load_xml(location => $file);
    # Print "filename value" for every <mytagname> element, wherever it appears
    for my $node ($doc->findnodes("//mytagname")) {
        print "$file " . $node->textContent() . "\n";
    }
}

anttix · Answer 2 · 2014-03-31T21:51:14.470

What about awk?

awk -F'</?mytagname>' '$2 {print FILENAME,$2}' /home/xyzzy/mydata/*.XML

Explanation:

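-F'</?mytagname>' - sets the field separator to a regex that matches either the opening or the closing tag; it is quoted separately from the awk program
$2 - selects lines whose second field (the text between the tags) has a value
{print FILENAME,$2} - prints the filename, a space, and the value of the second field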
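The awk answer prints only $2, so it reports only the first value on each line. With -F'</?mytagname>', each value lands in an even-numbered field ($2, $4, $6, ...), so a loop over those fields prints them all. This is a sketch assuming each begin/end tag pair sits on a single line (a pair split across lines would shift the field parity); awk_tag_values is a hypothetical helper name.

```shell
# Variant of the awk answer: values sit between the separators, i.e. in the
# even-numbered fields, so loop over them instead of printing only $2.
# awk_tag_values is a hypothetical helper name.
awk_tag_values() {
    awk -F'</?mytagname>' '{
        for (i = 2; i <= NF; i += 2)   # fields 2, 4, 6, ... hold the values
            if ($i != "") print FILENAME, $i
    }' "$@"
}
```

Run it as awk_tag_values /home/xyzzy/mydata/*.XML; the output format is the same "filename value" as the original one-liner.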

Thanks - both of those (the Perl and the awk commands) work to some extent, but they appear to take only the first occurrence of the string in the file. With my original grep I get many thousands of hits (even after sorting and taking unique values). With either of the commands as is, I get about 7500 hits, which is the number of files in the directory. – JOATMON Apr 01 '14 at 21:03
Aha - I did a little digging and found the answer in another post [here](http://stackoverflow.com/questions/19031552/perl-one-liner-to-match-all-occurrences-of-regex), so I changed the Perl command to use while (/<mytagname>(.*?)<\/mytagname>/g) instead of the if, and that's giving me a more believable number. – JOATMON Apr 01 '14 at 21:26