I have an XML file from which I want to extract all occurrences of a tag whose content is two uppercase letters, e.g. AB. The file is one long line of ~500,000 characters.
I know regular expressions in general, but when I try this with sed
to extract only the characters between the tags, the result completely baffles me :).
Here's my command:
sed -r 's/(.*)<my_tag>([A-Z][A-Z])<\/my_tag>(.*)/hello\2/g' myfile.out
This replaces the entire file with just "helloAB" (for example), whereas I expected the output to contain at least 100 matches.
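A minimal reproduction of what I'm seeing, as a sketch: the sample string below is made up and only stands in for my real one-line file, and I'm assuming GNU sed since I use -r. The expression is the same as in my command, just fed from stdin.

```shell
# Hypothetical sample line with two tags; the real file has 100+ of them
sample='<r><my_tag>AB</my_tag><x/><my_tag>CD</my_tag></r>'

# Same expression as above, reading from stdin instead of myfile.out
result=$(printf '%s\n' "$sample" | sed -r 's/(.*)<my_tag>([A-Z][A-Z])<\/my_tag>(.*)/hello\2/g')

# Print what sed produced for the whole line
printf '%s\n' "$result"
# prints: helloCD
```

So even with two tags on the line I get a single "hello" followed by the content of the last tag only, and everything else on the line is gone.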
So I suspect this has something to do with greedy matching, but I'm not getting anywhere. Maybe awk
is a better idea?