Grep for Multiple instances of string between a substring and a character?

Question

Can you please tell me how to Grep for every instance of a substring that occurs multiple times on multiple lines within a file?

I've looked at https://unix.stackexchange.com/questions/131399/extract-value-between-two-search-patterns-on-same-line and How to use sed/grep to extract text between two words?

But my problem is slightly different: each substring will be immediately preceded by the string name"> and will be terminated by a < character immediately after the last character of the substring I want.

So one line might be

<"name">Bob<125><adje></name><"name">Dave<123><adfe></name><"name">Fred<125><adfe></name>

And I would like the output to be:

Bob
Dave
Fred

Showing just one sample line is extremely unlikely to help us figure out a robust solution for you. Your text says the issue is related to multiple lines so show multiple lines. Also use the `{}` editor button to format your input/output/code files. — Ed Morton, Dec 05 '15 at 18:16
so you're really trying to parse XML with a reg-exp? See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 for why not ;-) Good luck. — shellter, Dec 05 '15 at 20:18
Thanks for the replies so far, sorry for my poor question formatting! — Roger, Dec 06 '15 at 19:33
Thanks for the replies so far, sorry for my poor question formatting! I realised that what I really would have preferred is for the multiple sets of data not to be on the same line, so I did this (got the idea reading SO): `grep name\"\> | awk '{ gsub("\"name\">", "\n\"name\">") } 1'` to insert a newline in front of every "name" field (and others). I then used a combination of grep and cut to hack out just the data. It's slow and inelegant, but it does work. I will of course look at the other answers and compare them, thank you. — Roger, Dec 06 '15 at 19:40
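
The pipeline described in the last comment can be sketched end to end roughly as follows. The comment only shows the first two stages; the final `grep` and `cut` stages, the function name `names_via_newlines`, and the temporary sample file are assumptions added for illustration, not Roger's actual commands.

```shell
# Sample input taken from the question (one line holding three values).
sample=$(mktemp)
printf '%s\n' '<"name">Bob<125><adje></name><"name">Dave<123><adfe></name><"name">Fred<125><adfe></name>' > "$sample"

# Hypothetical helper: put each "name"> on its own line, then cut out the value.
names_via_newlines() {
    grep 'name">' "$1" |
        awk '{ gsub(/"name">/, "\n\"name\">") } 1' |  # newline before every "name">
        grep '^"name">' |                              # keep only lines that start with the marker
        cut -d'>' -f2 |                                # text after the first >
        cut -d'<' -f1                                  # text before the next <
}

names_via_newlines "$sample"
```

Each value ends up on its own line, at the cost of five processes per run, which may explain why the comment calls the approach slow.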

score 0 · Answer 1 · answered Dec 06 '15 at 03:56

Although awk is not the best tool for XML processing, it will help if your XML structure and data are simple enough.

$ awk -F"[<>]" '{for(i=1;i<NF;i++) if($i=="\"name\"") print $(++i)}' file
Bob
Dave
Fred

I doubt that the tag is <"name"> though. If it's <name>, without the quotes, change the condition in the script to $i=="name".
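
To try the field-splitting approach on input that spans several lines, here is a small sketch built around the same awk program. The two-line sample file and the wrapper name `names_by_field` are made up for illustration; the tag text is passed in as a parameter so that both the quoted and the unquoted form can be tested.

```shell
# Hypothetical sample: the question's values spread over two lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<"name">Bob<125><adje></name><"name">Dave<123><adfe></name>
<"name">Fred<125><adfe></name>
EOF

# Hypothetical wrapper: $1 is the tag text to look for, $2 is the file.
names_by_field() {
    awk -F'[<>]' -v tag="$1" '{
        # Fields are the pieces between < and >; the value follows the tag field.
        for (i = 1; i < NF; i++)
            if ($i == tag) print $(++i)
    }' "$2"
}

names_by_field '"name"' "$sample"
```

If the real tag turns out to be <name>, calling `names_by_field name file` should cover that case without editing the awk program.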

score 0 · Answer 2 · answered Dec 06 '15 at 11:44

Using gawk, which treats a multi-character RS as a regular expression:

awk -vRS='<"name">|<' '/^[A-Z]/' file
Bob
Dave
Fred
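
To see why the pattern `/^[A-Z]/` is enough here, it can help to print every record that the separator produces. This is only a rough sketch, assuming `awk` is gawk or another awk that accepts a regular expression as RS; the function name `names_by_rs` and the shortened sample line are illustrative.

```shell
line='<"name">Bob<125><adje></name><"name">Dave<123><adfe></name>'

# Show each record in brackets: a value that follows <"name"> starts its own record.
printf '%s\n' "$line" | awk -v RS='<"name">|<' '{ printf "[%s]\n", $0 }'

# Hypothetical wrapper around the answer's command; reads files or stdin.
names_by_rs() {
    awk -v RS='<"name">|<' '/^[A-Z]/' "$@"
}

printf '%s\n' "$line" | names_by_rs
```

Note that the filter depends on every wanted value starting with an uppercase letter and on no other record doing so; a tag such as <Adje> would also be printed.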

answered by bian
