
Can you please tell me how to grep for every instance of a substring that occurs multiple times on a line, across multiple lines of a file?

I've looked at https://unix.stackexchange.com/questions/131399/extract-value-between-two-search-patterns-on-same-line and "How to use sed/grep to extract text between two words?"

But my problem is slightly different: each substring will be immediately preceded by the string name"> and terminated by a < character immediately after the last character of the substring I want.

So one line might be

<"name">Bob<125><adje></name><"name">Dave<123><adfe></name><"name">Fred<125><adfe></name>

And I would like the output to be:

Bob
Dave
Fred
  • Showing just one sample line is extremely unlikely to help us figure out a robust solution for you. Your text says the issue is related to multiple lines, so show multiple lines. Also use the `{}` editor button to format your input/output/code files. – Ed Morton Dec 05 '15 at 18:16
  • so you're really trying to parse XML with a reg-exp? See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 for why not ;-) Good luck. – shellter Dec 05 '15 at 20:18
  • Thanks for the replies so far, and sorry for my poor question formatting! I realised that what I really wanted was for the multiple sets of data not to be on the same line, so I did this (got the idea reading SO): `grep name\"\> | awk '{ gsub("\"name\">", "\n\"name\">") } 1'` to insert a newline in front of every "name" field (and others). I then used a combination of grep and cut to hack out just the data. It's slow and inelegant, but it does work. I will of course look at the other answers and compare them, thank you. – Roger Dec 06 '15 at 19:40
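Roger's comment describes splitting the line first and then cutting out the data with grep and cut. A minimal sketch of doing it in one grep + cut pipeline, assuming a grep that supports -o (GNU and BSD grep do; POSIX grep does not specify it): grep -o already prints each match on its own line, so no newline insertion is needed. The function name extract_names_grep is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the text between name"> and the next <,
# one match per output line. Reads one file argument, or stdin if none is given.
extract_names_grep() {
  # grep -o prints only the matched part, once per match, so several names on
  # one line come out on separate lines; each match looks like name">Bob.
  # cut then keeps everything after the first >.
  grep -o 'name">[^<]*' "$@" | cut -d'>' -f2-
}

printf '%s\n' '<"name">Bob<125><adje></name><"name">Dave<123><adfe></name>' \
              '<"name">Fred<125><adfe></name>' | extract_names_grep
# prints Bob, Dave and Fred, one per line
```

With more than one file argument grep prefixes each match with the file name, which would confuse the cut step, so this sketch is meant for a single file or a pipe.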

2 Answers


Although awk is not the best tool for XML processing, it will do the job if your XML structure and data are simple enough.

$ awk -F"[<>]" '{for(i=1;i<NF;i++) if($i=="\"name\"") print $(++i)}' file
Bob
Dave
Fred

I doubt that the tag is really <"name">, though. If it's <name> without the quotes, change the condition in the script to $i=="name".
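A hedged sketch of the same awk loop with the tag passed in as a variable, so one script covers both the <"name"> form and the <name> form mentioned above, run here on input spread over two lines. The wrapper name extract_tag_text is hypothetical; the awk loop itself is the one from this answer.

```shell
#!/bin/sh
# Hypothetical helper wrapping the awk loop above; $1 is the tag text between
# < and >, e.g. '"name"' for <"name"> or 'name' for <name>.
# Remaining arguments are files; with none, awk reads stdin.
extract_tag_text() {
  tag=$1
  shift
  # -F'[<>]' makes every < and > a field separator, so the text after a tag
  # lands in the field right after the tag name
  awk -F'[<>]' -v tag="$tag" '{
    for (i = 1; i < NF; i++)
      if ($i == tag)
        print $(++i)   # print the next field and skip over it
  }' "$@"
}

# Sample in the <name> form (no quotes), spread over two lines
printf '%s\n' '<name>Bob<125><adje></name><name>Dave<123><adfe></name>' \
              '<name>Fred<125><adfe></name>' | extract_tag_text name
# prints Bob, Dave and Fred, one per line
```

The closing tag </name> becomes the field /name, so it never equals name and is not mistaken for an opening tag.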

karakfa

With gawk (a multi-character, regular-expression RS is not standard POSIX awk):

awk -vRS='<"name">|<' '/^[A-Z]/' file
Bob
Dave
Fred
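The command above depends on a regular-expression RS and on names starting with an uppercase letter, since /^[A-Z]/ is what filters out records such as 125>, adje> and /name>. A hedged sketch of a portable alternative, using only POSIX awk's index() and substr() to scan each line for the literal marker name"> and print up to the next <. The function name extract_names_posix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints every substring that follows name"> up to the next <.
# Uses only POSIX awk built-ins (index, substr), so it does not need gawk.
extract_names_posix() {
  awk '{
    rest = $0
    # index() returns 0 when the marker is absent, which ends the loop
    while ((pos = index(rest, "name\">")) > 0) {
      rest = substr(rest, pos + 6)      # 6 is the length of the marker
      stop = index(rest, "<")
      if (stop > 0) {
        print substr(rest, 1, stop - 1)
        rest = substr(rest, stop)
      } else {
        print rest                      # no closing < on this line
        rest = ""
      }
    }
  }' "$@"
}

printf '%s\n' '<"name">Bob<125><adje></name><"name">Dave<123><adfe></name>' \
              '<"name">Fred<125><adfe></name>' | extract_names_posix
# prints Bob, Dave and Fred, one per line
```

Because it searches for the marker instead of testing the first letter, a lowercase or numeric name such as bob or 42 is printed too.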
bian