Extract data between two strings using either AWK or SED

Question

I'm trying to extract data/URLs (in this case, someurl) from a file that contains them within some tag, e.g.:

xyz>someurl>xyz

I don't mind using either awk or sed.

Likely same as: http://stackoverflow.com/questions/13386080/extract-text-between-two-strings-repeatedly-awk-sed, although the example there is ugly. — Ciro Santilli OurBigBook.com, Jul 07 '15 at 08:41
Possible duplicate of [How to use sed/grep to extract text between two words?](https://stackoverflow.com/questions/13242469/how-to-use-sed-grep-to-extract-text-between-two-words) — tripleee, Aug 22 '18 at 09:48

fedorqui · Accepted Answer · 2013-05-29T13:07:32.047

9

I think the simplest way is with cut:

$ echo "xyz>someurl>xyz" | cut -d'>' -f2
someurl

With awk it can be done like this:

$ echo "xyz>someurl>xyz" | awk  'BEGIN { FS = ">" } ; { print $2 }'
someurl

And with sed it is a little trickier:

$ echo "xyz>someurl>xyz" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/g'
someurl

The pattern captures three groups, something1>something2>something3, and the replacement prints only the second one.
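One caveat with the sed command above is that `.*` is greedy, so on a line with more than two `>` separators the first group swallows the extra fields. The sketch below contrasts it with a stricter variant that uses `[^>]*`; the function names `second_greedy` and `second_field` and the `a>b>c>d` input are made up for illustration, not part of the original answer.

```shell
# Pattern from the answer (without the redundant g flag):
# the first .* grabs as much as it can.
second_greedy() {
  sed 's/\(.*\)>\(.*\)>\(.*\)/\2/'
}

# Hypothetical stricter variant: [^>]* cannot cross a '>',
# so the capture is always the second '>'-separated field.
second_field() {
  sed 's/^[^>]*>\([^>]*\)>.*/\1/'
}

echo "xyz>someurl>xyz" | second_field   # prints: someurl
echo "a>b>c>d" | second_greedy          # prints: c
echo "a>b>c>d" | second_field           # prints: b
```

For the single-tag input in the question both versions give the same result; they only differ once extra separators appear on the line.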

edited May 29 '13 at 13:07

answered May 29 '13 at 13:02

fedorqui

score 0 · Answer 2 · answered May 29 '13 at 13:28


grep was born to extract things:

kent$  echo "xyz>someurl>xyz"|grep -Po '>\K[^>]*(?=>)'
someurl

You could kill a fly with a bomb, of course:

kent$  echo "xyz>someurl>xyz"|awk -F\> '$0=$2'
someurl
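A note on the terse awk form: `$0=$2` is used as a pattern, so the line is printed only when the assigned value is truthy. If the second field is empty (or evaluates to zero), nothing is printed for that line. A minimal sketch of the difference, assuming a two-line sample input invented here for illustration; `awk_terse` and `awk_explicit` are hypothetical helper names:

```shell
# Terse form from the answer: prints only when $2 is non-empty and non-zero.
awk_terse() {
  awk -F'>' '$0=$2'
}

# Explicit form: always prints the second field, even when it is empty.
awk_explicit() {
  awk -F'>' '{ print $2 }'
}

printf 'xyz>someurl>xyz\nxyz>>xyz\n' | awk_terse      # prints one line: someurl
printf 'xyz>someurl>xyz\nxyz>>xyz\n' | awk_explicit   # prints someurl, then an empty line
```

Which behavior is preferable depends on whether empty tags should show up in the output at all.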

answered May 29 '13 at 13:28

Kent

Hi Kent, if you don't mind, could you explain '>\K[^>]*(?=>)' I'm assuming [^>] means the start and (?=>) as the end; what are the rest? thanks – L P May 29 '13 at 16:05
`[^>]` means any character other than `>`, and `(?=>)` is a look-ahead: it matches only where a `>` follows. – Kent May 29 '13 at 16:07
awk is not a bomb, this is exactly what it is intended for. – Thorbjørn Ravn Andersen Jun 03 '13 at 16:55
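To unpack the grep pattern discussed in these comments: the leading `>` must match, `\K` then discards everything matched so far from the reported result, `[^>]*` collects characters up to the next `>`, and `(?=>)` requires that `>` without consuming it. A small sketch, assuming GNU grep built with PCRE support (`-P`); the wrapper name `between_gt` and the `a>b>c>d` input are illustrative only:

```shell
# Print every run of non-'>' characters that sits between two '>' signs.
between_gt() {
  grep -Po '>\K[^>]*(?=>)'
}

echo "xyz>someurl>xyz" | between_gt   # prints: someurl
# Because the look-ahead does not consume the closing '>', that same '>'
# can open the next match, so this prints b and then c on separate lines.
echo "a>b>c>d" | between_gt
```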

score 0 · Answer 3 · answered May 29 '13 at 13:32


If your grep supports the -P option, then you can use lookahead and lookbehind regular expressions to identify the URL.

$ echo "xyz>someurl>xyz" | grep -oP '(?<=xyz>).*(?=>xyz)'
someurl

This is just a sample to get you started, not the final answer.
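One way the sample could be taken further: `.*` is greedy, so if a line holds several tagged values, the match runs from the first `xyz>` to the last `>xyz`. Switching to the lazy `.*?` returns each value separately. This is a sketch assuming GNU grep with `-P`; the function names and the two-URL input line are invented for illustration.

```shell
# Greedy version from the answer.
greedy_between() {
  grep -oP '(?<=xyz>).*(?=>xyz)'
}

# Lazy variant: .*? stops at the first position followed by '>xyz'.
lazy_between() {
  grep -oP '(?<=xyz>).*?(?=>xyz)'
}

line='xyz>url1>xyz xyz>url2>xyz'
echo "$line" | greedy_between   # prints: url1>xyz xyz>url2
echo "$line" | lazy_between     # prints url1 and url2 on separate lines
```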

answered May 29 '13 at 13:32

jaypal singh
