I am new to shell scripting. I am extracting some URLs from mail with Python, but the URLs the script decodes are broken: each one is split across two lines. So I thought I would write some code that extracts only the required URLs.

Here is the file:

http://stackoverflow.com/questions/17988756/=
how-to-select-lines-between-two-marker-patterns-which-may-occur-multiple-times-w
.
.
.(some text)
http://stackoverflow.com/questions/9605232/=
merge-two-lines-into-one
.
.
.

The output required is:

http://stackoverflow.com/questions/17988756/how-to-select-lines-between-two-marker-patterns-which-may-occur-multiple-times-w
http://stackoverflow.com/questions/9605232/merge-two-lines-into-one

Thanks in advance.

Ritzz
  • I tried to write some code: – Ritzz Apr 28 '16 at 12:14
  • while IFS= read -r LINE; do if [[ $LINE =~ ^http://stackoverflow.com.*= ]]; then echo $LINE >> broken_URL.txt; echo $[LINE+1] >> broken_URL.txt; fi; done < file – Ritzz Apr 28 '16 at 12:16
  • and will later use sed 'N;s/=\n//' broken_url.txt > broken_new_url.txt – Ritzz Apr 28 '16 at 12:17
  • Please put your code in your original post so it retains the formatting. – skylerl Apr 28 '16 at 12:17
  • As a side note, `http://stackoverflow.com/questions/17988756` alone is a valid URL. Whatever you write in `XXX` from `stackoverflow.com/questions//XXX` is irrelevant; it all goes to `stackoverflow.com/questions/`. – fedorqui Apr 28 '16 at 12:49
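
For reference, the loop attempted in the comments above cannot fetch the following line with `$[LINE+1]`: `$[...]` is an old form of arithmetic expansion, so it operates on the text held in `LINE` instead of reading further into the file. Below is a minimal sketch of the same read-loop idea, assuming each broken URL starts with `http://` at the beginning of a line and ends in `=` at the break. The function name `join_soft_breaks` and the variable names are illustrative only, and POSIX `case` patterns stand in for the `[[ ... =~ ... ]]` test.

```shell
# join_soft_breaks: hypothetical helper. Reads text on stdin and prints
# only the URLs, with the "=" line breaks removed.
join_soft_breaks() {
    while IFS= read -r line; do
        # While the line is a URL ending in "=", splice in the next line.
        while :; do
            case $line in
                http://*=)
                    IFS= read -r next || break   # no more input
                    line=${line%=}$next
                    ;;
                *)
                    break
                    ;;
            esac
        done
        # Print complete URLs; drop every other line.
        case $line in
            http://*) printf '%s\n' "$line" ;;
        esac
    done
}

# Sample input taken from the question.
result=$(join_soft_breaks <<'EOF'
http://stackoverflow.com/questions/17988756/=
how-to-select-lines-between-two-marker-patterns-which-may-occur-multiple-times-w
(some text)
http://stackoverflow.com/questions/9605232/=
merge-two-lines-into-one
(some text)
EOF
)
printf '%s\n' "$result"
```

Because the inner `read` consumes the continuation line from the same input stream, the outer loop never sees it as a separate line.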

1 Answer

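Use this sed command. It joins each line that starts with http: and ends in = with the line after it, and repeats while the joined line still ends in =:

sed ':loop; /^http:.*=$/{N;s/=\n//; t loop}' file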

Test:

$ cat file
(some text)
http://stackoverflow.com/questions/9605232/=
merge-two-lines=
-into-one
(some text)

$ sed ':loop; /^http:.*=$/{N;s/=\n//; t loop}' file
(some text)
http://stackoverflow.com/questions/9605232/merge-two-lines-into-one
(some text)
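
The command above passes the surrounding text through unchanged, whereas the output requested in the question lists only the URLs. One possible variant, offered as a sketch and not as part of the original answer, suppresses default printing with `-n` and prints just the joined URL lines. It assumes every URL starts with `http:` at the beginning of a line. The function name `join_urls` is hypothetical, and the script is written one command per line so it does not depend on how a particular sed handles `;` after a label.

```shell
# join_urls: hypothetical helper wrapping the sed script.
# With no file arguments it reads standard input.
join_urls() {
    sed -n '
# Jump target: come back here after every successful join.
:loop
# A URL line ending in "=" continues on the next line.
/^http:.*=$/{
    N
    s/=\n//
    t loop
}
# Print only finished URL lines; everything else is dropped.
/^http:/p
' "$@"
}

# Sample input taken from the test in the answer.
joined=$(join_urls <<'EOF'
(some text)
http://stackoverflow.com/questions/9605232/=
merge-two-lines=
-into-one
(some text)
EOF
)
printf '%s\n' "$joined"
```

`N` appends the next input line to the pattern space, `s/=\n//` removes the `=` together with the newline, and `t loop` branches back only if that substitution succeeded, so a URL broken across three or more lines is still reassembled.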
sat
  • @Ritzz, glad it worked for you. If this answer helped, consider accepting it by clicking the tick mark to the left of the answer. – sat Jun 01 '16 at 08:17