I am currently writing a sed script that has to print the 30 titles of a website in a certain output format. I get the following error: "sed: file news.sed line 1: unknown options to 's'". Here is my code:

curl -sL news.ycombinator.com |
sed -nE '/\n/!s/class="title"><a[^>]*>[^<]*</\n&\n/g;/^class="title"/P;D' |
sed -E 's/class="title"><a href="([^"]*)" class="titlelink">([^<]*)</**\2**\n\1/'

Do you know how I can fix it? By the way, I can only use sed to solve this, not an HTML parser.

heyman
  • By the way, news.sed is my file name, and I have to print titles from the website https://news.ycombinator.com/. I also have to test with the following command: curl -s https://news.ycombinator.com/ | sed -n -f news.sed – heyman Nov 15 '21 at 13:42
  • You have slashes in the text you are parsing. You should use a [different delimiter](https://stackoverflow.com/questions/33914360/what-delimiters-can-you-use-in-sed), like `|` – Aserre Nov 15 '21 at 14:40
  • Do you mean I should use `!s|` instead of `!s/`, and the same for the line below? – heyman Nov 15 '21 at 14:50
  • Yup. `|` is an example, though, if you get the same error, try another delimiter not used in your input. Also, use the same separator everywhere in your command (`!s|...|...|P;D` for instance) – Aserre Nov 15 '21 at 14:56
  • I've tried replacing the delimiter, but it is not working. I still get the same error message. – heyman Nov 15 '21 at 14:57
  • See also https://stackoverflow.com/q/69946033/7552 – glenn jackman Nov 15 '21 at 14:59
  • I already saw that. I have exactly the same error and the same code, but I didn't see any good answer there. – heyman Nov 15 '21 at 15:01
  • Also, sed splits lines on newline, so `/\n/!` applies for every line (newline will not appear in the pattern space, unless you're doing stuff with the hold space) – glenn jackman Nov 15 '21 at 15:03
  • Alright, but is there a link between this and my error with sed? – heyman Nov 15 '21 at 15:05
  • I've been on this boring exercise since the beginning of the month; any solution would be much appreciated. – heyman Nov 15 '21 at 22:31

2 Answers

This might work for you (GNU sed):

cat <<\! > news.sed
/\n/!s/class="title"><a[^>]*>[^<]*</\n&\n/g
/^class="title"/{
h
x
s/^class="title"><a href="([^"]*)" class="titlelink"[^>]*>([^<]*)<.*/**\2**\n\1/p
x
}
D
!
curl -sL news.ycombinator.com | sed -Enf news.sed

This combines the 2 sed invocations into one sed script and applies it using the -f option.

N.B. This is GNU sed specific. It also uses a little-known idiom: a global substitution inserts newlines around every match in the pattern space, and the D command then deletes up to and including the first newline and restarts the cycle without reading new input, so the current line is not finished until the pattern space is empty. In effect this chomps the pattern space one newline-delimited piece at a time, and whenever the start of what remains matches another regexp, the bracketed commands are applied. The bracketed commands make a copy of the pattern space in the hold space, swap to the hold space, format the start of it to deliver 2 formatted lines, and swap back to the pattern space; D then chomps up to the next newline and the process repeats.
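The idiom described above is easier to follow on a toy input. The following is a minimal sketch, assuming GNU sed; the sample text and the bracket-matching regexp are made up for illustration and are not part of the Hacker News markup.

```shell
# Hypothetical sample input: one line containing three bracketed tokens.
input='x [a] y [b] z [c]'

# /\n/!  : run the substitution only while the pattern space has no newline,
#          i.e. on the first pass over each input line.
# s///g  : wrap every bracketed token in newlines (\n in the replacement
#          is a GNU sed feature).
# /^\[/P : if the pattern space now starts with a token, print up to the
#          first newline.
# D      : delete up to the first newline and restart the cycle without
#          reading a new input line.
result=$(printf '%s\n' "$input" | sed -n '/\n/!s/\[[^]]*]/\n&\n/g;/^\[/P;D')

printf '%s\n' "$result"
```

Each pass through the cycle drops one newline-delimited piece, so only the pieces that begin with a token reach P; the text between tokens is discarded silently because of -n.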

This is a very rough and ready solution and may not cater for all HTML which can be returned via the curl command.

potong
  • Thanks for your help. However, it isn't working and I get the following error message: "sed: file news.sed line 6: invalid reference \1 on `s' command's RHS". I don't have any idea how to fix it, but maybe you do. Thank you again for your time. – heyman Nov 16 '21 at 10:24
  • @heyman are you using GNU sed? To verify type `sed --version` at the terminal. Are you setting the `-E` option? – potong Nov 16 '21 at 11:38
  • I'm using GNU sed 4.8. What do you mean by setting the -E option? I've just tried your code exactly as you posted it. I added or deleted nothing. – heyman Nov 16 '21 at 14:02
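
The "invalid reference \1" message in the comments above is consistent with running the script without -E: in basic regular expressions a bare ( is a literal character, so the pattern defines no groups for \1 and \2 to refer to. The following is a sketch, assuming GNU sed and assuming the test command from the question (sed -n -f news.sed) cannot be changed to add -E. The sample line is hypothetical and only mimics the shape the answer's regexp expects.

```shell
# Hypothetical sample line in the shape the answer's regexp expects.
line='class="title"><a href="https://example.com/" class="titlelink">Example title</a>'

# With -E (extended regexps), ( and ) are grouping operators.
with_ere=$(printf '%s\n' "$line" |
  sed -En 's/^class="title"><a href="([^"]*)" class="titlelink"[^>]*>([^<]*)<.*/**\2**\n\1/p')

# Without -E (basic regexps), groups must be written \( and \).
with_bre=$(printf '%s\n' "$line" |
  sed -n 's/^class="title"><a href="\([^"]*\)" class="titlelink"[^>]*>\([^<]*\)<.*/**\2**\n\1/p')

printf '%s\n' "$with_bre"
```

Both commands should produce the same two lines. In the answer's script only the substitution that uses \1 and \2 contains groups, so that is the only line that would need the \( \) form when -E is unavailable.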

I have the following error: "sed: file news.sed line 1: unknown options to 's'".

You have a carriage return (CR) character at the end of the third script line (at least). Because it immediately follows the s/…/…/ command, it is interpreted as an option to it. You can eliminate the CRs in the script file, e.g. with sed -i 's/\r//' news.sed.
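
To check whether a script file really contains CRs before and after the cleanup, something like the following can be used. It is a minimal sketch, assuming GNU sed and grep; the temporary file stands in for news.sed, and the substitution is anchored with $ so that only a CR at the end of a line is removed, which is a slight variation on the command above. The sketch does not rely on sed rejecting the CR; it only counts the affected lines, strips the CRs, and confirms the cleaned script runs.

```shell
# Work in a throwaway directory; the file name is a stand-in for news.sed.
tmp=$(mktemp -d)
script="$tmp/news.sed"

# Write a one-line script with a Windows-style CRLF line ending.
printf 's/a/b/\r\n' > "$script"

# Count the lines that end in a CR ($cr holds a literal carriage return).
cr=$(printf '\r')
cr_before=$(grep -c "$cr\$" "$script" || true)

# Strip a CR at the end of every line (GNU sed: -i edits in place, \r is a CR).
sed -i 's/\r$//' "$script"
cr_after=$(grep -c "$cr\$" "$script" || true)

# The cleaned script now runs normally.
after=$(echo a | sed -f "$script")

printf 'CR lines before: %s, after: %s, output: %s\n' "$cr_before" "$cr_after" "$after"
rm -r "$tmp"
```

Saving the script with Unix line endings in the editor avoids the problem in the first place.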

Armali