In Linux, execute the following command to download the "First Monday" article:

wget -O first_monday.html http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3156/2747

Use sed and regular expressions to edit first_monday.html as follows:

Remove empty/blank paragraphs, if any. (HTML paragraph starting tag is <p> and ending tag is </p>)

<p>This is some text in a paragraph.</p>

A paragraph is empty if there is nothing, or only spaces or tabs, between <p> and </p>.

Remove all images. (In HTML, images are defined with the <img> tag.) Example:

<img src="html5.gif" alt="The official HTML5 Icon">   

The resulting file should still be a valid HTML file, displayable in a standard web browser. For your answer, copy/paste the commands you used to answer this question. For example, if you used a command similar to

sed -iback -e 's|<p>[[:space:]]*</p>||g' first_monday.html

then you would paste that command as well as any others you used in the answer for this field.

choroba

1 Answer

First, you can remove empty paragraph tags with the following command:

sed -i 's|<p>[[:space:]]*</p>||g' first_monday.html

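Next, image tags can be removed the same way. The pattern needs [^>]* so that it matches the tag's attributes up to the closing >. This assumes each <img> tag sits on a single line, since sed processes its input line by line:

sed -i 's|<img[^>]*>||g' first_monday.html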
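To check both substitutions before touching first_monday.html, something like the following may help. It is only a sketch: the sample lines are made up rather than taken from the article, and the pattern `<img[^>]*>` is my assumption for matching an image tag with any attributes on a single line.

```shell
# Build a small sample file (hypothetical content, not the real article).
sample=$(mktemp)
printf '%s\n' \
  '<p>Keep this paragraph.</p>' \
  '<p>   </p>' \
  '<p></p>' \
  '<img src="html5.gif" alt="The official HTML5 Icon">' \
  '<p>Text with <img src="a.png"/> inline.</p>' > "$sample"

# Apply both substitutions in one pass, without -i, so the file is untouched:
#   1) drop <p>...</p> pairs that contain only whitespace
#   2) drop any <img ...> tag that starts and ends on the same line
result=$(sed -e 's|<p>[[:space:]]*</p>||g' -e 's|<img[^>]*>||g' "$sample")

# Show what would be written back.
printf '%s\n' "$result"

rm -f "$sample"
```

The lines that held only an empty paragraph or an image are left as blank lines, which does no harm in HTML. Once the output looks right, the same two -e expressions can be run with -i on the real file.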
Arnab Nandy