
This is the command I'm running on a standard web page that I fetched from a web site with wget:

tr '<' '\n<' < index.html

However, it gives me newlines but does not add the left bracket back in. For example,

 echo "<hello><world>" | tr '<' '\n<'

returns

 (blank line which is fine)
 hello>
 world>

instead of

 (blank line or not)
 <hello>
 <world>

What's wrong?

tripleee
Kamran224

4 Answers


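That's because tr only does character-for-character substitution (or deletion). Each character in the first set is paired with the character at the same position in the second set, so '<' is mapped to the newline and the extra '<' in the second set is never used.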
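The pairing can be checked directly. The following is a minimal sketch, assuming a POSIX shell and a tr that ignores surplus characters in the second set (GNU tr does); the variable names are only for illustration.

```shell
# Sketch: tr pairs each character of set 1 with the character at the
# same position in set 2; it cannot emit two characters for one.
input='<hello><world>'

# '<' is paired with '\n'; the trailing '<' in set 2 has no partner.
with_extra=$(printf '%s\n' "$input" | tr '<' '\n<')

# The same mapping written without the unused character.
without_extra=$(printf '%s\n' "$input" | tr '<' '\n')

# Both pipelines should yield the same text: a blank line, hello>, world>
if [ "$with_extra" = "$without_extra" ]; then
    echo "same output"
fi
```

If the two results match, the second '<' contributed nothing, which is why the brackets disappear in the question's output.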

Try sed instead (treating \n in the replacement as a newline is a GNU sed extension).

echo '<hello><world>' | sed -e 's/</\n&/g'

Or awk.

echo '<hello><world>' | awk '{gsub(/</,"\n<",$0)}1'

Or perl.

echo '<hello><world>' | perl -pe 's/</\n</g'

Or ruby.

echo '<hello><world>' | ruby -pe '$_.gsub!(/</,"\n<")'

Or python.

echo '<hello><world>' \
| python3 -c 'for l in __import__("fileinput").input(): print(l.replace("<", "\n<"), end="")'
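The sed command above relies on \n in the replacement being read as a newline, which not every sed does. A sketch of a more portable spelling, assuming only a POSIX shell and sed, is to write a backslash followed by a literal newline in the replacement; the variable name `portable` is just for illustration.

```shell
# Sketch: POSIX sed accepts "backslash + literal newline" in the
# replacement text, so this does not depend on GNU's \n escape.
portable=$(printf '%s\n' '<hello><world>' | sed 's/</\
</g')

# Print the result: each '<' now starts a new line.
printf '%s\n' "$portable"
```

The replacement has to span two source lines for this to work, which is less tidy than the one-liner but should behave the same on GNU and BSD sed.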
ephemient

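The order in which you put the newline matters, and you can also escape the "<". Note, however, that because tr maps characters one for one,

`tr '<' '<\n' < index.html`

maps "<" to itself (GNU tr ignores the extra \n in the second set), so it leaves the input unchanged rather than adding newlines.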

blizz
felix747

If you have GNU grep, this may work for you:

grep -Po '<.*?>[^<]*' index.html

which should pass through all of the HTML, but each tag should start at the beginning of the line with possible non-tag text following on the same line.

If you want nothing but tags:

grep -Po '<.*?>' index.html

You should know, however, that it's not a good idea to parse HTML with regexes.
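To see what the two grep patterns above produce, here is a small sketch on a made-up one-line sample. It assumes GNU grep built with PCRE support (the -P option); the sample markup and variable names are illustrative only.

```shell
# Hypothetical sample input with text between the tags.
sample='<p>Hi <b>there</b></p>'

# Each tag starts a new output line, followed by any text up to the next '<'.
with_text=$(printf '%s\n' "$sample" | grep -Po '<.*?>[^<]*')

# Tags only; the text between tags is dropped.
tags_only=$(printf '%s\n' "$sample" | grep -Po '<.*?>')

printf '%s\n' "$with_text"
printf '%s\n' "$tags_only"
```

The non-greedy `.*?` is what keeps each match to a single tag; without it, one match would stretch to the last `>` on the line.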

Community
Dennis Williamson

Does this work for you?

awk -F"><" -v OFS=">\n<" '{$1=$1}1'

[jaypal:~/Temp] echo "<hello><world>" | awk -F"><" -v OFS=">\n<" '{$1=$1}1';
<hello>
<world>

To apply this only to certain lines, put a regex pattern (/.../) that matches those lines in front of the awk { } action.
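As a sketch of that last point, the pattern /hello/ below limits the split to lines containing "hello"; the pattern and the two-line sample input are assumptions made up for the illustration.

```shell
# Only lines matching /hello/ are rebuilt with the new OFS;
# the trailing 1 prints every line, changed or not.
selective=$(printf '%s\n' '<hello><world>' '<foo><bar>' |
    awk -F'><' -v OFS='>\n<' '/hello/ {$1=$1} 1')

printf '%s\n' "$selective"
```

Assigning $1 to itself makes awk rebuild the record using OFS, which is what inserts the newline between adjacent tags; the second sample line is passed through untouched.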

jaypal singh