3

I have a Python variable holding HTML content, like this:

>>> a = '<html><h1><a href="http://www.google.com">Link to Google</a></h1></html>'

How can I print it as HTML?

I would like to print this variable in my terminal and get a result like this:

a.html

Note: I would prefer a solution where Python does this itself, without a shell script or other programs.

GarouDan
  • Take a look at this question: http://stackoverflow.com/questions/287871/print-in-terminal-with-colors-using-python – Wayne Werner Nov 10 '11 at 14:26
  • @WayneWerner that's very interesting, but unfortunately the variables I want to print already contain HTML. I don't really need the colors; they are nice, but what I'm interested in is interpreting HTML in Python and printing it like a web page. See this example on pastebin: [link](http://pastebin.com/PJF58pX1) – GarouDan Nov 10 '11 at 14:37
  • I'm still confused at what you want - you told martincho that you did not want plain text, you wanted the formatting of the links to show up. However, in your example image there is only plain text colored blue. Now you tell me that you don't need colors? Are you just trying to display the text of the page, or do you want to be able to interact with it in some way? – Wayne Werner Nov 10 '11 at 19:29
  • The colored text is just how w3m renders the HTML; if I open the same file in a browser, it looks like a "normal" web page. – GarouDan Nov 14 '11 at 02:36
  • So the question is: what exactly do you want to display? Do you want to create a Python-only version of w3m? Do you just want to display the text on the webpage? – Wayne Werner Nov 14 '11 at 14:52
  • I would like to print HTML content and display it in the terminal like a web page, not just as plain text. This is useful, or even important, if you are using crawlers. Handling all the HTML tags yourself can be annoying or impossible, but if Python interprets them everything becomes easier. – GarouDan Nov 15 '11 at 01:31

3 Answers

2

I had success with a Python program, a.py, as below:

a = '<html><h1>My example text</h1></html>'
# Write the HTML to a file so that w3m can render it; the with-block closes the file
with open("a.html", "w") as f:
    f.write(a + '\n')

and then a separate shell script, a.sh, something like:

#!/bin/sh
/usr/bin/env python a.py
w3m a.html

But I don't think this is a good approach. Is there one that uses only Python?
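
The two steps above can be folded into a single Python script by piping the string straight into `w3m`, which removes both the temporary file and the shell script. This is only a sketch under some assumptions: Python 3.7 or later, `w3m` installed and on the PATH, and a hypothetical helper name `render_with_w3m`. It still depends on the external `w3m` program, and `-dump` produces plain rendered text rather than w3m's interactive colored view.

```python
import shutil
import subprocess


def render_with_w3m(html):
    """Return w3m's text rendering of `html`, or None if w3m is not installed."""
    if shutil.which("w3m") is None:
        return None
    # -T text/html: treat standard input as HTML
    # -dump: write the rendered page to standard output and exit
    result = subprocess.run(
        ["w3m", "-T", "text/html", "-dump"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


a = '<html><h1>My example text</h1></html>'
rendered = render_with_w3m(a)
print(rendered if rendered is not None else "w3m not found on PATH")
```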

GarouDan
1

I hope someone can give you a better answer, but here is my idea anyway: you can use html2text (I think it's a Python script), or html2pdf followed by pdf2text, and then print the generated text. Hope it helps.

martincho
  • Hmm... I would like to print the content of the variable with its web/HTML formatting or highlighting, for example blue for links and so on. I would rather not convert it to plain text. – GarouDan Nov 10 '11 at 04:15
  • I edited the question; I think it's more precise now. See my screenshot. Thanks for the help. – GarouDan Nov 10 '11 at 14:24
1

To trim the tags from the above example I used:

    >>> a='<html><p>My example text</p></html>'
    >>> while '<' in a or '>' in a:
    ...     a = a.replace(a[a.find('<'):a.find('>')+1],"")
    ... 
    >>> a
    'My example text'

That should work unless the text you want to extract contains '<' or '>', or the variable holds invalid HTML. In those cases the loop can remove the wrong span or never terminate.
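
For input where the text itself may contain '<' or '>' (written as entities), a sturdier alternative is to let a real parser separate tags from text. The following is a sketch rather than a drop-in replacement: it assumes Python 3, where the parser lives in `html.parser`, and the names `TextExtractor` and `strip_tags` are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of a document and ignore all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; entities such as &lt; arrive decoded
        self.chunks.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return "".join(parser.chunks)


print(strip_tags('<html><p>My example text</p></html>'))
print(strip_tags('<p>1 &lt; 2 and 3 &gt; 2</p>'))
```

Unlike the replace loop, this always terminates, because the parser consumes the input exactly once.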

colton7909
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – teambob Nov 10 '11 at 03:27
  • Hmm... but I don't want to change the HTML; I would like to keep it and show it as HTML. Until now I have written the variable to a file and then opened that file with `w3m`, but maybe there's a better solution. – GarouDan Nov 10 '11 at 04:12
  • @colton7909 I edited the question; I think it's more precise now. See my screenshot. Thanks for the help. – GarouDan Nov 10 '11 at 14:24
  • Oh, my apologies. That is a much bigger question than I originally thought. I'm not sure I can help you with that, but I did find this library in the documentation: http://docs.python.org/library/htmlparser.html – colton7909 Nov 15 '11 at 03:53
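
As a rough illustration of how far the standard-library parser mentioned in the last comment can go, here is a tiny terminal renderer that prints headings in bold and links in blue using ANSI escape codes. It is a sketch under several assumptions: Python 3 (where the module is `html.parser`), a terminal that understands ANSI codes, and made-up names `TerminalRenderer` and `render`. It handles only headings and links, does not track nested styles, and is nowhere near a replacement for w3m.

```python
from html.parser import HTMLParser

BOLD = "\033[1m"
BLUE_UNDERLINE = "\033[4;34m"
RESET = "\033[0m"


class TerminalRenderer(HTMLParser):
    """Bold headings, blue underlined links, everything else as plain text."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append(BOLD)
        elif tag == "a":
            # Remember the target so it can be shown after the link text
            self.href = dict(attrs).get("href")
            self.out.append(BLUE_UNDERLINE)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.out.append(RESET + "\n")
        elif tag == "a":
            # RESET clears all styles, including an enclosing heading's bold
            self.out.append(RESET)
            if self.href:
                self.out.append(" (%s)" % self.href)
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def render(html):
    renderer = TerminalRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.out)


a = '<html><h1><a href="http://www.google.com">Link to Google</a></h1></html>'
print(render(a))
```

Supporting more of HTML (paragraphs, lists, tables) means adding more cases to `handle_starttag` and `handle_endtag`, which is essentially the work a text browser such as w3m already does.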