Questions tagged [lxml.html]

lxml.html is a dedicated Python package for dealing with HTML. It is based on lxml's HTML parser, but provides a special Element API for HTML elements, as well as a number of utilities for common HTML processing tasks.

159 questions

1 answer

How can I preserve <br> as newlines with lxml.html text_content() or equivalent?

I want to preserve <br> tags as \n when extracting the text content from lxml elements. Example code: fragment = 'This is a text node. This is another text node. And a child element.Another child, with two…

python lxml lxml.html

asked Sep 06 '13 at 14:39

extempo
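
One possible approach to the question above, sketched under the assumption that modifying the parsed tree in place is acceptable: prepend a newline to the tail of every `<br>` element, then call `text_content()`. The helper name `text_with_newlines` and the sample fragment are made up for illustration and are not taken from the original post.

```python
from lxml import html


def text_with_newlines(element):
    """Return the element's text with each <br> rendered as a newline.

    Note: this mutates the tree by editing the tail text of <br> elements.
    """
    for br in list(element.iter("br")):
        # The text following a <br> is stored in its .tail attribute.
        br.tail = "\n" + (br.tail or "")
    return element.text_content()


# Hypothetical sample markup, not the fragment from the question.
root = html.fromstring("<div>first line<br>second line</div>")
print(repr(text_with_newlines(root)))
```

If the original tree has to stay untouched, the same idea could be applied to a freshly parsed copy of the markup instead.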

2 answers

Extending CSS selectors in BeautifulSoup

The Question: BeautifulSoup provides very limited support for CSS selectors. For instance, the only supported pseudo-class is nth-of-type, and it only accepts numerical values; arguments like even or odd are not allowed. Is it possible to…

python css-selectors beautifulsoup html-parsing lxml.html

asked Dec 21 '15 at 03:59

alecxe

3 answers

Type hints for lxml?

New to Python and come from a statically typed language background. I want type hints for https://lxml.de just for ease of development (mypy flagging issues and suggesting methods would be nice!) To my knowledge, this is a python 2.0 module and…

python-3.x types lxml mypy lxml.html

asked Aug 05 '20 at 06:10

Ian

4 answers

How to use Cleaner, lxml.html without returning div tag?

I have this code: evil = "bold textitalic text" cleaner = Cleaner(remove_unknown_tags=False, allow_tags=['p', 'br', 'b'], page_structure=True) print cleaner.clean_html(evil) I expected…

python lxml.html

asked Jan 29 '14 at 02:28

Allan Veloso

1 answer

How to preserve inline CSS style with lxml.html.clean.Cleaner() in Python?

I am trying to clean up an HTML table using lxml.html.clean.Cleaner(). I need to strip JavaScript attributes, but would like to preserve inline CSS style. I thought style=False is the default setup: import lxml.html.clean cleaner =…

python lxml lxml.html

asked Dec 03 '13 at 05:55

laviex

1 answer

Python Print element from lxml html

Trying to print out the entire element retrieved from lxml. from lxml import html import requests page=requests.get("http://finance.yahoo.com/q?s=INTC") qtree = html.fromstring(page.content) quote =…

python lxml.html

asked Feb 02 '16 at 02:08

Kevin

2 answers

How to fix issue with the removed cssselect package in lxml?

So they removed the cssselect package from lxml. Now my python program is useless. I just can't figure out how I could get it working: ImportError: cssselect seems not to be installed. See http://packages.python.org/cssselect/ I've tried to copy…

python xpath lxml pypi lxml.html

asked Apr 22 '14 at 13:29

kamilla

1 answer

Python Xpath: lxml.etree.XPathEvalError: Invalid predicate

I'm trying to learn how to scrape web pages and in the tutorial I'm using the code below is throwing this error: lxml.etree.XPathEvalError: Invalid predicate The website I'm querying is (don't judge me, it was the one used in the training vid :/ ):…

python xpath web-scraping python-requests lxml.html

asked Apr 06 '16 at 11:09

Michael Martinez

2 answers

Why am I getting this ImportError?

I have a tkinter app that I am compiling to an .exe via py2exe. In the setup file, I have set it to include lxml, urllib, lxml.html, ast, and math. When I run python setup.py py2exe in a CMD console, it compiles fine. I then go to the dist folder It…

python lxml py2exe importerror lxml.html

asked Mar 02 '14 at 20:51

Zach Gates

1 answer

How to rename a node with Python LXML?

How do I rename a node using LXML? Specifically, how to rename a parent node i.e. a tag while preserving all the underlying structure? I am parsing using the lxml.html module but supposedly there shouldn't be any difference between xml and…

python xml lxml lxml.html

asked Apr 06 '16 at 18:03

ccpizza

1 answer

printing html entities using lxml in python

I'm trying to make a div element from the below string with html entities. Since my string contains html entities, the & reserved char in the html entity is being escaped as &amp; in the output. Thus html entities are displayed as plain text. How can I…

python html html-parsing lxml lxml.html

asked Dec 07 '14 at 05:59

ravi
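
A small sketch of the escaping behaviour described above, and one possible workaround: decode the entities with the standard library's `html.unescape` before assigning the text. The input string is hypothetical, since the string from the original post is not shown in the excerpt.

```python
import html as std_html

from lxml import html as lxml_html

raw = "&copy; 2014"  # hypothetical input containing an HTML entity

div = lxml_html.fromstring("<div></div>")

# Assigning the raw string as text: lxml treats it as literal characters,
# so the ampersand is escaped on output.
div.text = raw
escaped = lxml_html.tostring(div, encoding="unicode")

# Decoding the entity first stores the real character in the tree.
div.text = std_html.unescape(raw)
rendered = lxml_html.tostring(div, encoding="unicode")

print(escaped)
print(rendered)
```

Parsing the string as markup (for example with `lxml.html.fragment_fromstring`) is another option when the input may also contain tags.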

1 answer

lxml.html. Error reading file; Failed to load external entity

I am trying to get a movie trailer url from YouTube using parsing with lxml.html: from lxml import html import lxml.html from lxml.etree import XPath def get_youtube_trailer(selected_movie): # Create the url for the YouTube query in order to find…

parsing lxml lxml.html

asked Apr 02 '15 at 20:13

alekscp

1 answer

href attribute for lxml.html

according to this answer: >>> from lxml.html import fromstring >>> s = """""" >>> doc = fromstring(s) >>> doc.value '1234' >>> doc.name 'question' I tried to get both the link and the text from this…

python-3.4 lxml.html

asked Dec 07 '14 at 16:54

nazmus saif
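
For reference, a minimal sketch of reading both the link target and the link text from anchor elements. Unlike the form-field shortcuts `.value` and `.name` quoted above, the `href` attribute is read with `.get("href")`. The markup and URL here are made up for the example.

```python
from lxml import html

# Hypothetical sample markup.
doc = html.fromstring('<p>See <a href="https://example.com/page">the example</a>.</p>')

# .get() reads an attribute; .text_content() returns the element's text.
links = [(a.get("href"), a.text_content()) for a in doc.iter("a")]
print(links)
```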

2 answers

How to remove insignificant whitespace in lxml.html?

I'm rather surprised that lxml.html leaves insignificant whitespace when parsing HTML by default. I'm also surprised that I can't find any obvious way to make it not do that. Python 2.7.3 (default, Apr 10 2013, 06:20:15) [GCC 4.6.3] on linux2 Type…

python html-parsing lxml.html

asked Aug 29 '13 at 04:40

Mark E. Haase

1 answer

parse html body fragment in lxml

I'm trying to parse a fragment of html:

title

I use lxml.html.fromstring. And it is driving me insane because it keeps stripping the tag of my fragments: >…

python html lxml lxml.html pyquery

asked May 11 '13 at 15:35

fserb
