I'm having a really hard time with this one.
EDIT (putting this at the top): anyone who wants to read the full problem below is very welcome. I'm starting to solve this really hard issue, but I've run into a new problem. My approach is to return the whole long HTML page split by paragraphs ("p" tags). Up to here everything works, and when I do assert False I get everything as I want it. Then in the template I loop over the list I sent in the response, and for each value (a paragraph) I create, for now, a div (a page in the book). Here is the problem: I am getting every paragraph three times! Code below...
assert output (part of it):
<p style="text-align: center;">
<span style="font-size:24px;"><strong><u>The Ten Foot Stop</u></strong></span></p>,
<p style="margin-bottom: 0.2in; text-align: center;">
<span style="font-size:18px;"><font style="font-size: 7pt;">NEWS AND OCCASIONAL ITEMS
ABOUT THE MEDICAL ASPECTS OF SCUBA DIVING.<br />
POSTED BY ERN CAMPBELL, MD</font></span></p>
template:
{% for article_page in article_pages %}
{% if article_page %} <!-- don't show an empty paragraph -->
{{ article_page|safe }}
{% endif %}
{% endfor %}
This is what shows up in the page:
[The Ten Foot Stop, The Ten Foot Stop, The Ten Foot Stop]
<!-- first paragraph has: The Ten Foot Stop -->
From here on is my original post with the full description of the issue: I have a very long HTML-like string (no head or body and so on, but it has tags, styles, img tags and everything else in it) and I need to split it into smaller strings by number of words. The strings need to fit into divs of a certain size, let's say every 165 words more or less, or even better, fit a certain height so they match the div size, but I think the second option is much more complicated.
The problem I'm having, after trying everything including BeautifulSoup and other methods, is that I can't find a way to split the string while keeping the tags safe. If I have a style tag, for example, that starts at char 160 and runs to char 170, the second page (div) will treat the styles as a regular string. As far as I saw, BeautifulSoup only closes "bad" tags; it doesn't open the tags for the "bad" text in the second, third and later divs.
I thought about using truncate_html_words from text.py, but as the name implies, it only truncates words; it doesn't save the rest of the text for the next page (or am I wrong?).
Does anyone have an idea how to do this?
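One possible direction is to split only between whole paragraphs, so no tag is ever cut in the middle, and to count only the visible text of each paragraph. This is a rough sketch under assumptions, not a tested solution for my real data. It uses the standard library's html.parser instead of BeautifulSoup, and the names TextCollector, count_words and paginate are made up for illustration:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects only text nodes; tags and their attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html_fragment):
    """Count words in the visible text of an HTML fragment.

    Text from nested child tags (span, strong, u, ...) is included.
    Chunks are joined with a space, so text on both sides of a <br />
    is counted as separate words.
    """
    collector = TextCollector()
    collector.feed(html_fragment)
    collector.close()
    return len(" ".join(collector.chunks).split())


def paginate(paragraphs, max_words=165):
    """Group whole paragraphs into pages of roughly max_words words.

    A paragraph is never split, so a single paragraph longer than
    max_words ends up alone on an oversized page.
    """
    pages, current, current_words = [], [], 0
    for paragraph in paragraphs:
        words = count_words(paragraph)
        if words == 0:
            continue  # skip paragraphs with no visible text
        if current and current_words + words > max_words:
            pages.append("".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += words
    if current:
        pages.append("".join(current))
    return pages
```

The list returned by paginate could then be passed to the template as article_pages, with each item rendered through the safe filter inside its own div. The 165-word default is only the rough figure from this post; fitting a page to a fixed height would need a different approach.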
OK, I'm slowly starting to figure this out. I will publish it when it is done; I think people need this kind of thing. Next step: I broke the HTML string up by tags (in my case, every HTML "p" tag). Now how do I count the text, and only the text, in the tag? P.S. The tag might have child tags that wrap the text, and might have multiple child tags too, e.g.:
10x, Erez