
I have the following HTML source loaded in a UIWebView, and I want to extract only this text:
text1
text2 text2
text3 text3 text3

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>1322170516271</title>
    <meta name="viewport" content="initial-scale=1.0, user-scalable=1, minimum-scale=1.0, maximum-scale=4.0">                   

    <style type="text/css">
    body
    {
        padding: 5px;
        margin: 0px;
        font-family: Helvetica, Arial;
        font-size: 12pt;
        background-color: #efefef;
        background-image: url(ArticleBackground.jpg);
        background-position: cover;
        color: #000000;
    }
    h1
    {
        text-align: center;
        border-bottom: 1px dotted #805050;
        font-size: 28px;
        line-height: 38px;
        margin-bottom: 30px;
        text-shadow: 0 2px 1px white;
        color: #803030;
    }
    </style>

</head>

<body>

    <script type="text/javascript">
    function printMe()
    {
        print();
    }
    </script>

    <div style='align:center; padding: 20px;'>

        <div>

    <b>text1</b><br><br>

    <h2>
      text2 text2
    </h2>
    <br>
    text3 text3 text3

        </div>

    </div>

</body>
</html>

But here is what I get when I use

[webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.textContent"]

I don't need the CSS rules for body and h1, or the script source. I only want the text that is actually user-facing.

234534546



    body
{
    padding: 5px;
    margin: 0px;
    font-family: Helvetica, Arial;
    font-size: 12pt;
    background-color: #efefef;
    background-image: url(ArticleBackground.jpg);
    background-position: cover;
    color: #000000;
}
h1
{
    text-align: center;
    border-bottom: 1px dotted #805050;
    font-size: 28px;
    line-height: 38px;
    margin-bottom: 30px;
    text-shadow: 0 2px 1px white;
    color: #803030;
}







    function printMe()
    {
        print();
    }






text1


  text2 text2


text3 text3 text3

Thanks for any insight.

UPDATE

[webView stringByEvaluatingJavaScriptFromString:@"document.body.innerHTML"] does not meet my goal either, because it returns the script and the markup:

<script type="text/javascript">
    function printMe()
    {
        print();
    }
    </script>

    <div style="align:center; padding: 20px;">

        <div>

    <b>text1</b><br><br>

    <h2>
       text2 text2
    </h2>
    <br>
    text3 text3 text3

        </div>

    </div>

Update: this is needed for an existing project. If I had the chance to redesign it, a solution would be easy to find. But with the HTML source as it is, this is more difficult.

Zsolt

2 Answers


Try using:

document.body.innerHTML

Or take a look at parsing the HTML yourself: see "parsing HTML on the iPhone". There are many other related questions on SO.
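If the string returned by document.body.innerHTML is the starting point, one rough way to get from markup to text is to post-process it in JavaScript. The sketch below is only an illustration: `htmlToVisibleText` and `sample` are hypothetical names, regular expressions are not a real HTML parser, and character entities such as `&amp;` are left undecoded.

```javascript
// Hypothetical helper (not part of any library): reduce an innerHTML
// string to its visible text. Assumes reasonably well-formed markup.
function htmlToVisibleText(html) {
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Turn every remaining tag into a space so adjacent words stay apart.
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}

// A shortened version of the body markup from the question.
var sample =
  '<script type="text/javascript">function printMe() { print(); }</script>' +
  "<div><b>text1</b><br><br><h2>\n  text2 text2\n</h2><br>\n text3 text3 text3\n</div>";

console.log(htmlToVisibleText(sample));
// prints "text1 text2 text2 text3 text3 text3"
```

In the web view, the function source followed by a call such as htmlToVisibleText(document.body.innerHTML) could presumably be passed as the script string to stringByEvaluatingJavaScriptFromString:, but that wiring is an assumption and has not been checked against the project in the question.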

san
  • Yes, I tried that samjeev, but that is not good either. I will update my description. If possible, I don't want to add a parser just for this. This seems like something that is doable with a simple JavaScript call after loading the page in a web view. – Zsolt Jun 13 '12 at 07:11
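As a sketch of what such a JavaScript call might look like: walk the DOM from the body, collect the text nodes, and skip anything inside script or style elements. The names `visibleText`, `normalize`, `el`, and `txt` are hypothetical, and the plain objects below only stand in for real DOM nodes so the example runs outside a browser.

```javascript
// Hypothetical helper: concatenate the text nodes under `node`,
// ignoring the contents of <script> and <style> elements.
function visibleText(node) {
  var ELEMENT_NODE = 1, TEXT_NODE = 3;
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  if (node.nodeType !== ELEMENT_NODE) return "";
  var name = node.nodeName.toUpperCase();
  if (name === "SCRIPT" || name === "STYLE") return "";
  var parts = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    parts.push(visibleText(node.childNodes[i]));
  }
  // Joining with a space keeps block-level siblings apart; the trade-off
  // is an extra space if an inline tag splits a single word.
  return parts.join(" ");
}

// Collapse whitespace left over from the source indentation.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Minimal stand-ins for DOM nodes (assumption: only these fields are read).
function el(name, children) {
  return { nodeType: 1, nodeName: name, childNodes: children || [] };
}
function txt(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] };
}

// Mirrors the structure of the <body> in the question.
var body = el("BODY", [
  el("SCRIPT", [txt("function printMe() { print(); }")]),
  el("DIV", [
    el("DIV", [
      el("B", [txt("text1")]), el("BR"), el("BR"),
      el("H2", [txt("\n  text2 text2\n")]),
      el("BR"),
      txt("\n text3 text3 text3\n")
    ])
  ])
]);

console.log(normalize(visibleText(body)));
// prints "text1 text2 text2 text3 text3 text3"
```

In the web view itself, the two function definitions followed by normalize(visibleText(document.body)) would be the expression to evaluate. Starting from document.body rather than document.documentElement also leaves out the title and the style block in the head.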

Why don't you put all your text into separate tags such as div, p, etc., give each of them an id, and then get the text within them with this syntax:

var text1 = document.getElementById("your ID").innerHTML

Hope this works for your problem.
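A small sketch of the id-based lookup, under the assumption that the pages could be given ids: reading textContent instead of innerHTML returns the text without any nested tags. `textById`, `fakeDocument`, and the id "heading" are hypothetical; the fake object stands in for `document` so the example runs outside a browser.

```javascript
// Hypothetical helper: whitespace-normalized text of the element with the
// given id, or null when no such element exists.
function textById(doc, id) {
  var element = doc.getElementById(id);
  if (!element) return null;
  return element.textContent.replace(/\s+/g, " ").trim();
}

// Stand-in for `document` with a single element whose id is "heading".
var fakeDocument = {
  getElementById: function (id) {
    var elements = { heading: { textContent: "\n  text2 text2\n" } };
    return Object.prototype.hasOwnProperty.call(elements, id) ? elements[id] : null;
  }
};

console.log(textById(fakeDocument, "heading")); // prints "text2 text2"
console.log(textById(fakeDocument, "missing")); // prints null
```

In a real page the call would be textById(document, "heading"), with the id replaced by whatever was assigned in the markup.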

Neji
  • Yes, I wish I could do that, but it is too late. We are talking about a huge number of existing HTML files, and changing them would be costly, so I can't solve it that way. Thanks for the suggestion though. – Zsolt Jun 13 '12 at 07:46