I am trying to extract the text from document and PDF files and put it in a text area.

My code is as follows:

<html>
    <head>
        <title>FileReader Example</title>

        <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
        <script src="http://code.jquery.com/mobile/1.4.2/jquery.mobile-1.4.2.min.js"></script>

        <script type="text/javascript" charset="utf-8">
            function upload(){
                document.getElementById("image_src").click();
            }

            $(document).ready(function () {
                $("#image_src").change(function () {
                    readBlob();
                });
            });

            function readBlob() {
                var files = document.getElementById('image_src').files;
                if (!files.length) {
                    alert('Please select a file!');
                    return;
                }

                var file = files[0];
                var start = 0;
                var stop = file.size - 1;
                var reader = new FileReader();

                // If we use onloadend, we need to check the readyState.
                reader.onloadend = function (evt) {
                    console.log(evt.target.result);
                    if (evt.target.readyState == FileReader.DONE) { // DONE == 2
                        document.getElementById('byte_content').textContent = evt.target.result;
                    }
                };

                var blob = file.slice(start, stop + 1);
                reader.readAsBinaryString(blob);
            }
        </script>

        <style>
            #image_src {
                position:absolute;
                left:-9999px;
            }
            #img {
                cursor:pointer;
            }
        </style>
    </head>
    <body>
        <div class="container">
            <img id="img" src="images/ChooseFile.png" onclick="upload()" alt="Choose file"/>
            <input type="file" name="image_src" id="image_src" />
            <pre id="fileDisplayArea"></pre>
            <div id="byte_content"></div>
        </div>
    </body>
</html>

The only problem I am having is that the text from a PDF or document file is displayed as rubbish, whereas a plain text file displays correctly. What's going wrong?

Cerbrus
keith Spiteri

1 Answer

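PDF is a binary format. Its text is usually stored in compressed streams, and it may also contain interactive elements such as annotations, form fields, video and Flash animation. Reading such a file with readAsBinaryString therefore shows the raw bytes rather than readable text, which is why a plain text file works and a PDF does not.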
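
To illustrate the binary nature of the format, here is a minimal sketch of how a script could tell a PDF apart from plain text before trying to display it. It relies on the fact that PDF files normally begin with the bytes "%PDF-". The names looksLikePdf and describeFile are hypothetical helpers, not part of the code in the question.

```javascript
// Hypothetical helper: returns true when the given bytes start with the
// "%PDF-" signature that PDF files normally begin with.
function looksLikePdf(bytes) {
    var signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
    if (bytes.length < signature.length) {
        return false;
    }
    for (var i = 0; i < signature.length; i++) {
        if (bytes[i] !== signature[i]) {
            return false;
        }
    }
    return true;
}

// Browser-only usage sketch (hypothetical): read just the first five bytes
// of the selected file and report 'pdf' or 'other' through the callback.
function describeFile(file, callback) {
    var reader = new FileReader();
    reader.onload = function (evt) {
        var header = new Uint8Array(evt.target.result);
        callback(looksLikePdf(header) ? 'pdf' : 'other');
    };
    reader.readAsArrayBuffer(file.slice(0, 5));
}
```

With a check like this, the page could keep showing plain text files directly and hand PDFs to a dedicated parser instead of dumping their bytes into the page.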

If you need to work with PDF documents, I suggest looking into the PDF.js project.
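
As a rough sketch of what that could look like, the code below pulls the text out of every page with PDF.js. It assumes a PDF.js build where getDocument() returns a loading task with a .promise property, and that the library and its worker script are already loaded on the page. The function names extractPdfText and joinTextItems are hypothetical, and the library object is passed in as a parameter rather than read from a global.

```javascript
// Hypothetical helper: joins the items of a PDF.js text content object
// (the result of page.getTextContent()) into one string per page.
function joinTextItems(textContent) {
    return textContent.items
        .map(function (item) { return item.str; })
        .join(' ');
}

// Hypothetical helper: resolves with the text of all pages, one page per line.
// pdfjsLib is the PDF.js library object; arrayBuffer holds the file contents.
function extractPdfText(pdfjsLib, arrayBuffer) {
    return pdfjsLib.getDocument({ data: arrayBuffer }).promise
        .then(function (pdf) {
            var pagePromises = [];
            // PDF.js page numbers start at 1.
            for (var pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
                pagePromises.push(
                    pdf.getPage(pageNumber)
                        .then(function (page) { return page.getTextContent(); })
                        .then(joinTextItems)
                );
            }
            return Promise.all(pagePromises);
        })
        .then(function (pageTexts) {
            return pageTexts.join('\n');
        });
}

// Browser usage sketch, reusing the element ids from the question:
// var reader = new FileReader();
// reader.onload = function (evt) {
//     extractPdfText(pdfjsLib, evt.target.result).then(function (text) {
//         document.getElementById('byte_content').textContent = text;
//     });
// };
// reader.readAsArrayBuffer(document.getElementById('image_src').files[0]);
```

Note that the file is read with readAsArrayBuffer rather than readAsBinaryString, since the parser needs the raw bytes. Joining items with a space is a simplification and will not reproduce the original layout of the page.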

I have located some API docs that might help you get started:

Alexander