
I have tried every option listed in the documentation. How can I get only the text as output, with no images at all?

https://github.com/coolwanglu/pdf2htmlEX/wiki/Command-Line-Options.

1 Answer


I'm not sure what you are trying to achieve, as the question's subject and details appear contradictory, but there are options to split the graphics and text out into separate files:

--embed <string>
   --embed-css <0|1> (Default: 1)
   --embed-font <0|1> (Default: 1)
   --embed-image <0|1> (Default: 1)
   --embed-javascript <0|1> (Default: 1)
   --embed-outline <0|1> (Default: 1)
          Specify which elements should be embedded into the  output  HTML
          file.

          If  switched  off,  separated files will be generated along with
          the HTML file for the corresponding elements.

          --embed accepts a string as argument. Each letter of the  string
          must  be  one  of  `cCfFiIjJoO`, which corresponds to one of the
          --embed-*** switches. Lower case letters for 0  and  upper  case
          letters  for  1.  For  example,  `--embed  cFIJo` means to embed
          everything but CSS files and outlines.

   --split-pages <0|1> (Default: 0)
          If turned on, the content of each page is stored in a  separated
          file.

          This  switch is useful if you want pages to be loaded separately
          & dynamically -- a supporting server might be necessary.

          Also see --page-filename.

So if you use --split-pages 1 together with --embed-image 0, you get one HTML file per PDF page, and the images are written to separate files rather than embedded in the HTML.
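As a rough sketch of that invocation, something like the following might work. The file name input.pdf and the helper build_cmd are placeholders made up for illustration; only the two flags come from the option listing above, and the conversion runs only if pdf2htmlEX is actually installed.

```shell
# Build the command line described above for a given PDF.
# build_cmd is a hypothetical helper, and "input.pdf" is a placeholder name.
build_cmd() {
    printf 'pdf2htmlEX --split-pages 1 --embed-image 0 %s\n' "$1"
}

# Show the command that would be run.
cmd=$(build_cmd "input.pdf")
echo "$cmd"

# Run the conversion only when the tool and the input file are both present.
if command -v pdf2htmlEX >/dev/null 2>&1 && [ -f "input.pdf" ]; then
    # --split-pages 1: store each page's content in a separate file
    # --embed-image 0: write images as separate files instead of embedding them
    pdf2htmlEX --split-pages 1 --embed-image 0 "input.pdf"
fi
```

The separate image files generated alongside the HTML can then be left out or deleted if only the text is of interest, though whether the result is acceptable depends on the PDF.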

If this isn't what you want then please include additional information in your question.

David Hedley