Is there a library in Apache PySpark to convert HTML to PDF?

Asked Sep 04 '22 at 15:16

Active Sep 04 '22 at 15:54

Viewed 96 times

I'm trying to use a PySpark notebook in Microsoft Azure Synapse to convert an HTML string to a PDF. I have found multiple libraries, such as "weasyprint", "wkhtmltopdf", "wkhtml2pdf", and "pdfkit", that work in plain Python but aren't available in PySpark.

Does anyone know how I can accomplish this?

Example HTML:

    <h2> why cant i get this to work </h2>
    <p> I am not entirely sure this is possible to do in PySpark</p>

    <table>
      <tr>
        <th> test1 </th>
        <th> test2 </th>
      </tr>

      <tr>
        <td>30</td>
        <td>42</td>
      </tr>
    </table>

edited Sep 04 '22 at 15:54

Sweta Jain


asked Sep 04 '22 at 15:16

Reece

I think you can add those libraries to the SparkContext and then create a UDF that converts the HTML to PDF using the chosen library. Take a look at `pyspark.SparkContext.addPyFile` – vladsiv Sep 04 '22 at 20:33
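A rough sketch of what that comment describes, in case it helps. Everything here is an assumption rather than a tested Synapse recipe: it uses `xhtml2pdf` as the renderer (swapped in because it is driven entirely from Python; it is not one of the libraries named in the question), it assumes that package is importable on every executor, and the helper names (`html_to_pdf_bytes`, `render_or_none`, `ship_dependency`, `add_pdf_column`) and the archive path are made up for illustration.

```python
import io


def html_to_pdf_bytes(html):
    """Render an HTML string to PDF bytes.

    Assumption: xhtml2pdf is installed on every executor. It is imported
    inside the function so the import runs on the worker, not only the driver.
    """
    from xhtml2pdf import pisa

    buffer = io.BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    if status.err:
        raise ValueError("xhtml2pdf reported errors while rendering the HTML")
    return buffer.getvalue()


def render_or_none(html, render=html_to_pdf_bytes):
    """Pass nulls through untouched so the UDF does not fail on empty rows."""
    if html is None:
        return None
    return render(html)


def ship_dependency(spark, archive_path):
    """Distribute a .py, .zip, or .egg file to the executors.

    archive_path is hypothetical, for example a zip of a pure-Python package.
    """
    spark.sparkContext.addPyFile(archive_path)


def add_pdf_column(df, html_col="html", pdf_col="pdf"):
    """Return df with an extra binary column holding the rendered PDF."""
    # pyspark is imported here so the helpers above stay usable without it.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BinaryType

    to_pdf = udf(render_or_none, BinaryType())
    return df.withColumn(pdf_col, to_pdf(df[html_col]))
```

With a DataFrame `df` that has an `html` string column, `add_pdf_column(df)` should give back the same rows plus a binary `pdf` column that can be collected or written out. Note that `addPyFile` only ships Python files and archives; `pdfkit` wraps the `wkhtmltopdf` binary, so shipping Python files alone is not expected to be enough for it.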


0 Answers