
I'm trying to use a PySpark notebook in Microsoft Azure Synapse to convert an HTML string to a PDF. I have found multiple libraries, such as "weasyprint", "wkhtmltopdf", "wkhtml2pdf", and "pdfkit", that work in Python but aren't available in PySpark.

Does anyone know how I can accomplish this?

Example HTML I want to convert:

    <h2> why cant i get this to work </h2>
    <p> I am not entirely sure this is possible to do in PySpark</p>
    
    <table>
      <tr>
        <th> test1 </th>
        <th> test2 </th>
      </tr>
    
      <tr>
        <td>30</td>
        <td>42</td>
      </tr>
    </table>

Sweta Jain
Reece
I think you can add those libraries to the SparkContext and then create a UDF that converts the HTML to PDF using the chosen library. Take a look at `pyspark.SparkContext.addPyFile`. – vladsiv Sep 04 '22 at 20:33
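
A minimal sketch of the UDF approach the comment describes, in case it helps. It assumes WeasyPrint is already importable on the Spark executors (whether `addPyFile` alone is enough is not certain, since WeasyPrint also needs native libraries). The names `render_pdf`, `with_pdf_column`, `html`, and `pdf` are made up for illustration.

```python
def render_pdf(html):
    """Render one HTML string to PDF bytes. Runs on the executors."""
    if html is None:
        return None
    # Imported lazily so the import happens on the worker that runs the UDF.
    # Assumption: WeasyPrint is installed/importable there.
    from weasyprint import HTML
    # With no target given, write_pdf() returns the PDF as bytes.
    return HTML(string=html).write_pdf()


def with_pdf_column(df, html_col="html", pdf_col="pdf"):
    """Return df with an extra binary column holding the rendered PDF."""
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BinaryType

    render_pdf_udf = udf(render_pdf, BinaryType())
    return df.withColumn(pdf_col, render_pdf_udf(col(html_col)))


# Usage inside the notebook (spark is the session the notebook provides):
# df = spark.createDataFrame([("<h2>Hello</h2>",)], ["html"])
# pdf_df = with_pdf_column(df)
```

The resulting column holds raw PDF bytes, so each row still has to be written out to storage as a separate step.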

0 Answers