10

I have created an application using JodConverter and Open-Office for converting an excel(.xlsx) to PDF, The application works fine but i am facing two problems

  1. The pages of output PDF is in the form of A4 size, since because of that certain worksheet content have been sliced off. since i want each worksheet of the excel as complete as in one page what ever the size.

  2. The no of worksheets were missing, say if my excel has 8 worksheet i am getting only two or three within the PDF output

Even if we tried to convert to pdf directly from open-office, its giving the above similar issues

Excel File - ss1.xlsx

Output PDF - work.pdf

can anyone please tell me some solution for this

My code is as given below

public class MyConverter {

    public static void main(String[] args) throws ConnectException {
        File inputFile = new File("C:/Users/Work/Desktop/ss1.xlsx");
        File outputFile = new File("C:/Users/Work/Desktop/work.pdf");

        // connect to an OpenOffice.org instance running on port 8100
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
        connection.connect();

        // convert
        DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
        converter.convert(inputFile, outputFile);

        // close the connection
        connection.disconnect();
    }
Alex Man
  • 4,746
  • 17
  • 93
  • 178

4 Answers4

2

I used the (free) PrimoPDF printer driver to create the PDF directly from within Excel. The large number of pages (20+) is due to the fact that the 'fit to page' print option is missing in one of the worksheets, the 3rd if i remember well. After correcting this, the command to print all worksheets still results in 2 PDF files to be generated. PrimoPDF asks twice for a file name while it should ask only one. I assume your program just generates the PDF corresponding to the first part since normally only one PDF should be generated. I have no explanation for the 2-part printing. It might be due to some print setting in one of the worksheets that forces the printing to be performed in 'two batches'. For instance, a different resolution value may prevent printing in one batch. Conclusion: the workaround is to print with PrimoPDF and concatenate the 2 PDF files using one of the freely available programs on the web. For a durable solution you'll have to verify the print settings of all worksheets in detail and make sure they are equal.

Paul1n
  • 84
  • 6
  • I'm afraid the code shown in the question can't be modified to solve the problem. The problem lies in the xlsx file. I checked the page settings and found that in the last sheet (Sample Test 3) the Print Quality is left blank whereas all other sheets specify 600 DPI. – Paul1n Jul 29 '14 at 08:57
  • (continued comment) So I changed the last sheet to 600 DPI and lo and behold, PrimoPDF generates just one PDF containing all sheets. I'm quite sure this change will also solve the problem with the program. – Paul1n Jul 29 '14 at 09:06
2

I'm afraid my previous answer was not clear enough. So here is the essence:

  • All your worksheets except the 3rd have a resolution setting of 600. In the 3rd sheet the resolution is left blank
  • Change the resolution of the 3rd sheet to 600
  • The PDF file will now be generated normally, containing all sheets.

Apparently Excel can only produce PDFs where the page resolutions of all worksheets are the same. If it encounters a sheet with different (of blank) resolution it just stops producing output without giving any warning. IMHO this is a bug in Excel. Fortunately the workaround is easy.

Hope this clarifies my previous answer.

Paul1n
  • 84
  • 6
  • so do i need to use PrimoPDF printer driver – Alex Man Aug 23 '14 at 11:48
  • No, your program should work fine once you've set the resolution ('print quality' in page setup dialog) to 600 in the 3rd worksheet (tab page) of your Excel file. The point is that all worksheets must have the same resolution setting, otherwise Excel will not be able to produce a single PDF file. – Paul1n Aug 24 '14 at 14:49
  • why the output PDF is in the form of A4 size – Alex Man Aug 25 '14 at 03:52
  • The page setup setttings are not the same for all worksheets. Here are the important settings i found on the Excel file ss1.xlsx I just downloaded on Dropbox: – Paul1n Aug 25 '14 at 11:06
  • (cont'd from previous comment) -- Sample Test 0: scale=fit to page, paper size Letter -- Sample Test 1: scale=fit to page, paper size Letter -- Sample Test 2: scale 100%, paper size A4 -- Sample Test 3: scale 100%, paper size A4 As you see the scaling and paper size are not the same for all sheets. Make all 4 page setups equal, e.g. scaling = fit to page and paper size = A4. I think this will produce an ok pdf file containing all sheets and no data clipped off. – Paul1n Aug 25 '14 at 11:27
  • It's a quick manual job to make sure all page settings are equal. Of course automating is useful if you expect to receive a lot of files like this one. – Paul1n Aug 26 '14 at 10:16
  • since the excel files are directly imported to the application for converting it into pdf, so we can't do those manual settings within the excels, rather it should be done programmetically – Alex Man Aug 26 '14 at 12:11
  • Hey check my answer out, you can change all kinds of settings by following the Reference Models of Excel http://msdn.microsoft.com/en-us/library/office/ff194068(v=office.15).aspx. Also by using Jacob it makes it easier, was the generated PDF in my answer what you were trying to achieve? – PalinDrome555 Aug 26 '14 at 12:18
  • Afraid it would not be very useful on Linux or Mac, it uses activeX Components to connect to `.dll` files, but it does work very well on windows. – PalinDrome555 Aug 27 '14 at 15:14
  • @user3884212 i changed the scaling to fit to page and paper size =A4, but still the open-office is generating PDF with two pages – Alex Man Aug 28 '14 at 06:14
  • @Alex Man I left .PrintQuality = 300 which corresponds to a blank entry. With this setting, PrimoPDF generated 2 files as well as your program. Apparently the value must be a valid – Paul1n Aug 29 '14 at 14:11
  • @Alex Man (cont'd) quality must be a valid DPI value. I changed .PrintQuality to 300 and now i'm getting a single PDF file! I think the rationale is as follows: in PDF, page settings are document global. This means that a document that contains 2 different resolutions **must** be generated as 2 files (= as 2 print jobs). BTW PDF **does** support multiple page sizes -> may be different if my reasoning ok! Hope the DPI fix will help, cross fingers... – Paul1n Aug 29 '14 at 14:40
  • @alex Man The excel file is at [dropbox](https://www.dropbox.com/s/8uuc78v1f2bb52l/ss1%20with%20macro.xlsm?dl=0) Some remarks: --1-- The main macro that performs page setup for all worksheets is *PageSetup_AllSheets* --2-- You don't need primoPDF. I ran the macro with default printer set to my Canon --3-- Not all printers support explicit dpi values (such as 600) as printQuality. For instance my Canon won't. There is a second method to specify print quality using negative numbers. Values are: -4=high -3=medium -2=low. I used -3 (medium) in my macro. – Paul1n Aug 30 '14 at 21:22
  • The Excel file i placed on Dropbox was not the correct one. Here is the [UPDATE](https://www.dropbox.com/s/v9yvp6p6wevijqg/ss1%20with%20macro%20%2831%20Aug%29.xls?dl=0) Unfortunately in the PDF output from PrimoPDF, the 3rd page is chopped off at the right, probably in the same way as reported by @PalinDrome555. A way to create multiline text that is not aligned with rows and avoids wrapping/merging is to create the text in a separate range and then using *paste as image* to the target location. For details see [this Excel tip](http://j-walk.com/ss/excel/usertips/tip039.htm) – Paul1n Aug 31 '14 at 09:31
  • @Alex Man Please can you clarify to which xls setting you are referring? (all changes done using Excel, didn't use Word) – Paul1n Sep 09 '14 at 07:40
  • What I meant is that whether you done those print settings ie 300 dpi and the page size to A4 using microsoft excel....since when I open the particular xls within openoffice I have got a message saying that the file contains macros.....also you have changed xlsx to xls – Alex Man Sep 09 '14 at 10:09
  • @Alex Man The print settings are indeed done using VBA code (=Excel macros). Unfortunately, the OpenOffice implementation of VBA is far from compatible with Excel. The change from .xlsx to .xls is because I used an older Excel version (Excel 2003). It's compatible except for some new features. Excel versions 2010 and newer produce .xlsx (xls extended) files by default. IMO MS Excel is still the best solution for automation (macros). – Paul1n Sep 10 '14 at 11:46
1

You can use Microsoft Excel to do any .xlsx conversions to PDF, I am currently developing an application that uses Jacob, (Java Com Bridge) a thin wrapper for connecting to the Object Models of Microsoft Office Programs, afaik it doesn't support open-office, but it does a good job at converting your .xlsx file to a PDF file. It requires a bit of setting up. Link to Jacob

When looking into the problem I found in Excel -> Page Setup, if you change Fit to 1 pages wide and 1 pages tall, it squeezes each work sheet to fit on each page in PDF. Another problem I ran into was the wrap text property, be careful when you use it, as it can cause layout issues.

I made a small implementation of this, tested on Excel 2010 and Jacob 1.18

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.ComFailException;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class ExcelApplication {

    private final String APP_NAME = "Excel.Application";

    private final ActiveXComponent excelApplication;

    private Dispatch workbooks;//all active workbooks opened

    private Dispatch activeWorkbook;//active workbook

    private Dispatch activeWorksheets;//all worksheets in active workbook

    public ExcelApplication() {
        excelApplication = new ActiveXComponent(APP_NAME);
    }

    public void openExcelFileInvisible(String fileName) {
        //Opens Excel in the background
        String fileUrl;
        if (excelApplication != null) {
            excelApplication.setProperty("Visible", new Variant(false));//sets excel invisible            
            //file url relative to this class
            //or you can just give an absolute path
            fileUrl = getClass().getResource(fileName).toExternalForm();
            //get workbooks
            workbooks = Dispatch.call(excelApplication, "Workbooks").getDispatch();
            if (activeWorkbook == null) {
                try {
                    activeWorkbook = Dispatch.call(workbooks, "Open", fileUrl).getDispatch();
                } catch (ComFailException comFailEx) {
                    //error opening the Excel Document
                }
            }
        }
    }

    public void closeActiveWorkbookAndSave() {
        try {
            //close and save change's to active workbook
            //this only closes the workbook, not Excel
            Dispatch.call(activeWorkbook, "Close", new Variant(true));
            //if you want to exit the Excel App.            
            //excelApplication.invoke("Quit", new Variant[0]);            
        } catch (ComFailException cfe) {
            //problem closing the workbook
        }
    }

    public void convert_XLSX_TO_PDF(String pdfFileName) {
        if (activeWorkbook != null) {
            String workbookName = Dispatch.call(activeWorkbook, "Name").getString();
            activeWorksheets = Dispatch.call(activeWorkbook, "Worksheets").getDispatch();
            int workSheetCount = Dispatch.call(activeWorksheets, "Count").getInt();
            System.out.println("Workbook Name =" + workbookName);
            System.out.println("Total Worksheets In Active Document = " + workSheetCount);
            System.out.println("Converting to PDF....");
            try {                
                Dispatch currentWorksheet;
                String currentWorksheetName;
                //worksheets not zero based, starts at one
                for (int i = 1; i < workSheetCount+1; i++) {
                    //get each active work sheet and set up the page setup settings
                    currentWorksheet = Dispatch.call(activeWorksheets, "Item", new Variant(i)).getDispatch();
                    currentWorksheetName = Dispatch.call(currentWorksheet, "Name").getString();
                    System.out.println("Setting up page setup for Workbook Sheet ("+ i + ".) - " + currentWorksheetName);
                    //Get page setup for each worksheet
                    Dispatch pageSetup = Dispatch.get(currentWorksheet, "PageSetup").getDispatch();
                    /**** Zoom must be set to false for FitToPagesWide and FitToPagesTall
                       to take control of scaling
                    */
                    Dispatch.put(pageSetup, "Zoom", new Variant(false));                    
                    //Fit content on each worksheet to fit in a single page                                        
                    Dispatch.put(pageSetup, "FitToPagesWide", new Variant(1));
                    Dispatch.put(pageSetup, "FitToPagesTall", new Variant(1));                    
                    //set print area to not chop off content
                    Dispatch.put(pageSetup, "PrintArea", new Variant(false));                    
                    //set left margin small
                    Dispatch.put(pageSetup, "LeftMargin", new Variant(0));                                     
                }
                //[3rd param] = 0 specifies PDF document, 1 is XPS format
                //[4th param] = 0 specifies high quality, 1 is low quality
                //[5th param] = true to keep document properties, false to ommit
                //[6th param] = true to keep print areas set, false does not keep print areas set 
                Dispatch.call(activeWorkbook, "ExportAsFixedFormat", new Variant(0), new Variant(pdfFileName), new Variant(0), new Variant(false), new Variant(true));
                System.out.println("Export to PDF has been successful.");
                //close and save
                closeActiveWorkbookAndSave();
            } catch (ComFailException comFailEx) {
                //Export Failed
                System.out.println("Export to PDF has failed");
            }
        }
    }

}

public class TestExcel {

    public static void main(String[] args) {
        // TODO code application logic here
        ExcelApplication e = new ExcelApplication();
        e.openExcelFileInvisible("ss1.xlsx");
        //full path accepted here or if not it will be exported to current directory
        e.convert_XLSX_TO_PDF("covertedXLSXFile.pdf");
    }

}

Here is the PDF File generated from the above code. Notice the 3rd page, the content is a bit chopped off, when you remove the wrap text property and merged cells, it generates fine. Converted XLSX

PalinDrome555
  • 907
  • 6
  • 16
0

This is the Excel VBA code to set up the same page parameters for all worksheets. Sorry i'm not familiar with Openoffice programming, assume that the API is similar:

Sub PageSetup_AllSheets()
    Dim ws As Worksheet
    For Each ws In ActiveWorkbook.Worksheets
        ws.Activate
        Setup_Page
    Next
End Sub

Sub Setup_Page()
'
' Setup_Page Macro
' Macro recorded 27/08/2014 by Paul
'
    With ActiveSheet.PageSetup
        .PrintTitleRows = ""
        .PrintTitleColumns = ""
        .LeftHeader = ""
        .CenterHeader = ""
        .RightHeader = ""
        .LeftFooter = ""
        .CenterFooter = ""
        .RightFooter = ""
        .LeftMargin = Application.InchesToPoints(0.7)
        .RightMargin = Application.InchesToPoints(0.7)
        .TopMargin = Application.InchesToPoints(0.75)
        .BottomMargin = Application.InchesToPoints(0.75)
        .HeaderMargin = Application.InchesToPoints(0.3)
        .FooterMargin = Application.InchesToPoints(0.3)
        .PrintHeadings = False
        .PrintGridlines = False
        .PrintComments = xlPrintNoComments
        .PrintQuality = -3
        .CenterHorizontally = False
        .CenterVertically = False
        .Orientation = xlPortrait
        .Draft = False
        .PaperSize = xlPaperA4
        .FirstPageNumber = xlAutomatic
        .Order = xlDownThenOver
        .BlackAndWhite = False
        .Zoom = False
        .FitToPagesWide = 1
        .FitToPagesTall = 1
        .PrintErrors = xlPrintErrorsDisplayed
    End With
End Sub
Paul1n
  • 84
  • 6