I'm trying to convert a plaintext document to PDF. The only method which has come anywhere close to actually working is installing "GhostScript" and then using the following PostScript script, dug up by the SE user @RedGrittyBrick (thanks), which takes a plaintext document (underneath the script) and produces a PDF from it.

It technically works, but visually messes up the top and left margins on each page: the top margin becomes "way too much" and the left margin is "a bit too far in" (compared to the right margin). At least that is how it looks in SumatraPDF, which is the only PDF viewer I have.

The script states:

/topmargin 1 inch def
/leftmargin 1 inch def

However, it visually looks like the top margin is maybe 4 inches and not 1 inch as it says in the file. If I modify it to 0, the finished PDF visually appears to have 1 inch top margin. If I, on the other hand, modify the leftmargin to 0 inch, it goes all the way to the left border.

The way it visually looks right to me, with proper, even margins on top/right/bottom/left, is:

/topmargin 0 inch def
/leftmargin 0.8 inch def

But I can't just keep it like that, as it will more than likely break on others' computers/PDF viewers. And even if it doesn't, it still really bugs me that I don't understand what is going on.

I have been told that the reason for this happening is that the PostScript does not specify a "page size". However, I have no idea how I would specify this into the document, nor how it can be possible that the author of the script never did this in the first place. It seems like such a basic, major mistake, yet the person who gave it to me claims to have used it for many years in many different environments successfully, so what does that mean? That SumatraPDF has very exotic default settings? That the person in question has very low standards? That I'm going insane? I really don't know what to make of this, or how to fix it.

I thought the whole point of PDF was to always create a 1:1 copy, with no ambiguity whatsoever in the dimensions and how things will be rendered... Apparently not. This is the script:

%!
%
% From: Jonathan Monsarrat (jgm@cs.brown.edu)
% Subject: PostScript -> ASCII *and* ASCII -> PostScript programs
% Newsgroups: comp.lang.postscript
% Date: 1992-10-01 04:45:38 PST 
%
% "If anyone is interested, here is an interesting program written by
% Professor John Hughes here at Brown University that formats ASCII
% in PostScript without a machine generator of any kind."
%
%%%
%%% Plan:
%%% Start with an empty string.
%%% For each character in the input stream, 
%%%    check to see if it's a carriage return.
%%%    if so, show the current string and reset it to empty
%%%    if not, add it to the current string.

/Courier findfont 10 scalefont setfont  %% Choose a fixed width font
/lineheight 
currentfont /FontBBox get dup      %% bbox bbox
0 2 getinterval    %% bbox {xm ym}
exch     %% {xm ym} bbox
2 2 getinterval    %% {xm ym} {xM yM}
aload pop    %% {xm ym} xM yM
3 2 roll     %% xM yM {xm ym}
aload pop
currentfont /FontMatrix get  %% xM yM xm ym MAT
transform    %% xM yM xm' ym'
4 2 roll
currentfont /FontMatrix get  %% xm' ym' xM yM MAT
transform    %% xm' ym' xM' yM'
exch pop     %% xm' ym' yM'
sub     %% xm' ym'-yM'
exch pop    %% dy
neg def 

lineheight pstack pop

/str 500 string def   %% Room to store a long string...
/empty 500 string def   %% An empty string to work with
/stringindex 0 def   %% How far we've filled the string
/inch {72 mul } def   %% A useful tool...
/pageheight 11 inch def
/topmargin 1 inch def
/botmargin 1 inch def
/leftmargin 1 inch def
/linesperpage pageheight topmargin sub botmargin sub lineheight div cvi def
/linenumber 1 def   %% the line we're about to write on

/newline {   %% move to a new line; flush page if necessary
   linenumber linesperpage gt {/linenumber 1 def showpage } if
   leftmargin pageheight topmargin sub linenumber lineheight mul sub moveto
   /linenumber linenumber 1 add def
} def

/cleanup {  %% print out the last bit of whatever you had there...
   str show showpage
} def

/startstring {  %% empty the string and reset its counter.
   str 0 empty putinterval
   /stringindex 0 def
} def

/showstring {  %% print the string on a new line and flush it
   newline
   str show 
   startstring
} def

pstack 

/addtostring {  %% put another character in the string, if there's room
   dup 500 gt {pop}{str exch stringindex exch put
   /stringindex stringindex 1 add def} ifelse
} def

%
% Main program: get characters and deal with them
%
{
   currentfile read {}{cleanup exit} ifelse
   dup 10 eq                   %% if it's a carriage return...
      {pop showstring}         %% write out this line of text and start over
      {dup 0 eq         %% if it's an end-of-file mark...
       {exit}                %% stop!
       {addtostring}           %% otherwise, add the character to current string
       ifelse}
      ifelse                   %% Sample data follows.
} loop

I then run:

ps2pdf in.ps out.pdf

3 Answers

Since you have ghostscript and want to do this automatically:

#!/bin/sh
exec gs -q -sDEVICE=pdfwrite -sPAPERSIZE=letter -dNOSAFER -dNOPAUSE -sOutputFile="$1.pdf" -sPROGNAME="$0" -- gslp.ps --heading-center "$(date)" "$@"

See the gslp man page for some slightly applicable help.

EDIT: this also works without using -dNOSAFER for gs 9.50 and later when converting only one text file:

#!/bin/sh
exec gs -q -sDEVICE=pdfwrite -sPAPERSIZE=letter --permit-file-read="$1" -dNOPAUSE -sOutputFile="$1.pdf" -sPROGNAME="$0" -- gslp.ps --heading-center "$(date)" "$1"
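
If you need to convert several text files with the single-file form, one option is to build the same argument list once per file from a small Python script. This is only a sketch: the function name `gslp_command` and its parameters are made up for illustration, and the flags are simply the ones from the shell script above, with "gs" standing in for `$0` as the program name.

```python
from datetime import datetime


def gslp_command(txt_path, paper="letter", heading=None):
    """Build the gs argument list for one text file (mirrors the shell script above)."""
    if heading is None:
        # Stand-in for the output of `date` in the shell version.
        heading = datetime.now().ctime()
    return [
        "gs", "-q", "-sDEVICE=pdfwrite", f"-sPAPERSIZE={paper}",
        f"--permit-file-read={txt_path}", "-dNOPAUSE",
        f"-sOutputFile={txt_path}.pdf", "-sPROGNAME=gs",
        "--", "gslp.ps", "--heading-center", heading, txt_path,
    ]
```

Each list can then be handed to `subprocess.run(cmd, check=True)`, one call per file, assuming gs and gslp.ps are found the same way as for the shell script.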
beginner6789

The easy way to turn a 'plaintext document' into PDF is to open the document in your favourite text editor and then 'save as PDF' or 'Print to PDF' from there. It's far more reliable than trying to use an ancient PostScript program which (as is clearly demonstrated by the fact it doesn't work for you) is lacking in features. Recent versions of Linux, Windows and Mac all have this capability, and it avoids the kind of problems you are seeing.

Instead of assuming the media to be 11 inches high, the program should interrogate the interpreter to discover the current media size and use that. Or, as I replied to your earlier question, the program should request a given media size from the interpreter. As I said previously, you need to add something like:

<<
  /PageSize [612 792]
>> setpagedevice

Where the numbers in the array, delimited by '[]', are the requested width and height in points (1/72 inch). Obviously you need to put that in the program somewhere before the main loop. The setpagedevice operator initialises the graphics state and erases the page, so make sure you do that before drawing anything.

The request above is, obviously, for US Letter media which is 11 inches long, as your program expects.
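
Since placement is the part that tends to go wrong, here is a minimal Python sketch of assembling the input file in the right order: the page-size request first, then the program, then the text. The function name `build_ps_input` is invented for illustration, and putting the request directly after the `%!` line is just one placement that satisfies "before drawing anything"; it assumes the program ends with its `} loop` line and reads whatever follows as data.

```python
def build_ps_input(script, text, width_pt=612, height_pt=792):
    """Return PostScript source: page-size request, then the script, then the text."""
    request = f"<< /PageSize [{width_pt} {height_pt}] >> setpagedevice\n"
    lines = script.splitlines(keepends=True)
    # Keep a leading "%!" header line first; the request goes right after it,
    # i.e. before any font setup or drawing.
    if lines and lines[0].startswith("%!"):
        head, rest = lines[0], lines[1:]
        if not head.endswith("\n"):
            head += "\n"  # never append the request to the comment line itself
    else:
        head, rest = "", lines
    body = head + request + "".join(rest)
    if not body.endswith("\n"):
        body += "\n"
    # Everything after the script is treated as data, so the text comes last.
    return body + text
```

Writing the result of `build_ps_input(script, "abc\n")` to in.ps and running `ps2pdf in.ps out.pdf` as before would then request US Letter. Anything placed after the `} loop` line, including a setpagedevice request, is read as text to be printed rather than executed.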

You keep on stating that PDF is supposed to avoid ambiguity and yes, it does, because a PDF file has a media size in it. But what you have here is not a PDF file, it's a PostScript program.

The PostScript program need not (and in your case does not) request a media size, it can simply use whatever the interpreter has as a default. For example; printers in the US normally have US Letter, printers in Europe have A4. So when you run your PostScript program it uses whatever default is current. In the US your program would likely result in a PDF file which uses US Letter, in Europe it would probably be A4 and the PDF file you produce by running the program would therefore use A4. I would imagine this is why your experience differs from whoever gave you the program in the first place, your environments differ.
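
To put rough numbers on that, here is a small Python sketch of the arithmetic in the program's `newline` procedure for the first line of a page. The function name is invented, the 10 point line height is a placeholder (the real value is computed from the font's bounding box), and A4 is taken as 842 points high, which is the usual rounded figure.

```python
LETTER_HEIGHT_PT = 792.0  # 11 inches, what the program hard-codes as pageheight
A4_HEIGHT_PT = 842.0      # 297 mm, rounded to whole points


def first_baseline_from_top(media_height_pt, assumed_height_pt=LETTER_HEIGHT_PT,
                            top_margin_pt=72.0, line_height_pt=10.0):
    """Distance in points from the top edge of the real page to the first line.

    Mirrors `newline` for linenumber = 1:
    y = pageheight - topmargin - lineheight, measured up from the bottom edge.
    """
    y_from_bottom = assumed_height_pt - top_margin_pt - line_height_pt
    return media_height_pt - y_from_bottom


print(first_baseline_from_top(LETTER_HEIGHT_PT))  # 82.0
print(first_baseline_from_top(A4_HEIGHT_PT))      # 132.0
```

So on A4 the text starts 50 points (about 0.7 inch) lower than on Letter, because PostScript measures y from the bottom of the page and the program assumes the top is at 11 inches. This only covers the Letter versus A4 case; a different default media size would shift things by a different amount.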

The name /topmargin is not magic, it's just a variable name. I don't know what programming languages you are familiar with, but if I created a local variable called topmargin in C++ I wouldn't expect it to have any effect on my program just because it was called topmargin.

But this is nothing to do with PDF, it's a consequence of running the program in two different environments. Each of the PDF files you create will be consistent, no matter which PDF viewer you choose to use, but if the two files are created with two different media sizes then the two files will look different.

halfer
KenS
  • Well, "something like" isn't as specific as I need it to be. As for "open the document in your favourite text editor and then 'save as PDF' or 'Print to PDF'", that is not automatable, nor even manually possible. Notepad++ certainly has no such feature. As for using an ancient PostScript program, you have no idea how many countless hours I have spent trying every imaginable "solution" yet never been able to create a proper PDF with any method. It always breaks in some way or doesn't work at all. I've tried numerous worthless programs that just don't work. –  Jan 29 '20 at 08:34
  • There's plenty of ways to automate opening a text file in an editor and printing it. Which part of my answer isn't specific enough? I've given you the precise invocation, told you what the numbers mean and explained that you'll need to insert it into your program before it does any drawing. It's hard to see what else I can tell you. It's up to you what numbers you put in there, I can't answer that for you because I don't know what you expect (for instance you haven't supplied a complete example to compare with). The numbers there are, as I said, for an 11 inch page. – KenS Jan 29 '20 at 13:49
  • The lack of a complete example (ie one with some actual text being drawn) means that I can't simply supply you with a modified program, even if I wanted to. – KenS Jan 29 '20 at 13:50
  • Well, I tried to add the extra code and it was just made part of the text document. I tried to move it to a different part of the PostScript and then it did nothing. I'm not a PostScript expert, but just somebody who wants .txt files with 80-char lines to become beautiful and nice PDFs. A complete example would literally be the script with a newline followed by "abc". –  Jan 29 '20 at 13:53
  • Furthermore, why do we have to rely on an ancient PS file to begin with? Why isn't this some kind of "standard template" which has been updated in recent years? I'm confused about what is so unclear about my request: I want to turn a .txt into PDF in an automated manner. –  Jan 29 '20 at 13:55
  • "There's plenty of ways to automate opening a text file in an editor and printing it." I'm not trying to open a text file in an editor and print it. I'm trying to convert a .txt to PDF. I don't understand why this is apparently so unclear even though I've repeated it a million times by now. –  Jan 29 '20 at 13:55
  • I haven't said that's unclear, the reason I suggested using a print option from an editor is because it sidesteps all the issues you are having. I'm afraid if you don't post what changes you've made, there's little I can do to tell you what you've done wrong, but if the command is becoming part of the text document, then you have placed it after the marks on the page, not before. If it did nothing then try changing the values. There is no 'standard template' because **this is a programming language**. If you want to write a program to read a text file and lay it out, then great. – KenS Jan 29 '20 at 18:51
  • To convert a text file to PDF, Google "text to PDF" and use one of the solutions there. – KenS Jan 29 '20 at 18:52

Your Y question is how to update a GS program that was highly advanced in its own era, before Windows was of age, to work on a modern Windows X system.

The expert GS writers/maintainers have tried to advise on that, however today there are ever so simple ways to regress that XY task in windows.

Windows uses NotePad to handle PlainText in such a way that all you need to do there is set the fonts and margins once. Then Automanually it is either right click "Print" OR on the command line use the printing /PT option, and then NotePad will format it using any PS driver such as GhostScript pdf writer or more easily MS Print to PS/PDF. Also you should know that SumatraPDF can read PlainText and has command line printing which can be to Image.PDF.

So there are many ways to PrintScript to get a Text or Image structured PostScripted.PDF

I suggest the coding sequence is
a) Use Print Management via GUI or CLI to prepare your preferred custom Form or most simply use the system default A4 or Letter page ratio.

b) Either Duplicate the built in PDF driver or add any other virtual PS/PDF driver and redirect it to a NonPromptPort (You can use the default PromptPort for comparative interactive testing) I set Mine to C:\MyData\PrintOut.PDF

c) Configure NotePad to the desired Page Form, Orientation i.e. Landscape or Portrait and margins, If you keep the previous output run.pdf open in SumatraPDF you can even watch it compile (like in LaTeX) and appear before your very eyes within SumatraPDF, since it does not lock small PDFs

d) Write a 1 liner cmd (or convoluted with error checks), so as to allow for drag and drop or other batched automation, call it TXT2PDF.CMD you can add the third and fourth arguments if necessary but I like to Keep It Stupidly Simple, so set %2 to your redirected port driver.

%SystemRoot%\system32\notepad.exe /pt "%1" "My Print to PDF"
Copy C:\MyData\PrintOut.PDF "%~dpn1.pdf"
SumatraPDF "%~dpn1.pdf"

OR for custom format image based PDF use latest SumatraPDF Pre-Release

SumatraPDF -print-to "My Print to PDF" -print-settings "paperkind=A4L" "%1"

Where the output will be PAI thus non-selectable text, and note that in v3.2 or before you must set

EbookUI [
...
    UseFixedPageUI = True

so as to view / print handle TXT (that is NOT needed for v3.3)

PostScriptum :-)

I forgot to mention: if you like your TXT pretty, e.g. Justified, then format it in WordPad as RTF,

drop it on Doc2PDF.cmd

and it Auto opens in SumatraPDF.

NOTE: Look closely and see how it reflows, since the PDF output may not have EXACTLY the same margins as the RTF when it was saved.

K J