I have a PowerShell script that uses iTextSharp to extract text from PDF files. One of the files the script downloads comes in sideways, so it needs to be rotated before the script can read it.
Here's my function that reads the PDF. I've tested it and it works:
function Get-PdfText {
    [CmdletBinding()]
    [OutputType([string])]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )
    try {
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    }
    catch {
        throw
    }
    $stringBuilder = New-Object System.Text.StringBuilder
    for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
        $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
        $null = $stringBuilder.AppendLine($text)
    }
    $reader.Close()
    return $stringBuilder.ToString()
}
There is plenty of documentation about how to rotate PDFs in C# and Java, but not in PowerShell. There's a nice Java example here, but I don't know how to convert it to PowerShell: http://developers.itextpdf.com/question/how-rotate-page-90-degrees
Here's my attempt at converting it:
function RotatePdf90Degrees {
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )
    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    $n = $reader.NumberOfPages
    $page #PdfDictionary
    $rotate #PdfNumber
    for ($p = 1; $p -le $n; $p++) {
        $page = $reader.GetPageN($p);
        $rotate = $page.GetAsNumber([iTextSharp.text.pdf.PdfName]::ROTATE);
        if ($rotate -eq $null) {
            $page.put([iTextSharp.text.pdf.PdfName]::ROTATE, [iTextSharp.text.pdf]::PdfNumber(90));
        }
        else {
            $page.put([iTextSharp.text.pdf.PdfName]::ROTATE, [iTextSharp.text.pdf]::PdfNumber(($rotate.IntValue() + 90) % 360));
        }
    }
    $stamper = New-Object iTextSharp.text.pdf.PdfStamper ($reader, [System.IO.StreamWriter] $Path);
    $stamper.Close();
    $reader.Close();
}
Something is wrong on the two $page.put() lines: I don't know how to construct a proper PdfNumber object to pass to that method.
I've been using this documentation: http://developers.itextpdf.com/reference/package/com.itextpdf.text.pdf
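For what it's worth, here is a minimal sketch of how the `$page.put()` calls might be written, since `[iTextSharp.text.pdf]::PdfNumber(90)` treats a namespace as a type and calls a constructor like a static method. It rests on several assumptions that I have not verified against every iTextSharp release: the iTextSharp 5.x assembly is already loaded (for example with `Add-Type -Path`), `IntValue` is a property in the .NET port rather than the `intValue()` method shown in the Java docs, and `PdfStamper` expects a `System.IO.Stream` rather than a `StreamWriter`. The function name `Set-PdfRotation90` and the `OutputPath` parameter are hypothetical; writing to a separate file avoids overwriting the file the reader may still have open.

```powershell
# Sketch only. Assumes itextsharp.dll (5.x) has already been loaded,
# e.g. Add-Type -Path 'C:\path\to\itextsharp.dll' (path is a placeholder).
function Set-PdfRotation90 {
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path,

        # Hypothetical parameter: the rotated copy is written here
        # instead of overwriting the input file.
        [Parameter(Mandatory = $true)]
        [string]
        $OutputPath
    )

    $rotateKey = [iTextSharp.text.pdf.PdfName]::ROTATE
    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    try {
        for ($p = 1; $p -le $reader.NumberOfPages; $p++) {
            $page = $reader.GetPageN($p)              # PdfDictionary
            $rotate = $page.GetAsNumber($rotateKey)   # PdfNumber or $null

            # Assumption: IntValue is a property in iTextSharp, so no parentheses.
            if ($null -eq $rotate) {
                $angle = 90
            }
            else {
                $angle = ($rotate.IntValue + 90) % 360
            }

            # Construct the PdfNumber with New-Object rather than static-call syntax.
            $number = New-Object iTextSharp.text.pdf.PdfNumber -ArgumentList ([int]$angle)
            $page.Put($rotateKey, $number)
        }

        # PdfStamper takes a Stream; closing the stamper writes the modified document.
        $stream = New-Object System.IO.FileStream -ArgumentList $OutputPath, ([System.IO.FileMode]::Create)
        $stamper = New-Object iTextSharp.text.pdf.PdfStamper -ArgumentList $reader, $stream
        $stamper.Close()
    }
    finally {
        $reader.Close()
    }
}
```

If this works as assumed, the rotated copy could then be passed to the existing reader function, for example `Set-PdfRotation90 -Path $pdf -OutputPath $rotated` followed by `Get-PdfText -Path $rotated`, where `$pdf` and `$rotated` are placeholder variables.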