This is a bit hard for me to describe:
I needed a PowerShell function to convert a PDF to TXT. I found one and can use it to generate a .txt file.
I ended up using this:
function Import-PDFText {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory, Position = 0, ValueFromPipeline)]
        [ValidateScript({ Test-Path $_ })]
        [string]
        $Path
    )
    begin {
        # Load iTextSharp only if its types are not already available in this session.
        if (-not ([System.Management.Automation.PSTypeName]'iTextSharp.Text.Pdf.PdfReader').Type) {
            Add-Type -Path "$PSScriptRoot\itextsharp.dll"
        }
    }
    process {
        $Reader = New-Object 'iTextSharp.Text.Pdf.PdfReader' -ArgumentList $Path
        $PdfText = New-Object 'System.Text.StringBuilder'
        # Extract the text page by page (page numbers are 1-based).
        for ($Page = 1; $Page -le $Reader.NumberOfPages; $Page++) {
            $Strategy = New-Object 'iTextSharp.Text.Pdf.Parser.SimpleTextExtractionStrategy'
            $CurrentText = [iTextSharp.Text.Pdf.Parser.PdfTextExtractor]::GetTextFromPage($Reader, $Page, $Strategy)
            # [void] keeps the StringBuilder returned by AppendLine out of the pipeline output.
            [void]$PdfText.AppendLine([System.Text.Encoding]::UTF8.GetString([System.Text.ASCIIEncoding]::Convert([System.Text.Encoding]::Default, [System.Text.Encoding]::UTF8, [System.Text.Encoding]::Default.GetBytes($CurrentText))))
        }
        $Reader.Close()
        $PdfText.ToString()
    }
}
Import-PDFText -Path F:\Documents\testpdf.pdf | Out-File F:\Documents\test.txt
This works just fine for me.
The test.txt file now contains for example:
Fixed word or sentence
variable/random word or sentence
Fixed word or sentence
variable/random word or sentence
Fixed word or sentence
variable/random word or sentence
etc.
Now comes the tricky part for me.
What I need is a way to "grep" the words that sit directly UNDER some of the "fixed words or sentences" in this text.
The static words are my only reference points in the text, but what I really need is the line under each of them.
So something like:
Get-Content F:\Documents\test.txt | Where-Object -eq "the fixed word/sentence" + "the string under them"
(and only that, so discard the fixed string)
Can anybody get me started on this? Much appreciated.
AdminOfThing got me started with:
$out = Select-String -Path file.txt -Pattern 'Fixed word' -SimpleMatch -Context 0,1
$out.Context.PostContext
This works like a charm.
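For anyone reading along, here is a small self-contained sketch of what that call does. The sample file name and its contents are made up for illustration; only `Select-String`, `-SimpleMatch`, `-Context` and the `Context.PostContext` property come from the snippet above.

```powershell
# Build a throwaway sample file that mimics the layout of test.txt (contents are invented).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'context-demo.txt'
@(
    'Fixed word'
    'alpha'
    'Some header'
    'beta'
    'Fixed word'
    'gamma'
) | Set-Content -Path $sample

# -Context 0,1 captures 0 lines before and 1 line after every matching line.
$out = Select-String -Path $sample -Pattern 'Fixed word' -SimpleMatch -Context 0,1

# PostContext holds only the captured following line(s), so the fixed line itself is discarded.
$following = $out.Context.PostContext
$following
```

Note that `Select-String` matches case-insensitively by default and `-SimpleMatch` does a substring comparison, so a line such as "Another fixed word" would also count as a match unless `-CaseSensitive` is added.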
However, I just found out that there are a few exceptions in my text.
In some cases I need the text or sentence under two consecutive fixed lines, because the first and/or second fixed line can each recur elsewhere in my text, but NEVER in combination. So:
fixed word or sentence
another fixed word or sentence
random word or sentence
fixed word or sentence + another fixed word or sentence combined = unique info
What I'm trying to do is use both of the sentences or words to get the random word or sentence under them.
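One possible way to approach this, sketched under some assumptions: widen the context to two lines, keep only the matches whose first following line equals the second fixed line, and return the second following line. The file name, the two heading strings and the sample contents below are hypothetical placeholders, and the sketch assumes the two fixed lines are always directly adjacent.

```powershell
# Throwaway sample file; 'Heading one' and 'Heading two' stand in for the two fixed lines.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'pair-demo.txt'
@(
    'Heading one'
    'not wanted 1'
    'Heading two'
    'not wanted 2'
    'Heading one'
    'Heading two'
    'wanted info'
) | Set-Content -Path $sample

$first  = 'Heading one'   # hypothetical first fixed line
$second = 'Heading two'   # hypothetical second fixed line

# Match the first fixed line and capture the two lines that follow it.
$wanted = Select-String -Path $sample -Pattern $first -SimpleMatch -Context 0,2 |
    # Keep a match only when the line right after it is the second fixed line.
    Where-Object {
        $_.Context.PostContext.Count -eq 2 -and
        $_.Context.PostContext[0].Trim() -eq $second
    } |
    # The line after the pair is the one we are after.
    ForEach-Object { $_.Context.PostContext[1] }

$wanted
```

The first fixed line is matched as a substring (because of `-SimpleMatch`), while the second one is compared as a whole trimmed line with `-eq`; either check could be loosened or tightened depending on how the real text looks.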