This is a bit hard for me to describe.

I needed a PowerShell function to convert a PDF to TXT. I found one and can use it to generate a .txt file.

I ended up using this:

    function Import-PDFText {
        [CmdletBinding()]
        Param(
            [Parameter(Mandatory, Position = 0, ValueFromPipeline)]
            [ValidateScript({ Test-Path $_ })]
            [string]
            $Path
        )
        begin {
            # Load iTextSharp only if its types are not already available in the session
            if (-not ([System.Management.Automation.PSTypeName]'iTextSharp.Text.Pdf.PdfReader').Type) {
                Add-Type -Path "$PSScriptRoot\itextsharp.dll"
            }
        }
        process {
            $Reader = New-Object 'iTextSharp.Text.Pdf.PdfReader' -ArgumentList $Path
            $PdfText = New-Object 'System.Text.StringBuilder'

            # Extract the text page by page (PDF page numbers start at 1)
            for ($Page = 1; $Page -le $Reader.NumberOfPages; $Page++) {
                $Strategy = New-Object 'iTextSharp.Text.Pdf.Parser.SimpleTextExtractionStrategy'
                $CurrentText = [iTextSharp.Text.Pdf.Parser.PdfTextExtractor]::GetTextFromPage($Reader, $Page, $Strategy)
                # AppendLine returns the StringBuilder; assign to $null so it does not leak into the output
                $null = $PdfText.AppendLine([System.Text.Encoding]::UTF8.GetString([System.Text.ASCIIEncoding]::Convert([System.Text.Encoding]::Default, [System.Text.Encoding]::UTF8, [System.Text.Encoding]::Default.GetBytes($CurrentText))))
            }
            $Reader.Close()

            $PdfText.ToString()
        }
    }

    Import-PDFText -Path F:\Documents\testpdf.pdf | Out-File F:\Documents\test.txt

This works just fine for me.

The test.txt file now contains, for example:

Fixed word or sentence
variable/random word or sentence
Fixed word or sentence
variable/random word or sentence
Fixed word or sentence
variable/random word or sentence

etc.

Now comes the tricky part for me.

What I need is a way to "grep" the words directly UNDER some of the "fixed words or sentences" in this text.

The only reference points I have are the static words in this text, but what I really need are the words under them.

So something like:

    Get-Content f:\documents\test.txt | Where-Object -eq "the fixed word/sentence" + "the string under them"
    (and only that, so discard the fixed string)

Can anybody get me started on this? Much appreciated.

AdminOfThings got me started with:

    $out = Select-String -Path file.txt -Pattern 'Fixed word' -SimpleMatch -Context 0,1
    $out.Context.PostContext

This works like a charm.

However, I just found out that there are a few exceptions in my text.

In some cases I need the text or sentence under two consecutive fixed lines, because the first and/or second fixed line can each recur elsewhere in my text on its own, but NEVER in combination. So:

    fixed word or sentence
    another fixed word or sentence 
    random word or sentence

    fixed word or sentence + another fixed word or sentence combined = unique info

What I'm trying to do is use both of the sentences or words to get the random word or sentence under them.

  • Regex lookaround solutions here: https://stackoverflow.com/questions/37526216/select-the-next-line-after-match-regex – OwlsSleeping Jul 02 '20 at 16:02

1 Answer

You can use Select-String with the -Context parameter for this.

    $out = Select-String -Path test.txt -Pattern 'Fixed word' -SimpleMatch -Context 0,1
    $out.Context.PostContext

-Context 0,1 outputs 0 lines above the match and 1 line below the match. Lines above the match are in the PreContext property, and lines below are stored in the PostContext property.
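
To make the two properties concrete, here is a small sketch that pipes an in-memory array to Select-String instead of reading a file. The sample lines and the variable names `$lines`, `$hits` and `$hit` are made up for illustration; they stand in for the contents of test.txt. It uses -Context 1,1 so that both PreContext and PostContext have something to show.

```powershell
# Hypothetical sample lines standing in for the contents of test.txt
$lines = @(
    'Fixed word'
    'first value'
    'Other text'
    'Fixed word'
    'second value'
)

# One line of context before and one after each match
$hits = $lines | Select-String -Pattern 'Fixed word' -SimpleMatch -Context 1,1

foreach ($hit in $hits) {
    [pscustomobject]@{
        Matched = $hit.Line
        Before  = $hit.Context.PreContext -join '; '
        After   = $hit.Context.PostContext -join '; '
    }
}
```

The first match sits on the first line, so its PreContext is empty; PostContext is what carries the line under the fixed word.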

Adding -SimpleMatch performs a verbatim string match rather than the default regex match.
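
The difference matters as soon as the fixed text contains regex metacharacters. The following sketch uses invented sample lines (`Total (EUR)` and friends are assumptions, not taken from the question) to show the same pattern behaving differently with and without -SimpleMatch.

```powershell
# Hypothetical sample lines; the fixed text contains parentheses
$lines = @(
    'Total (EUR)'
    '12.50'
    'Total EUR'
    '99.00'
)

# Default regex match: the parentheses form a group, so the pattern
# means "Total EUR" and does not match the line 'Total (EUR)'
$regexHits = $lines | Select-String -Pattern 'Total (EUR)' -Context 0,1

# Literal match: the parentheses are taken verbatim
$literalHits = $lines | Select-String -Pattern 'Total (EUR)' -SimpleMatch -Context 0,1

$regexHits.Context.PostContext     # line under 'Total EUR'
$literalHits.Context.PostContext   # line under 'Total (EUR)'
```

If a regex is needed for other reasons, passing the fixed text through `[regex]::Escape()` first should give the same literal behaviour for that part of the pattern.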

-Path accepts an array of paths and/or wildcards, so you can point it at a folder of text files and have all of them read. Each file is read one line at a time, similar to Get-Content file.txt.
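
For the follow-up scenario in the question, where the wanted line sits under two consecutive fixed lines, one possible sketch is a plain index loop over the lines rather than Select-String. It assumes that each fixed text occupies a whole line (ignoring surrounding whitespace) and that the wanted text is the single line right after the pair. The sample content and the names `$first`, `$second` and `$wanted` are hypothetical; in practice `$lines` would come from `Get-Content f:\documents\test.txt`.

```powershell
# Hypothetical sample standing in for: $lines = Get-Content f:\documents\test.txt
$lines = @(
    'fixed word or sentence'
    'some value'
    'another fixed word or sentence'
    'other value'
    'fixed word or sentence'
    'another fixed word or sentence'
    'wanted value'
)

$first  = 'fixed word or sentence'
$second = 'another fixed word or sentence'

# Emit the line that follows every place where $first is immediately followed by $second
$wanted = for ($i = 0; $i -lt $lines.Count - 2; $i++) {
    if ($lines[$i].Trim() -eq $first -and $lines[$i + 1].Trim() -eq $second) {
        $lines[$i + 2]
    }
}

$wanted
```

Comparing whole lines with -eq (case-insensitive by default) keeps the first fixed text from also matching inside the second one, which a substring match would do here.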

AdminOfThings
  • Thanks, this works great. However, I found one exception. I updated my initial post with this scenario. Maybe you can have a look? – Barry Badpak Jul 02 '20 at 16:25
  • @BarryBadpak Is the new scenario that you have two different fixed words in the same sentence and you want to match both, and then return the sentence under that? – AdminOfThings Jul 02 '20 at 19:03