4

I use a program called Belarc Advisor, which produces an HTML report of all hardware and software details, including the licenses, keys, and serials of installed software. I usually create this report either on a new PC or before formatting a PC. However, the file exported from Chrome keeps its images in a separate folder, and I need a standalone HTML file with all the details and images (including the CSS styles of the HTML report).

So far I have had to replace the images by hand in Notepad++ with Base64 strings generated on an online website. I'm looking for a way to do this in either a batch script or PowerShell. I found two Stack Overflow questions {q1}, {q2}, and a {blog-post}, and have the following code:

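    $original_file = 'path\filename.html'
    $destination_file = 'path\filename.new.html'
    # Pseudocode: Foreach-IMG-SELECTOR-Object and SOURCE-TAG-SELECTOR are placeholders
    # for the img-tag and src-attribute selection that I don't know how to write.
    (Get-Content $original_file) | Foreach-IMG-SELECTOR-Object {
        $path = $_ SOURCE-TAG-SELECTOR `
        -replace $path, [convert]::ToBase64String((get-content $path -encoding byte))
    } | Set-Content $destination_file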

Could the `ForEach-Object` loop select its objects by the HTML `img` tag? If so, the Base64 conversion would be quite easy.

To convert to Base64, the expression is `[convert]::ToBase64String((Get-Content $path -Encoding Byte))`, where `$path` is the path to the image, which could simply be copied from the `<img src="">` tag.
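
A minimal sketch of that conversion, under stated assumptions: `Get-Content -Encoding Byte` works in Windows PowerShell 5.x, but PowerShell 6 and later replaced it with `-AsByteStream`, so reading the bytes through .NET is one way to stay portable across both. The helper name `Get-Base64FromFile` and the demo file name are made up for illustration.

```powershell
# Hypothetical helper: returns the Base64 text of a file's raw bytes.
function Get-Base64FromFile {
    param([Parameter(Mandatory)][string]$Path)
    # Resolve to a full path first, because .NET methods do not follow
    # PowerShell's current location.
    $sFullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($sFullPath))
}

# Demo: the three ASCII bytes "abc" encode to "YWJj".
$sTmp = Join-Path ([System.IO.Path]::GetTempPath()) 'b64-demo.bin'
[System.IO.File]::WriteAllBytes($sTmp, [byte[]](97, 98, 99))
Get-Base64FromFile -Path $sTmp   # YWJj
```

Because the file is read byte for byte, the image is not re-encoded, so the MIME type in the data URI should match the original file type.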

I just read that Windows 10 ships with PowerShell 5.0, so I thought I could create a batch file that does this.

So if the `img` tags and their `src` attributes can be selected, the `src` values only have to be replaced by their Base64 data.
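
One way this selection could be sketched without any extra library is a regular expression over the HTML text. This is only an illustration, not a full HTML parser: it assumes double-quoted `src` values, a single MIME type for every image, and the made-up function name `Convert-ImgSrcToDataUri` with hypothetical parameters `-Html`, `-BaseDir`, and `-MimeType`.

```powershell
# Hypothetical helper: inlines local images referenced by <img src="..."> as data URIs.
function Convert-ImgSrcToDataUri {
    param(
        [Parameter(Mandatory)][string]$Html,
        [Parameter(Mandatory)][string]$BaseDir,
        [string]$MimeType = 'image/png'   # assumption: all images share one type
    )
    # Group 1 = tag text up to the opening quote, 2 = the path, 3 = the closing quote.
    $sPattern = '(<img\b[^>]*?\bsrc\s*=\s*")([^"]+)(")'
    $oOptions = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    [regex]::Replace($Html, $sPattern, {
        param($oMatch)
        $sSrc = $oMatch.Groups[2].Value
        # Leave values that start with a scheme (http:, data:, ...) untouched.
        if ($sSrc -match '^[a-z][a-z0-9+.\-]*:') { return $oMatch.Value }
        $sFile = Join-Path -Path $BaseDir -ChildPath $sSrc
        # Leave the tag untouched when the file cannot be found.
        if (-not (Test-Path -LiteralPath $sFile)) { return $oMatch.Value }
        $sBase64 = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($sFile))
        $oMatch.Groups[1].Value + "data:$MimeType;base64," + $sBase64 + $oMatch.Groups[3].Value
    }, $oOptions)
}

# Demo with a throwaway 3-byte "image" in a temp folder.
$sDir = Join-Path ([System.IO.Path]::GetTempPath()) 'inline-demo'
New-Item -ItemType Directory -Path $sDir -Force | Out-Null
[System.IO.File]::WriteAllBytes((Join-Path $sDir 'a.png'), [byte[]](97, 98, 99))
$sResult = Convert-ImgSrcToDataUri -Html '<img alt="x" src="./a.png">' -BaseDir $sDir
$sResult   # <img alt="x" src="data:image/png;base64,YWJj">
```

For a whole report, the file could be read with `[System.IO.File]::ReadAllText`, passed through the function, and written back with `Set-Content`. A real HTML parser is more robust for markup that does not fit the pattern.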

Modified version of the Answer

The answer provided by Alexander did not work as originally posted: inside the loop, the attribute value was being set on the `#document` root instead of the current node. After searching online and experimenting in the PowerShell console, I found that this can be solved by selecting the current node via its XPath. Here's the modified answer:

    Add-Type -AssemblyName System.Drawing # Needed for [System.Drawing.Image] in a plain console
    Import-Module -Name "C:\HtmlAgilityPack.1.4.6\Net40\HtmlAgilityPack.dll" # Change to your actual path

    # Loads an image file, re-encodes it in the chosen format, and returns the bytes as a Base64 string.
    function Convert_to_Base64 ($sImgFile)
    {
        # Change to your format; keep it in sync with the "data:image/..." prefix used below.
        $oImgFormat = [System.Drawing.Imaging.ImageFormat]::Gif

        $oImage = [System.Drawing.Image]::FromFile($sImgFile)
        $oMemoryStream = New-Object -TypeName System.IO.MemoryStream
        try {
            $oImage.Save($oMemoryStream, $oImgFormat)
            [System.Convert]::ToBase64String($oMemoryStream.ToArray())
        }
        finally {
            # Release the file lock and the in-memory buffer.
            $oImage.Dispose()
            $oMemoryStream.Dispose()
        }
    }

    $sInFile = "C:\Users\USER\Desktop\BelarcAdvisor win10\Belarc Advisor Computer Profile.html" # Change to your actual path
    $sOutFile = "D:\Win10-Belarc.html" # Change to your actual path
    $sPathBase = "C:\Users\USER\Desktop\BelarcAdvisor win10\"

    $sXpath = "//img"
    $sAttributeName = "src"

    $oHtmlDocument = New-Object -TypeName HtmlAgilityPack.HtmlDocument
    $oHtmlDocument.Load($sInFile)
    $oHtmlDocument.DocumentNode.SelectNodes($sXpath) | ForEach-Object {
        # Keep the current node so its attributes and XPath can be read later.
        $oNode = $_

        # Read the src attribute (a path relative to the HTML file, starting with "./").
        $sSrcPath = $oNode.GetAttributeValue($sAttributeName, "")

        # Assemble the absolute file path; Substring(2) drops the leading "./".
        $sImgPath = Join-Path -Path $sPathBase -ChildPath $sSrcPath.Substring(2)

        $sBase64 = Convert_to_Base64 $sImgPath

        # Select the node again by its own XPath and replace src with the data URI.
        # The MIME type matches the Gif format chosen in Convert_to_Base64.
        $sSrcValue = "data:image/gif;base64," + $sBase64
        $oHtmlDocument.DocumentNode.SelectSingleNode($oNode.XPath).SetAttributeValue($sAttributeName, $sSrcValue) | Out-Null
    }

    $oHtmlDocument.Save($sOutFile)

mk117

2 Answers

5

It's quite easy. You could use HtmlAgilityPack to parse HTML:

    Import-Module -Name "C:\HtmlAgilityPack.dll" # Change to your actual path

    $sInFile = "E:\Temp\test.html" # Change to your actual path
    $sOutFile = "E:\temp\test1.html" # Change to your actual path
    $sUriBase = "http://example.com/" # Change to your actual URI base

    $sXpath = "//img"
    $sAttributeName = "src"

    $oHtmlDocument = New-Object -TypeName HtmlAgilityPack.HtmlDocument
    $oHtmlDocument.Load($sInFile)
    $oHtmlDocument.DocumentNode.SelectNodes($sXpath) | ForEach-Object {
        # If you need to download the image, here's how you can extract the image
        # URI (note that it may be relative, not absolute):
        $sSrcPath = $_.GetAttributeValue($sAttributeName, "")
        # Assembling absolute URI:
        $sUri = $sUriBase + $sSrcPath
        # Now you can download the image: Invoke-WebRequest -Uri $sUri

        # Put your Base64 conversion code here.
        $sBase64 = ...

        # Replace src on the current node ($_) with the data URI.
        $sSrcValue = "data:image/png;base64," + $sBase64
        $_.SetAttributeValue($sAttributeName, $sSrcValue)
    }

    $oHtmlDocument.Save($sOutFile)

Converting image file to Base64 string:

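    Add-Type -AssemblyName System.Drawing # Needed for [System.Drawing.Image] in a plain console

    $sImgFile = "C:\image.jpg" # Change to your actual path
    $oImgFormat = [System.Drawing.Imaging.ImageFormat]::Jpeg # Change to your format

    # Load the image, re-encode it into memory, and convert the bytes to Base64.
    $oImage = [System.Drawing.Image]::FromFile($sImgFile)
    $oMemoryStream = New-Object -TypeName System.IO.MemoryStream
    $oImage.Save($oMemoryStream, $oImgFormat)
    $cImgBytes = [Byte[]]($oMemoryStream.ToArray())
    $sBase64 = [System.Convert]::ToBase64String($cImgBytes)

    # Release the file lock and the in-memory buffer.
    $oImage.Dispose()
    $oMemoryStream.Dispose()
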
Alexander Obersht
  • Thanks! I downloaded the HTMLAgilityPack v.1.4.6... There are about 9 folders in the zip file. Which dll file should I use for my system? I'm on Windows 10 Home Edition 64bit. Folders included in the zip file are: `Net20 / Net40 / Net40-client / Net45 / sl3-wp / sl4 / sl4-windowsphone71 / sl5 /winrt45` – mk117 Aug 09 '15 at 10:16
  • Depending on which versions of .NET you have on your PC. If you have .NET 4.5 installed, use the DLL from the `Net45` folder. – Alexander Obersht Aug 09 '15 at 10:19
  • Also, what should the uri base be for `` ... The image folder is in the same folder that the html file is in, and src has a `./` for the folder path. – mk117 Aug 09 '15 at 10:20
  • If you have images already downloaded, use `[System.Drawing.Image]::FromFile()` method to load them. Relative path should be enough. – Alexander Obersht Aug 09 '15 at 10:28
  • Should I use `file:///C:/Temp/` as the `$sUriBase`? Or `C:\Temp\ ` – mk117 Aug 09 '15 at 10:58
  • If you want to use absolute paths, you can write something like `$sPathBase = "C:\Temp\"` and then `Join-Path -Path $sPathBase -ChildPath $sSrcPath`. It will convert slashes to backslashes. – Alexander Obersht Aug 09 '15 at 11:03
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/86545/discussion-between-mk117-and-alexander-obersht). – mk117 Aug 09 '15 at 11:47
  • Hi! The script ran without errors, but the `img` `src` attributes in the output file were not changed to Base64; the complete HTML output was unchanged. I ran the script twice, with no change in the HTML file. – mk117 Aug 09 '15 at 12:06
  • Here's the edited version from your script that I'm trying to run [Script@pastebin](http://pastebin.com/tvYVNSh3) – mk117 Aug 09 '15 at 12:14
  • Well, I tried adding code to the `ForEach-Object` loop (after `SetAttributeValue` line): `$oHtmlDocument.DocumentNode.GetAttributeValue($sAttributeName, "")` and the base64 code is present in each run of the img tag loop, although when I added the following code after the loop and before the save(), then the base64 is not present: `$oHtmlDocument.DocumentNode.SelectNodes($sXpath) | foreach-object { write-output $_ }`, seems like the values are only present inside the `foreach` loop and the loop isn't transferring the base64 values to the respective image tags at all! – mk117 Aug 10 '15 at 12:15
  • Please do correct your answer... The SetAttributeValue was setting the attribute for root #document, and not the active node. I have edited my question with the modified version of your answer... – mk117 Aug 11 '15 at 09:06
  • Sorry for late reply. I meant to test my code but forgot. ) Corrected it and now it should be OK. – Alexander Obersht Aug 11 '15 at 16:10
  • Thanks again! I didn't know I could just use `$_` for the active node. I was using a variable for that! – mk117 Aug 11 '15 at 16:13
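
The relative-path question raised in the comments can be sketched as follows. This is a minimal illustration that assumes the `src` value is relative to the folder containing the HTML file; the function name `Resolve-ImagePath` and the sample paths are hypothetical. It uses .NET path methods rather than `Join-Path`, so the paths do not need to exist.

```powershell
# Hypothetical helper: turns an img src such as "./images/logo.gif"
# into a full local path based on the HTML file's folder.
function Resolve-ImagePath {
    param(
        [Parameter(Mandatory)][string]$HtmlFile,
        [Parameter(Mandatory)][string]$Src
    )
    $sBaseDir = [System.IO.Path]::GetDirectoryName($HtmlFile)
    # GetFullPath collapses "./" and "../" segments in the combined path.
    [System.IO.Path]::GetFullPath([System.IO.Path]::Combine($sBaseDir, $Src))
}

# Demo: resolve a src relative to a report file in the temp folder.
$sHtmlFile = Join-Path ([System.IO.Path]::GetTempPath()) 'report.html'
$sImagePath = Resolve-ImagePath -HtmlFile $sHtmlFile -Src './images/logo.gif'
$sImagePath
```

The result can be passed directly to `[System.Drawing.Image]::FromFile()` or `[System.IO.File]::ReadAllBytes()`, which removes the need for a `Substring(2)` call to strip the leading `./`.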
2

Partial answer here, but I was able to perform the conversion from a single image file to an output text file with data URI encoding in just one line:

"data:image/png;base64," + [convert]::tobase64string([io.file]::readallbytes(($pwd).path + "\\image.png")) | set-content -encoding ascii "image.txt"

(Note that the output file encoding seems to make a difference.)

Mainly posting this because this is what came up in my web search, and it also simplifies the conversion in Alexander's answer.
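
The one-liner hard-codes `image/png`. As a hedged sketch, the MIME type could instead be picked from the file extension. The function name `ConvertTo-DataUri` and the extension table are assumptions made for illustration and cover only a few common image types.

```powershell
# Hypothetical helper: builds a data URI whose MIME type follows the file extension.
function ConvertTo-DataUri {
    param([Parameter(Mandatory)][string]$Path)
    # Assumed mapping; extend it for other image types as needed.
    $hMimeTypes = @{
        '.png'  = 'image/png'
        '.gif'  = 'image/gif'
        '.jpg'  = 'image/jpeg'
        '.jpeg' = 'image/jpeg'
        '.bmp'  = 'image/bmp'
        '.svg'  = 'image/svg+xml'
    }
    $sExtension = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    if (-not $hMimeTypes.ContainsKey($sExtension)) {
        throw "No MIME type known for extension '$sExtension'."
    }
    # Read the raw bytes without re-encoding the image.
    $sFullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $sBase64 = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($sFullPath))
    'data:{0};base64,{1}' -f $hMimeTypes[$sExtension], $sBase64
}

# Demo with a throwaway 3-byte file that only carries a .gif extension.
$sDemoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'datauri-demo.gif'
[System.IO.File]::WriteAllBytes($sDemoFile, [byte[]](97, 98, 99))
$sDataUri = ConvertTo-DataUri -Path $sDemoFile
$sDataUri   # data:image/gif;base64,YWJj
```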

Andrew