1

I have a CSV StackOverflow-style coded URLs, and need to convert them to HTML with Powershell and save them back in the third column of the CSV.

CSV Sample:

ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),

What I need

cID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),Check <a href="https://stackoverflow.com">this out</a>    
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),To buy coffee <a href="https://google.com">Click Here</a> or <a href="https://bing.com">here</a>

I'm having problems parsing this where there are multiple URLs in the Comment field.

My (ugly, highly-verbose) Powershell code:

$comment_list = Import-Csv "c:\temp\Comments.csv"
$comment_list.foreach(
{
  $x = $_.Comment
  $linktextOUT = $x.Substring($x.IndexOf('[')+1,$x.IndexOf(']')-$x.IndexOf('[')-1)
  $linktextREPLACE = "[" + $linktextOUT + "]" 
  $URLOUT = $x.Substring($x.IndexOf('(')+1,$x.IndexOf(')')-$x.IndexOf('(')-1)
  $URLREPLACE = "(" + $URLOUT + ")" 
  $output = $x.Replace($URLREPLACE, "")
  $output = $output.Replace($linktextREPLACE, "")
  $output = $output + "<a href=""" + $URL + """>" + $linktext + "</a>"
  Write-Host $_.cID","$_.Comment","$output
})

What it outputs:

1 , Check [this out](https://stackoverflow.com) , Check <a href="https://stackoverflow.com">this out</a>    
2 , To buy coffee [click here](https://google.com) or [here](https://bing.com) , To buy coffee  or [here](https://bing.com)<a href="https://google.com">click here</a>

You can see that the first line is just fine - the Powershell outputs the correct third column. However, it can't see the second URL in line 2, so it completely skips the "bing.com" URL. Note that some lines may have up to 5 URLs in the comments.

Any help is appreciated in advance.

Beems
  • 801
  • 2
  • 13
  • 33
  • 1
    Switch to _PowerShell 6+_ and apply [ConvertFrom-Markdown](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/convertfrom-markdown) cmdlet… – JosefZ Aug 22 '21 at 20:05
  • 1
    @JosefZ, agreed, that is the path of least resistance; ***but what if the OP is in a position where that is not an option.*** Many organizations/enterprises are not even on PSv4 or 5 to date. So, other options for legacy versions are prudent. Especially since it appears that is what the is on at this time, and we may assume they cannot move to PSCore at all. – postanote Aug 22 '21 at 20:37
  • @JosefZ Put it as an answer and I'll test and accept - that seems like the best solution. I tried to just run `ConvertFrom-Markdown` piped via out-file, but that didn't work. – Beems Aug 26 '21 at 02:12
  • 1
    @Beems, please see my updated answer, which shows a `ConvertFrom-Markdown` solution, along with an explanation of why its use is probably overkill for your use case. – mklement0 Aug 29 '21 at 21:49

3 Answers3

4

Use of the -replace operator enables a concise solution:

@'
ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),
'@ > Comments.csv

Import-Csv Comments.csv | ForEach-Object {
  $_.CommentConverted =
    $_.Comment -replace '\[(.+?)\]\((.+?)\)', '<a href="$2">$1</a>'
  $_ # output the modified object
}

As JosefZ points out, PowerShell (Core) 7+ has a ConvertFrom-Markdown utility that converts Markdown input to both HTML text (property .Html on the output objects) and text with embedded VT (Virtual Terminal) escape sequences (property .VT100EncodedString), as well as an AST (abstract syntax tree) representation of the Markdown input text.

In the case at hand, however, its benefits are limited, because the input string is invariably wrapped in a <p> HTML element, which necessitates removing that wrapper to get the desired result; thus, in this particular instance, the above, regex-based solution is not only more concise, but performs better.

For the sake of completeness, here's a solution based on ConvertFrom-Markdown:

@'
ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),
'@ > Comments.csv

Import-Csv Comments.csv | ForEach-Object {
  $_.CommentConverted = ([xml] ($_.Comment | ConvertFrom-Markdown).Html).p.InnerXml
  $_
}
mklement0
  • 382,024
  • 64
  • 607
  • 775
2

How about this?

$csv = @'
ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),
'@

$data = $csv | ConvertFrom-Csv

$pattern = '\[(.+?)\]\s?\((.+?)\)'
$data | ForEach-Object {
    $object = $_
    $object.Comment | Select-String -Pattern $pattern -AllMatches | ForEach-Object {
        $line = $_.Line
        $_.Matches | ForEach-Object {

            $new = "<a href=""$($_.Groups[2].Value)"">$($_.Groups[1].Value)</a>"
            $line = $line -replace [regex]::Escape($_.Groups[0].Value), $new
        }
        $object.CommentConverted = $line
        $object
    }
}

Output

ID               : 1
Comment          : Check [this out](https://stackoverflow.com)
CommentConverted : Check <a href="https://stackoverflow.com">this out</a>

ID               : 2
Comment          : To buy coffee [click here](https://google.com) or [here](https://bing.com)
CommentConverted : To buy coffee <a href="https://google.com">click here</a> or <a href="https://bing.com">here</a>
Daniel
  • 4,792
  • 2
  • 7
  • 20
0

Is this what you are after?

Import-Csv -Path '.\comments.csv'
# Results
<#
ID Comment                                                                    CommentConverted
-- -------                                                                    ----------------
1  Check [this out](https://stackoverflow.com)                                                
2  To buy coffee [click here](https://google.com) or [here](https://bing.com)  
#>


Import-Csv -Path '.\comments.csv' | 
ForEach-Object {
    [PSCustomObject]@{
        ID               = $PSItem.ID
        Comment          = $PSItem.Comment
        CommentConverted = ((($PSItem.Comment -replace '\(', '<a href=') -replace '\)', '>Click Here</a>') -replace '\[here\]') -replace '\[click here\]|\[|\]'
    }
}
# Results
<#
ID Comment                                                                    CommentConverted                                                                                  
-- -------                                                                    ----------------                                                                                  
1  Check [this out](https://stackoverflow.com)                                Check this out<a href=https://stackoverflow.com>Click Here</a>                                    
2  To buy coffee [click here](https://google.com) or [here](https://bing.com) To buy coffee <a href=https://google.com>Click Here</a> or <a href=https://bing.com>Click Here</a>
#>
postanote
  • 15,138
  • 2
  • 14
  • 25