2

I'm writing a PowerShell script to convert a collection of multi-worksheet Excel xlsx workbooks into a single csv file. One of the things I want to grab is the calculated text of a hyperlink created by a HYPERLINK formula. For example a cell contains =HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")

I'm able to grab the cell with $currentCell = $sheet.Cells.Item($r, $c). I can grab the link text Click Here using $currentCell.Text I can detect the cell has a formula by testing $currentCell.HasFormula. I can grab the formula using $currentCell.Formula and parse it with a regex to detect it contains a HYPERLINK formula. But, what I want is to obtain the results of executing the formula. I can execute the formula using $currentCell.Calculate(), but I can't figure out how to get a hold of the results (When I assign the results of $currentCell.Calculate() to a variable, the variable ends up being a System.DBNull).

How do I programmatically obtain the results of an cell's Calculate method?

Update

After thinking about Benoît Mayer's answer, I realized that I didn't understand the basis of my own question. I was trying to generalize the processing of cells that contain formulas, but that won't work. The cell's formula IS being calculated, i.e, as I extract the cell's text (the cell with the HYPERLINK and CONCATENATE formulas), I'm getting Click Here which is the result of executing the formula (e.g.=HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")). I need to detect and parse the HYPERLINK and CONCATENATE formulas and use the approach that Benoît describes.

Here's my code. It converts multiple Excel workbooks, each with multiple sheets, and extracts the results of the specific formulas in the sheets I need to process. See the code following lines 136 and 145.

**Code. Updated on 5/7 with bug fixes and code to detect and extract data from specific formulas **

cls

#region Functions

Function Remove-WhiteSpaceFromNonQuoted($inString)
{

    $quoted = $false
    $newString = ""

    for ($i = 0; $i -lt $inString.Length; $i++)
    {
        if ($inString[$i] -eq "`"")
        {
            $quoted = $quoted -xor $true
        }

        if (($inString[$i] -match "\S" -and !$quoted) -or ($quoted))
        {
            $newString = $newString + $inString[$i] 
        }
    }

    return $newString
}

#endregion

$sortedFieldNameList = New-Object -TypeName System.Collections.SortedList

$fqBookNames = New-Object -TypeName System.Collections.SortedList

$fqBookNames.Add("C:\foo\bar1.xlsx", "")
$fqBookNames.Add("C:\foo\bar2.xlsx", "")
$fqBookNames.Add("C:\foo\barN.xlsx", "")

$global:workBook = $null
$global:excel =  $null

try
{   

    $global:excel = New-Object -Com Excel.Application
    $global:excel.Visible = $false

    write-host ("Scan for column names")

    #Scan all sheets in all books and create an object with all the column names encountered 
    foreach ($fqBookName in $fqBookNames.Keys)
    {
        $global:workBook = $global:excel.Workbooks.Open($fqBookName)

        foreach ($sheet in $global:workBook.Sheets)
        {
            $columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
            write-host ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name)
            $rowOne = $sheet.Rows(1)

            for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
            {
                $columnName = $rowOne.Cells($columnIndex).Text.Trim().ToUpper()

                if ($columnName.Length -gt 0)
                {
                    if (!$sortedFieldNameList.ContainsKey($columnName)) 
                    {
                        $sortedFieldNameList.Add($columnName, "")
                    }
                }
                else
                {
                    break
                }
            }
        }

        $global:workBook.Close($false)
        Clear-Variable workBook
    }

    #Create a class that represents the worst-case collection of columns that will be output to, e.g., a grid or CSV file
    #https://stackoverflow.com/questions/49117127/create-a-class-with-dynamic-property-names-in-powershell
    Invoke-Expression @"
    Class ClsExportCsv {
    $(($sortedFieldNameList.Keys).ForEach({"[string] `${$($_)}`n "}))
    }
"@

    #create array to hold list of rows that will be output to, e.g., a grid or CSV file
    $itemList = New-Object System.Collections.ArrayList
    $itemList.clear()

    write-host ("Scan for data")

    foreach ($fqBookName in $fqBookNames.Keys)
    {
        $global:workBook = $global:excel.Workbooks.Open($fqBookName)

        foreach ($sheet in $global:workBook.Sheets)
        {
            write-host -NoNewline ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name + ". Rows=")

            $columnNameLookup = @{}
            $columnNameLookup.Clear()

            $columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
            $rowOne = $sheet.Rows(1)

            #create column name index lookup table for this sheet
            for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
            {
                $columnNameLookup.Add($columnIndex, $rowOne.Cells($columnIndex).Text.Trim().ToUpper())
            }

            for ($rowIndex = 2; $rowIndex -le $sheet.Cells.EntireRow.Count; $rowIndex++)
            {
                $rowCurrent = $sheet.Rows($rowIndex)

                if (($rowCurrent.Cells(1).Text).Length -gt 0)
                {

                    $listRow = New-Object -TypeName ClsExportCsv

                    for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
                    {
                        if (($columnNameLookup.$columnIndex).Length -gt 0)
                        {

                            $cellObject = $rowCurrent.Cells($columnIndex)
                            $textFromFormula = ""

                            if ($cellObject.HasFormula)
                            {
                                $formulaNoWhiteSpace = Remove-WhiteSpaceFromNonQuoted -inString $cellObject.Formula

                                #detect and parse cells with =HYPERLINK(CONCATENATE("http://xxxx.aspx?pid=",A2),"Click Here")
                                if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(CONCATENATE\(\")(?<URL>.*)(?:\"\,)(?<A1>.*)(?:\)\,.*)$')
                                {
                                    if (($Matches["URL"] -ne $null) -and ($Matches["A1"] -ne $null))
                                    {
                                        $textFromFormula = ($Matches["URL"] + $sheet.Range($Matches["A1"]).Text) 
                                    }
                                }

                                #detect and parse cells with =HYPERLINK("http://xxxx","Click Here")
                                if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(\")(?<URL>.*)(?:\"\,\".*\"\))$')
                                {
                                    if ($Matches["URL"] -ne $null)
                                    {
                                        $textFromFormula = $Matches["URL"] 
                                    }
                                }
                            }

                            if ($textFromFormula.Length -eq 0)
                            {
                                $listRow.($columnNameLookup.$columnIndex) = $rowCurrent.Cells($columnIndex).Text.Trim()
                            }
                            else
                            {
                                $listRow.($columnNameLookup.$columnIndex) = $textFromFormula
                            }

                        } # if (($columnNameLookup.$columnIndex).Length -gt 0)

                    } # for ($columnIndex = 1; ...

                    $itemList.Add($listRow) | out-null
                }
                else
                {
                    write-host ($rowIndex - 2).ToString()
                    break
                }

            } # for ($rowIndex = 2; .....

        } # foreach ($sheet in $global:workBook.Sheets)

        $global:workBook.Close($false)
        Clear-Variable workBook
    }

    $global:excel.Quit()
    Clear-Variable excel

    $itemList | Export-CSV -LiteralPath "C:\Users\foo\combined.csv" -NoTypeInformation -Encoding UTF8 -Delimiter ',' $itemList | Out-GridView -Title "Rows"

}
finally
{

    if ($global:excel -ne $null)
    {
        if ($global:workBook -ne $null)
        {
            $global:workBook.Close($false)
        }

        $global:excel.Quit()
        Clear-Variable excel
    }
}
General Grievance
  • 4,555
  • 31
  • 31
  • 45
VA systems engineer
  • 2,856
  • 2
  • 14
  • 38
  • Why did you accept an answer if you did not have a solution? If that answer does not work, you should un-accept the answer. It looks like you're trying to get the `value` of a formula's output. Is that correct? You're running `.Calculate()` on the line you're having issues with? – gravity May 07 '19 at 17:16
  • @grav - I just posted my solution. See my notes - I misunderstood what I was trying to accomplish. Reading over Benoit's answer and the other SO answer he pointed to made me realize what I had to to. Please remove your downvote. My solution works for my needs. I'm extacting the URLs I need from the cells containing the HYPERLINK formula. Asdditionally, this code is good for general-purpose XLSX-CSV conversion – VA systems engineer May 07 '19 at 17:21
  • I did not down-vote your question. The answer no longer really applies to the question or issue, so I would not disagree with someone choosing to do so either. – gravity May 07 '19 at 17:53

1 Answers1

1

It does not seem possible to directly get the adress generated by the Concatenate function, see for example Extract URL From Excel Hyperlink Formula.

Why a solution using regex such as the one below would not be appropriate?

$split = $currentCell.Formula -split 'CONCATENATE' | Select -Last 1 | %{$_ -replace `
'[" ()]','' -split ','}
$calculatedResult = $split[0] + $sheet.Range("$($split[1])").Text
BeMayer
  • 380
  • 1
  • 9
  • I see now that I didn't understand the basis of my own question. I was trying to generalize the processing of cells that contain formulas, but that won't work. The cell's formula IS being calculated, i.e, as I extract the cell's text, I'm getting `Click Here` which is the result of executing the formula. I will need to detect and parse the HYPERLINK and CONCATENATE formulas and use the approach that you and [Extract URL From Excel Hyperlink Formula](https://stackoverflow.com/questions/32230657/extract-url-from-excel-hyperlink-formula) describe. – VA systems engineer May 07 '19 at 11:34