
I have an HTML page stored in a variable via 'Invoke-WebRequest', and I'd like to extract the content of a specific table cell from it.

$Result = Invoke-WebRequest 'https://www.dailyfaceoff.com/teams/anaheim-ducks/line-combinations'

Unfortunately, $Result.ParsedHtml does not return any result, so I was looking at using a regex to find the string. This is where I'm looking for your help.

Requested actions:

  • search within the HTML for the table cell with id="LW1"
  • within this cell, search for <span class="player-name">Hello World</span>
  • export the content 'Hello World'

HTML Structure:

<body ...>
    <div ...>
        <tbody>
            <td id="LW1">
                <a ....>
                    <span class="player-name">Hello World</span>
                </a>
            </td>
        </tbody>
    </div>
</body>

Thanks in advance for any input or help!

Try 1:

$r = Invoke-WebRequest 'https://www.dailyfaceoff.com/teams/anaheim-ducks/line-combinations'
$table = $r.ParsedHtml.getElementsByTagName("table")

Result 1: No output; it looks like the HTML structure is preventing parsing.

Try 2:

$r = Invoke-WebRequest 'https://www.dailyfaceoff.com/teams/anaheim-ducks/line-combinations'
$string = ($r.Content | 
    where {$_ -match '^a href.*LW1.*\ title=.*>/span.*'}) -replace '.*>'

Result 2: Regex not matching

ronweis
  • Check out [this answer](https://stackoverflow.com/a/68728114/3245749) from another question. I think their method of using the Internet Explorer COMObject will serve you well. – TheMadTechnician Nov 03 '22 at 21:04

1 Answer


Please don't try to parse HTML with regex; that's a terrible idea. You can do this in both PowerShell Core and Windows PowerShell using the htmlfile COM object:

$com = New-Object -ComObject htmlfile
$com.write([System.Text.Encoding]::Unicode.GetBytes(@'
<body>
  <div>
    <tbody>
      <td id="LW1">
        <a><span class="player-name">Hello World</span></a>
      </td>
    </tbody>
  </div>
</body>
'@))

$com.getElementsByClassName('player-name') | ForEach-Object innerHtml
# Outputs: Hello World
$null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($com)
Santiago Squarzon
  • Hi Santiago, many thanks for your valuable input, that's exactly what I need. Btw, as there are multiple cells with the same class name, I had to use an additional object to search for the cell first. – ronweis Nov 06 '22 at 13:15
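
For reference, the two-step lookup described in the comment above could look roughly like this. It is a minimal sketch, not the commenter's actual code. The sample markup, the second cell id (C1) and the variable names $html, $cell and $name are made up for illustration. The cells are placed inside a real table and row on the assumption that the parser is then less likely to drop the td elements. It also assumes Windows, since it relies on the same htmlfile COM object as the answer.

```powershell
# Hypothetical sample markup: two cells share the 'player-name' class.
$html = @'
<body>
  <table>
    <tbody>
      <tr>
        <td id="LW1"><a><span class="player-name">Hello World</span></a></td>
        <td id="C1"><a><span class="player-name">Someone Else</span></a></td>
      </tr>
    </tbody>
  </table>
</body>
'@

$com = New-Object -ComObject htmlfile
try {
    # Load the markup the same way as in the answer
    $com.write([System.Text.Encoding]::Unicode.GetBytes($html))

    # Step 1: find the cell by its id
    $cell = $com.getElementById('LW1')

    # Step 2: look only inside that cell for the span with the wanted class
    $name = $cell.getElementsByTagName('span') |
        Where-Object { $_.className -eq 'player-name' } |
        ForEach-Object innerText
}
finally {
    # Release the COM object even if the lookup fails
    $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($com)
}

$name
```

To try this against the real page, $html could presumably be replaced with $Result.Content from the question, provided the downloaded markup actually contains the LW1 cell.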