
I am new to PowerShell, and I suck at HTML.

There's a page with a table, and each cell contains an <a href> link. The value of each link is dynamic, but the link I want to click automatically is always in the first cell.

I know there's cellIndex in HTML/JS; is it usable in PowerShell?

For example, let's say I have this table on a website.

<table>
  <tr>
    <td>
      <a href="http://example1.com">
        <div style="height:100%;width:100%">
          hello world1
        </div>
      </a>
    </td>
  </tr>
  <tr>
    <td>
      <a href="http://example2.com">
        <div style="height:100%;width:100%">
          hello world2
        </div>
      </a>
    </td>
  </tr>
  <tr>
    <td>
      <a href="http://example3.com">
        <div style="height:100%;width:100%">
          hello world3
        </div>
      </a>
    </td>
  </tr>
</table>

I want PowerShell to always click the first link, even though the link's target is dynamic.

Any ideas? Hints?

2 Answers


The object returned by Invoke-WebRequest has a property named Links, which is a collection of all the hyperlinks on the web page.

For example:

$Web = Invoke-WebRequest -Uri 'http://wragg.io'
$Web.Links | Select innertext,href

Returns:

innerText                    href
---------                    ----
Mark Wragg                   http://wragg.io
 Twitter                     https://twitter.com/markwragg
 Github                      https://github.com/markwragg 
 LinkedIn                    https://uk.linkedin.com/in/mwragg

If the link you want to capture is always the first in this list you could get it by doing:

$Web.Links[0].href

If it's the second link, use index [1]; for the third, use [2]; and so on.
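If other links can appear ahead of the one you want, relying on position alone is fragile. Here is a minimal sketch of filtering by href instead. The $Links array is sample data standing in for $Web.Links (based on the output above), and the '*github.com*' pattern is only an illustration:

```powershell
# Stand-in for $Web.Links, based on the rows shown in the output above.
$Links = @(
    [pscustomobject]@{ innerText = 'Mark Wragg'; href = 'http://wragg.io' }
    [pscustomobject]@{ innerText = 'Twitter';    href = 'https://twitter.com/markwragg' }
    [pscustomobject]@{ innerText = 'Github';     href = 'https://github.com/markwragg' }
    [pscustomobject]@{ innerText = 'LinkedIn';   href = 'https://uk.linkedin.com/in/mwragg' }
)

# Pick the first link whose href matches a pattern, wherever it sits in the list.
$Target = $Links | Where-Object { $_.href -like '*github.com*' } | Select-Object -First 1
$Target.href
```

With a live response you would pipe $Web.Links into the same Where-Object filter.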

I don't think there is an equivalent of "cellindex", although there is a property named AllElements that you can access via an array index. For example, if you wanted the third element on the page (indexes start at 0) you could do:

$Web.AllElements[2]

If you need to get to a specific table in the page and then access the links inside that table, you'd probably need to iterate through the AllElements property until you reach the table you want. For example, if you know the links are in the third table on the page:

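$Links = @()
$TableCount = 0

$Web.AllElements | ForEach-Object {

    # Count each <table> element as it is encountered
    If ($_.tagname -eq 'table'){ $TableCount++ }

    # Once the third table has started (and before a fourth begins), collect <a> elements
    If ($TableCount -eq 3){

        If ($_.tagname -eq 'a') {
            $Links += $_
        }
    }
}

# The first link found after the third table starts
$Links | Select -First 1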
Mark Wragg
  • Can I use cellindex? What if there are other tags 'a' above the one I want to select? –  Mar 28 '17 at 22:46
  • I have overhauled my answer as I forgot about the "Links" property that probably is the simplest solution. I don't think there is an equivalent to cellindex, unless maybe using the AllElements property with an index is similar to what you mean. Each item in `AllElements` has an `innerhtml` and `outerhtml` property so I think you can get the tags around the 'a' tag by accessing `outerhtml`. Hope this helps. – Mark Wragg Mar 29 '17 at 08:06
  • Thank you Mark, one other general question: is there a third-party tool or a PS method to find out the index number of a specific HTML element? –  Mar 29 '17 at 09:51
  • This would return the name and index number of each tagname in AllElements: `$Web.AllElements.Tagname | ForEach -Begin {$i=0} -Process {"Value:$_ Index:$i"; $i++}` – Mark Wragg Mar 29 '17 at 10:07
  • Your example is returning me error: PS C:\Users\Samer> $Web = Invoke-webrequest -Uri 'http://wragg.io' $Web.Links | Select innertext,href Invoke-WebRequest : A positional parameter cannot be found that accepts argument '$null'. At line:1 char:8 + $Web = Invoke-webrequest -Uri 'http://wragg.io' $Web.Links | Select i ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidArgument: (:) [Invoke-WebRequest], ParameterBindingException + FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.InvokeWebRequestCommand –  Mar 29 '17 at 11:52
  • `$Web = Invoke-webrequest -Uri 'wragg.io'` should be `$Web = Invoke-webrequest -Uri 'http://wragg.io'` – Mark Wragg Mar 29 '17 at 11:54
  • The comment ate the http part: check the screenshot: http://i.imgur.com/GiI46Ac.jpg –  Mar 29 '17 at 11:57
  • In your screenshot you are missing a semi-colon before $web.links. – Mark Wragg Mar 29 '17 at 11:59
  • Your link is working, but the link I am trying is causing a Store popup "You'll need a new app to open this" - ever happened with you? –  Mar 29 '17 at 12:21
  • I haven't personally dealt with that in PowerShell, but you could try adding `-usebasicparsing` as a switch to Invoke-WebRequest, as that bypasses Internet Explorer. You still get a `.Links` property returned, but you lose AllElements and ParsedHTML because no DOM parsing is performed. – Mark Wragg Mar 29 '17 at 13:20
  • Well, everything is working with your link, but not with mine; something is making it difficult in the page I am trying to automate. –  Mar 29 '17 at 16:36

OK, the Invoke-WebRequest method works with Mark's link but not with my page. However, I noticed a pattern in the page source that might be usable:

<table id="row" class="simple">
<thead>
<tr>
<th></th>
<th class="centerjustify">File Name</th>
<th class="centerjustify">File ID</th>
<th class="datetime">Creation Date</th>
<th class="datetime">Upload Date</th>
<th class="centerjustify">Processing Status</th>
<th class="centerjustify">Exceptions</th>
<th class="centerjustify">Unprocessed Count</th>
<th class="centerjustify">Discarded Count</th>
<th class="centerjustify">Rejected Count</th>
<th class="centerjustify">Void Count</th>
<th class="centerjustify">PO Total Count</th>
<th class="centerjustify">PO Total Amount</th>
<th class="centerjustify">CM Total Count</th>
<th class="centerjustify">CM Total Amount</th>
<th class="centerjustify">PO Processed Count</th>
<th class="centerjustify">PO Processed Amount</th>
<th class="centerjustify">CM Processed Count</th>
<th class="centerjustify">CM Processed Amount</th>
<th class="centerjustify">Counts At Upload</th></tr></thead>
<tbody>
<tr class="odd">
<td><input type="radio" disabled="disabled" name="checkedValue" value="12047" /></td>
<td class="leftjustify textColorBlack">
<a href="loadConfirmationDetails.htm?fId=12047">520170123000000_520170123000000_20170327_01.txt</a></td>
<td class="centerjustify textColorBlack">1</td>
<td class="datetime textColorBlack">Mar 27, 2017 0:00</td>
<td class="datetime textColorBlack">Mar 27, 2017 10:33:24 PM +03:00</td>
<td class="centerjustify textColorBlack">

The fId part in "loadConfirmationDetails.htm?fId=12047" is dynamic, and it's the last part of the next page's URL.

For example: "https://aaa.xxxxxxx.com/aaa/community/loadConfirmationDetails.htm?fId=12047"

And the table's ID is unique: "row". I wonder if I can use a completely different approach from invoking the web page: automatically copying this fId value from the HTML source and concatenating it with the main link?

I am really out of ideas beyond that.
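A rough sketch of that idea, under stated assumptions: $Html is a trimmed copy of the source shown above (in practice it might come from the Content property of an Invoke-WebRequest -UseBasicParsing response), $BaseUrl is the placeholder address from the example URL, and the regex pattern is my own guess at what the link always looks like:

```powershell
# Trimmed sample of the page source shown above (assumption: the real page has the same shape).
$Html = @'
<table id="row" class="simple">
<tbody>
<tr class="odd">
<td><input type="radio" disabled="disabled" name="checkedValue" value="12047" /></td>
<td class="leftjustify textColorBlack">
<a href="loadConfirmationDetails.htm?fId=12047">520170123000000_520170123000000_20170327_01.txt</a></td>
</tr>
</tbody>
</table>
'@

# Placeholder base address taken from the example URL above.
$BaseUrl = 'https://aaa.xxxxxxx.com/aaa/community/'

# Ignore everything before the table with id="row", so earlier links cannot match.
$TableStart = $Html.IndexOf('<table id="row"')
if ($TableStart -lt 0) { throw 'Table with id="row" not found in the page source.' }
$TableHtml = $Html.Substring($TableStart)

# Capture the first href in that table that points at loadConfirmationDetails.htm.
$Pattern = 'href="(loadConfirmationDetails\.htm\?fId=\d+)"'
$Match = [regex]::Match($TableHtml, $Pattern)

if ($Match.Success) {
    $RelativeLink = $Match.Groups[1].Value
    # Resolve the relative link against the base address.
    $FullUrl = (New-Object System.Uri -ArgumentList ([System.Uri]$BaseUrl), $RelativeLink).AbsoluteUri
    $FullUrl
}
```

The resulting $FullUrl could then be passed to a second Invoke-WebRequest call, which is the closest equivalent to "clicking" the link. Matching HTML with a regex is brittle, so this only holds while the page keeps the same markup.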