I am traversing a webpage and I need the value from a specific 'td' tag:

$ie.Document.getElementsByTagName("td")

And I get this:

className                    :
id                           :
tagName                      : TD
parentElement                : System.__ComObject
style                        : System.__ComObject
onhelp                       :
onclick                      :
ondblclick                   :
onkeydown                    :
onkeyup                      :
onkeypress                   :
onmouseout                   :
onmouseover                  :
onmousemove                  :
onmousedown                  :
onmouseup                    :
document                     : mshtml.HTMLDocumentClass
title                        :
language                     :
onselectstart                :
sourceIndex                  : 121
recordNumber                 :
lang                         :
offsetLeft                   : 0
offsetTop                    : 32
offsetWidth                  : 42
offsetHeight                 : 32
offsetParent                 : System.__ComObject
innerHTML                    : <span title="" id="jobsForm:jobsTable:0:jobId">886</span>
innerText                    : 886
outerHTML                    : <td><span title="" id="jobsForm:jobsTable:0:jobId">886</span></td>
outerText                    : 886
parentTextEdit               : System.__ComObject
isTextEdit                   : False
filters                      :
ondragstart                  :
onbeforeupdate               :
onafterupdate                :
onerrorupdate                :
onrowexit                    :
onrowenter                   :
ondatasetchanged             :
ondataavailable              :
ondatasetcomplete            :
onfilterchange               :
children                     : System.__ComObject
all                          : System.__ComObject
scopeName                    : HTML
onlosecapture                :
onscroll                     :
ondrag                       :
ondragend                    :
ondragenter                  :
ondragover                   :
ondragleave                  :
ondrop                       :
onbeforecut                  :
oncut                        :
onbeforecopy                 :
oncopy                       :
onbeforepaste                :
onpaste                      :
currentStyle                 : System.__ComObject
onpropertychange             :
tabIndex                     : 0
accessKey                    :
onblur                       :
onfocus                      :
onresize                     :
clientHeight                 : 31
clientWidth                  : 42
clientTop                    : 1
clientLeft                   : 0
readyState                   : complete
onreadystatechange           :
onrowsdelete                 :
onrowsinserted               :
oncellchange                 :
dir                          :
scrollHeight                 : 31
scrollWidth                  : 42
scrollTop                    : 0
scrollLeft                   : 0
oncontextmenu                :
canHaveChildren              : True
runtimeStyle                 : System.__ComObject
behaviorUrns                 : System.__ComObject
tagUrn                       :
onbeforeeditfocus            :
isMultiLine                  : True
canHaveHTML                  : True
onlayoutcomplete             :
onpage                       :
onbeforedeactivate           :
contentEditable              : inherit
isContentEditable            : False
hideFocus                    : False
disabled                     : False
isDisabled                   : False
onmove                       :
oncontrolselect              :
onresizestart                :
onresizeend                  :
onmovestart                  :
onmoveend                    :
onmouseenter                 :
onmouseleave                 :
onactivate                   :
ondeactivate                 :
onmousewheel                 :
onbeforeactivate             :
onfocusin                    :
onfocusout                   :
uniqueNumber                 : 13
uniqueID                     : ms__id13
nodeType                     : 1
parentNode                   : System.__ComObject
childNodes                   : System.__ComObject
attributes                   : System.__ComObject
nodeName                     : TD
nodeValue                    :
firstChild                   : System.__ComObject
lastChild                    : System.__ComObject
previousSibling              :
nextSibling                  : System.__ComObject
ownerDocument                : mshtml.HTMLDocumentClass
role                         :
ariaBusy                     :
ariaChecked                  :
ariaDisabled                 :
ariaExpanded                 :
ariaHaspopup                 :
ariaHidden                   :
ariaInvalid                  :
ariaMultiselectable          :
ariaPressed                  :
ariaReadonly                 :
ariaRequired                 :
ariaSecret                   :
ariaSelected                 :
ie8_attributes               :
ariaValuenow                 :
ariaPosinset                 :
ariaSetsize                  :
ariaLevel                    :
ariaValuemin                 :
ariaValuemax                 :
ariaControls                 :
ariaDescribedby              :
ariaFlowto                   :
ariaLabelledby               :
ariaActivedescendant         :
ariaOwns                     :
ariaLive                     :
ariaRelevant                 :
ie9_tagName                  :
ie9_nodeName                 :
onabort                      :
oncanplay                    :
oncanplaythrough             :
onchange                     :
ondurationchange             :
onemptied                    :
onended                      :
onerror                      :
oninput                      :
onload                       :
onloadeddata                 :
onloadedmetadata             :
onloadstart                  :
onpause                      :
onplay                       :
onplaying                    :
onprogress                   :
onratechange                 :
onreset                      :
onseeked                     :
onseeking                    :
onselect                     :
onstalled                    :
onsubmit                     :
onsuspend                    :
ontimeupdate                 :
onvolumechange               :
onwaiting                    :
constructor                  : System.__ComObject
onmspointerdown              :
onmspointermove              :
onmspointerup                :
onmspointerover              :
onmspointerout               :
onmspointercancel            :
onmspointerhover             :
onmslostpointercapture       :
onmsgotpointercapture        :
onmsgesturestart             :
onmsgesturechange            :
onmsgestureend               :
onmsgesturehold              :
onmsgesturetap               :
onmsgesturedoubletap         :
onmsinertiastart             :
onmstransitionstart          :
onmstransitionend            :
onmsanimationstart           :
onmsanimationend             :
onmsanimationiteration       :
oninvalid                    :
xmsAcceleratorKey            :
spellcheck                   : True
onmsmanipulationstatechanged :
oncuechange                  :
rowSpan                      : 1
colSpan                      : 1
align                        :
vAlign                       :
bgColor                      :
noWrap                       : False
background                   :
borderColor                  :
borderColorLight             :
borderColorDark              :
width                        :
height                       :
cellIndex                    : 0
abbr                         :
axis                         :
ch                           :
chOff                        :
headers                      :
scope                        :
ie9_ch                       :
ie9_chOff                    :

System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject
System.__ComObject

There are 86 entries returned by that request. Whenever I try to iterate over the list, I get considerably fewer responses:

$ie.Document.getElementsByTagName("td") | foreach-object {write-output $_.tagName}
TD
TD
TD
TD
TD
TD
TD
TD
TD
TD
TD
TD
TD
TD

All 86 results have a tagName of "TD" (they have to, since that is the condition for being returned by the call above). For some reason the loop does not visit every object returned by the getElementsByTagName() call, and I don't understand why. Am I missing something here? If I store the query result in a variable, the result is itself a COM object, so I don't know whether special rules apply:

$whatpage = $ie.Document.getElementsByTagName("td")
$whatpage
System.__ComObject

edit: Here's the html from the page

http://pastebin.com/embed_js.php?i=qA9wJuBY

Derek Chadwell
  • Can you provide source HTML against which we can try some of this code? If your `$ie` object gets a public web page, maybe you can just post the full code? – briantist Oct 01 '15 at 19:23
  • edited original to contain link to html example http://pastebin.com/embed_js.php?i=qA9wJuBY – Derek Chadwell Oct 01 '15 at 19:45
  • As usual, my suggestion would be to use the HtmlAgilityPack library (specifically `DocumentNode.SelectNodes('//td')`). In the past I, too, tried to use IE automation, but pretty much every time it was broken or at least unreliable (this includes `Invoke-WebRequest`, which actually uses the IE engine under the hood to parse HTML). – Alexander Obersht Oct 01 '15 at 20:43
  • @AlexanderObersht agreed, HtmlAgilityPack is really good, works great with powershell. – briantist Oct 01 '15 at 21:19

1 Answer

What you are getting as a result is the expected behavior. The value of the tagName property is always going to be TD, because the results come from the getElementsByTagName("td") method, which returns all the elements whose tag name is TD.

Now, if you are looking for the values inside the tag (TD, or table data), that is, the contents of each table cell, then you should use $whatpage | %{$_.InnerText} or $ie.Document.getElementsByTagName("td") | foreach-object {write-output $_.InnerText}.
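As a rough sketch of reading InnerText from every cell, the collection can also be walked by index rather than piped. This is an assumption on my part, not something confirmed against the page in the question: it relies on the collection exposing the standard DOM length property and item() method, and Get-CellText is a hypothetical helper name introduced only for illustration.

```powershell
# Hypothetical helper (not from the original post): walk a DOM collection by
# index instead of piping it, and emit each element's innerText.
function Get-CellText {
    param(
        # Any object exposing a 'length' property and an 'item(index)' method,
        # such as the result of $ie.Document.getElementsByTagName("td").
        $Collection
    )

    for ($i = 0; $i -lt $Collection.length; $i++) {
        # Fetch the element at position $i explicitly.
        $cell = $Collection.item($i)
        # Write the cell's visible text to the output stream.
        $cell.innerText
    }
}

# Usage, assuming $ie is an InternetExplorer.Application COM object that has
# finished loading the page:
# $cells = $ie.Document.getElementsByTagName("td")
# Get-CellText -Collection $cells
```

Comparing the number of values this returns with the collection's length property should at least show whether the pipeline or the collection itself is dropping elements.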

Hope that helps!

SavindraSingh
  • No, it doesn't help at all. The point of my post is not that the output says TD; it's that the output doesn't loop as many times as there are COM objects, despite me looping over the list of COM objects. – Derek Chadwell Nov 19 '15 at 14:54