I've done a bit of looking around and found various bits and pieces relating to this, but nothing concrete.
I need a way to extract UI elements other than the Spy++ tool. I can already locate on-screen items and their text captions by HWND. However, third-party apps such as Firefox pose a further problem: they draw their whole display in one large window, so the elements inside it have no HWND of their own. If anyone has ideas on how to natively get the screen coordinates of UI elements within, say, a web page, so that I can run OCR or control recognition on them, I'd love to hear from you.