I am looking for a particular piece of information about the financial markets, but the available filters don't let me find it. As an alternative, I can use Google Sheets to go into every stock and check it individually. IMPORTHTML works to an extent, but I would rather use IMPORTXML in case the location of the table changes.

Example: the website I am using here is the Financial Times.

Here is a snippet of the page source:

<span>Sectors</span><i class="o-ft-icons-icon o-ft-icons-icon--arrow-down"></i>
    <ul class="mod-ui-tab-row mod-ui-tab-row--dropdown" role="tablist">
        <li aria-controls="sectors-panel" aria-selected="true" class="mod-ui-tab mod-ui-tab__module-header" role="tab">Sectors</li>
        <li aria-controls="regions-panel" aria-selected="false" class="mod-ui-tab mod-ui-tab__module-header" role="tab">Regions</li>
    </ul>
    <div class="mod-module__content">
        <div aria-hidden="false" class="mod-ui-tab-content" id="sectors-panel" role="tabpanel">
            <div>
                <div aria-hidden="false" class="mod-weightings__sectors">
                    <div class="mod-weightings__sectors__chart">
                        <div class="mod-weightings__sectors__chart--dynamic mod-ui-chart--dynamic"></div>
                    </div>
                    <div class="mod-weightings__sectors__table">
                        <table class="mod-ui-table mod-ui-table--colored">
                            <thead>
                                <tr>
                                    <th class="mod-ui-table__header--text">Sector</th>
                                    <th>% Net assets</th>
                                    <th>Category average</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#27757B;">Technology</span></td>
                                    <td>88.04%</td>
                                    <td>73.78%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#EEA45F;">Financial Services</span></td>
                                    <td>8.38%</td>
                                    <td>3.74%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#94826B;">Industrials</span></td>
                                    <td>2.72%</td>
                                    <td>6.28%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#EED485;">Consumer Cyclical</span></td>
                                    <td>0.67%</td>
                                    <td>5.98%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#A6A371;">Healthcare</span></td>
                                    <td>0.00%</td>
                                    <td>2.93%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#819E9A;">Communication Services</span></td>
                                    <td>0.00%</td>
                                    <td>2.36%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#746E7F;">Real Estate</span></td>
                                    <td>0.00%</td>
                                    <td>0.17%</td>
                                </tr>
                                <tr>
                                    <td class="mod-ui-table__cell--colored"><span class="mod-ui-table__cell--colored__wrapper" style="border-color:#73A5C3;">Consumer Defensive</span></td>
                                    <td>0.00%</td>
                                    <td>0.08%</td>
                                </tr>
                            </tbody>
                        </table>
                    </div>
                    <div class="mod-disclaimer">
                        As of Dec 19 2017. Sectors weighting is calculated using only long position holdings of the portfolio.
                    </div>
                </div>
            </div>
        </div>
    </div>

I want to get the first value (%) after "Financial Services", or after any row in the sector table whose name contains "finance". In this case the value would be 8.38%.

mfaiz
  • Have you tried a Google search for web scraping? – Jason Allshorn Jan 10 '18 at 07:47
  • I don't know how a free tool would be able to do what I want. I have over 1,000 rows in my spreadsheet, each with its own link; it's not just one website that I need. – mfaiz Jan 10 '18 at 08:00
  • This post explains well how to search through HTML that starts out as a string, as it might in your case. https://stackoverflow.com/a/46817154/5086349 – Jason Allshorn Jan 10 '18 at 08:16
  • I'd like to use the built-in Google functions if at all possible; all I need is the XPath syntax. – mfaiz Jan 10 '18 at 09:28

1 Answer

The XPath you're looking for should be something like this:

//span[text()="Financial Services"]/parent::td/following-sibling::td[1]/text()
  • First, find the span whose text is exactly Financial Services.
  • Then get its parent (td) node.
  • Select the first td sibling that follows that parent (following-sibling::td[1]).
  • Take the text content (text()) of that node.
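
To show how this XPath might be used from a sheet, here is a minimal sketch. It assumes the page URL sits in cell A2 (a hypothetical layout) and that the table is present in the HTML that IMPORTXML fetches. The inner quotes are switched to single quotes so the expression fits inside the formula's double-quoted string.

```
=IMPORTXML(A2, "//span[text()='Financial Services']/parent::td/following-sibling::td[1]/text()")
```

For the snippet in the question, this expression selects the cell right after "Financial Services" (8.38%). Changing `td[1]` to `td[2]` would select the "Category average" column instead.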
Casper
  • Worked like a charm, thanks. What if I want to match "Financial Services" or "Finance - General"? In other words, how can I get everything finance-related, like searching for the text *financ*? (I have tried this and it doesn't seem to work.) – mfaiz Jan 10 '18 at 10:29
  • I guess this will always give you the first result, and the XPath is very specific about how it acquires the information. If "Finance - General" is structured slightly differently, the XPath may fail, so I can't give you a definite answer from these details alone. You could replace `Financial Services` with `Industrials`. It checks for an exact match, so you can't write `[text()="Financial Service"]`: the final `s` would be missing and it would not match. You could use [contains()](http://www.protechskills.com/testing/automation-testing/selenium/usage-contains-starts-functions-xpath) for this. – Casper Jan 10 '18 at 13:30
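
A sketch of the contains() approach, under the same assumptions (URL in the hypothetical cell A2, and other pages using the same span-inside-td structure). XPath 1.0 has no wildcard syntax such as *financ*, so a substring test is used instead:

```
=IMPORTXML(A2, "//span[contains(., 'Financ')]/parent::td/following-sibling::td[1]/text()")
```

contains() is case-sensitive. If the capitalisation varies between pages, one option is to lower-case the text with translate() before comparing:

```
=IMPORTXML(A2, "//span[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'financ')]/parent::td/following-sibling::td[1]/text()")
```

If more than one row matches, IMPORTXML should return one result per row; wrapping the formula in INDEX(..., 1) keeps only the first.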