
I know this might be a duplicate, but I am not able to extract a value from the HTML source below. Any help would be greatly appreciated.

I am trying to get the pid of a project from the page. The project names are read from a CSV file, and I need the pid for each of them.

For example, if the project is "AA project" (the project key "AA" can also be used for the lookup), the pid that needs to be extracted is 10441.

Since the pid is not shown as a label and only appears inside attribute values, I cannot figure out how to extract it.

Update: just using pid=(\d....) returns every pid on the page, without any reference to the project name or key.
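The behaviour described in the update can be reproduced outside the test tool. The following is only a sketch in Python: the `html` string is a shortened, hypothetical stand-in for the real page, and `pid_for_key` is a helper name invented for illustration. It shows why an unanchored pattern matches every pid, and how anchoring on the row's `data-project-key` attribute narrows the match. It remains sensitive to markup changes.

```python
import re

# Hypothetical, shortened stand-in for the project list page.
html = '''
<tr data-project-key="AA">
  <td><a id="view-project-10441" href="/plugins/servlet/project-config/AA/summary">AA project</a></td>
  <td><a href="/secure/project/EditProject!default.jspa?pid=10441&amp;returnUrl=ViewProjects.jspa">Edit</a></td>
</tr>
<tr data-project-key="AAL">
  <td><a id="view-project-10442" href="/plugins/servlet/project-config/AAL/summary">AAL project</a></td>
  <td><a href="/secure/project/EditProject!default.jspa?pid=10442&amp;returnUrl=ViewProjects.jspa">Edit</a></td>
</tr>
'''

# Unanchored pattern: matches every pid on the page, whatever the project.
all_pids = re.findall(r'pid=(\d+)', html)


def pid_for_key(page, key):
    """Return the pid from the first view-project id after the row's key attribute."""
    # The closing quote after the key keeps "AA" from also matching "AAL".
    pattern = r'data-project-key="' + re.escape(key) + r'".*?view-project-(\d+)'
    match = re.search(pattern, page, re.DOTALL)
    return match.group(1) if match else None


print(all_pids)
print(pid_for_key(html, "AA"))
```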

 <table id="project-list" class="aui">
        <thead>
            <tr>
                <th></th>
                <th>Name</th>
                <th>Key</th>

                    <th class="project-list-type">Project Type</th>

                <th>URL</th>
                <th>Project Lead</th>
                <th>Default Assignee</th>
                <th>Operations</th>
            </tr>
        </thead>
        <tbody>

                <tr data-project-key="AA">
                    <td class="cell-type-icon" data-cell-type="avatar">
                        <div class="aui-avatar aui-avatar-small aui-avatar-project jira-system-avatar"><span class="aui-avatar-inner"><img src="/secure/projectavatar?pid=10441&amp;amp;avatarId=10011&amp;amp;size=small" alt="Project Avatar for 10441" /></span></div>
                    </td>
                    <td data-cell-type="name">
                        <a id="view-project-10441" href="/plugins/servlet/project-config/AA/summary">AA project</a>
                    </td>
                    <td data-cell-type="key">AA</td>


                        <td class="cell-type-project-type">

                            <span>Software</span>
                        </td>

                    <td class="cell-type-url" data-cell-type="url">

                            No URL


                    </td>
                    <td class="cell-type-user" data-cell-type="lead">

                            <a class="user-hover" rel="localadmin" id="view_AA_projects_localadmin" href="/secure/ViewProfile.jspa?name=localadmin">Atlassian Administrator</a>


                    </td>
                    <td class="cell-type-user" data-cell-type="default-assignee">

                        Unassigned

                    </td>
                    <td data-cell-type="operations">
                        <ul class="operations-list">

                            <li><a class="edit-project" id="edit-project-10441" href="/secure/project/EditProject!default.jspa?pid=10441&amp;returnUrl=ViewProjects.jspa">Edit</a></li>


                            <li><a id="change_project_type_10441" class="change-project-type-link" data-project-id="10441" href="#">Change project type</a></li>


                            <li><a id="delete_project_10441" href="/secure/project/DeleteProject!default.jspa?pid=10441&amp;returnUrl=ViewProjects.jspa">Delete</a></li>

                        </ul>
                    </td>
                </tr>

                <tr data-project-key="AAL">
                    <td class="cell-type-icon" data-cell-type="avatar">
                        <div class="aui-avatar aui-avatar-small aui-avatar-project jira-system-avatar"><span class="aui-avatar-inner"><img src="/secure/projectavatar?pid=10442&amp;amp;avatarId=10011&amp;amp;size=small" alt="Project Avatar for 10442" /></span></div>
                    </td>
                    <td data-cell-type="name">
                        <a id="view-project-10442" href="/plugins/servlet/project-config/AAL/summary">AAL project</a>
                    </td>
                    <td data-cell-type="key">AAL</td>

                        <td class="cell-type-project-type">

                            <span>Software</span>
                        </td>

                    <td class="cell-type-url" data-cell-type="url">

                            No URL


                    </td>
                    <td class="cell-type-user" data-cell-type="lead">

                            <a class="user-hover" rel="localadmin" id="view_AAL_projects_localadmin" href="/secure/ViewProfile.jspa?name=localadmin">Atlassian Administrator</a>


                    </td>
                    <td class="cell-type-user" data-cell-type="default-assignee">

                        Unassigned

                    </td>
                    <td data-cell-type="operations">
                        <ul class="operations-list">

1 Answer


I wouldn't recommend using regular expressions to parse HTML: they are a headache to develop and maintain, and they are very sensitive to markup changes, which makes them fragile. See https://stackoverflow.com/a/1732454/2897748 for details.

Use the XPath Extractor instead. The relevant configuration would be:

  • Reference Name: anything meaningful, e.g. id
  • XPath Query: substring-after(//tr[@data-project-key='AA']/td[@data-cell-type='name']/a/@id,'view-project-')
  • Check Use Tidy if your response is not XHTML-compliant
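To sanity-check the selection logic of that query outside the test tool, here is a minimal Python sketch. It makes several assumptions: `SAMPLE` is a trimmed, well-formed stand-in for the real table, `project_pid` is a hypothetical helper name, and Python's `xml.etree.ElementTree` supports only a subset of XPath, so the `substring-after` step is done with plain string handling instead of inside the query.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the project list table (assumption).
SAMPLE = """
<table id="project-list">
  <tbody>
    <tr data-project-key="AA">
      <td data-cell-type="name"><a id="view-project-10441" href="/plugins/servlet/project-config/AA/summary">AA project</a></td>
      <td data-cell-type="key">AA</td>
    </tr>
    <tr data-project-key="AAL">
      <td data-cell-type="name"><a id="view-project-10442" href="/plugins/servlet/project-config/AAL/summary">AAL project</a></td>
      <td data-cell-type="key">AAL</td>
    </tr>
  </tbody>
</table>
"""


def project_pid(xml_text, key):
    """Return the pid for the given project key, or None if it is not found."""
    root = ET.fromstring(xml_text.strip())
    # Same row/cell/anchor selection as the XPath query above.
    path = f".//tr[@data-project-key='{key}']/td[@data-cell-type='name']/a"
    link = root.find(path)
    if link is None:
        return None
    # Equivalent of substring-after(@id, 'view-project-').
    prefix = "view-project-"
    link_id = link.get("id", "")
    return link_id[len(prefix):] if link_id.startswith(prefix) else None


for key in ("AA", "AAL"):
    print(key, project_pid(SAMPLE, key))
```

Because the predicate compares the whole `data-project-key` value, the key "AA" does not pick up the "AAL" row. Real pages that are not well-formed XML would need a tolerant HTML parser first, which is the role Use Tidy plays in the extractor.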

Dmitri T