I am not an expert in shell scripting, and I am struggling to find a way to extract only specific columns from an HTML table. I tried several tools (awk, grep, hxselect) but could not come up with a solution.
hxselect requires well-formed HTML, which is not always what I get. Here is a sample table:
<table class="jiraIssueTable aui">
<colgroup>
<col width="18">
<col width="90">
<col>
<col width="9%">
<col width="9%">
<col width="9%">
</colgroup>
<thead>
<tr>
<th id="Related issues-type">Type</th>
<th id="Related issues-key">Key</th>
<th id="jiraDetailsText">Summary</th>
<th id="Related issues-status">Status</th>
<th id="Related issues-assignee">Assignee</th>
<th id="Related issues-fix-versions">Fix versions</th>
</tr>
</thead>
<tbody>
<tr class="" >
<td class="jiraIssueIcon" headers="Related issues-type"> <img class="issueTypeImg" src="/images/icons/jira_type_unknown.gif" alt="Unknown Issue Type"> </td>
<td class="jiraIssueKey" headers="Related issues-key"> <a title="View this issue" class="jiraIssueLink" data-issue-key="OL-541" id="viewIssueInJira:OL-541" href="">OL-541</a> </td>
<td headers="jiraDetailsText" class="jiraIssueDetailsError"> Increase the performance </td>
<td class="jiraIssueStatus" headers="Related issues-status"> </td>
<td headers="Related issues-assignee" class="jiraIssueDetailsError"> </td>
<td headers="Related issues-fix-versions" class="jiraIssueDetailsError"> </td>
</tr>
<tr class="" >
<td class="jiraIssueIcon" headers="Related issues-type"> <a href="devStatusDetailDialog=build" title="View this issue"> <img class="issueTypeImg" src="rType=issuetype" alt="Task"/> </a> </td>
<td class="jiraIssueKey" headers="Related issues-key"> <a title="View this issue" class="jiraIssueLink" data-issue-key="IT-2431" id="viewIssueInJira:IT-2431" href="">IT-2431</a> </td>
<td headers="jiraDetailsText" class="jiraIssueDetails"> Get some sample data </td>
<td class="jiraIssueStatus" headers="Related issues-status"> Verified/Closed </td>
<td headers="Related issues-assignee" class="jiraIssueDetails"> User A </td>
<td headers="Related issues-fix-versions" class="jiraIssueDetailsError"> </td>
</tr>
</tbody>
</table>
From this table I only need the contents of columns 2 and 3 (Key and Summary), so my final result should look like this:
OL-541 Increase the performance
IT-2431 Get some sample data
Any help is appreciated.
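
One possible approach, offered as a minimal sketch rather than a definitive solution: since the HTML is not always well-formed, a tolerant parser may be easier to work with than awk or grep. The example below uses Python's standard-library `html.parser`, which can be called from a shell script. The names `ColumnExtractor` and `extract_columns` are hypothetical, and the sketch assumes that each data row is a `<tr>` made of `<td>` cells and that there are no nested tables.

```python
from html.parser import HTMLParser


class ColumnExtractor(HTMLParser):
    """Collect the text of selected <td> columns from each table row.

    Hypothetical helper for illustration; assumes no nested tables.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted  # 1-based column numbers to keep
        self.rows = []        # selected cell texts, one list per row
        self.cells = None     # cells of the row being read, or None
        self.text = None      # text pieces of the cell being read, or None

    def _end_cell(self):
        # Store the current cell with its whitespace collapsed.
        if self.text is not None:
            self.cells.append(" ".join("".join(self.text).split()))
            self.text = None

    def _end_row(self):
        # Keep only the wanted columns; header rows (<th>) yield nothing.
        if self.cells is None:
            return
        self._end_cell()
        picked = [self.cells[i - 1] for i in self.wanted if i <= len(self.cells)]
        if picked:
            self.rows.append(picked)
        self.cells = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # tolerate a missing </tr>
            self.cells = []
        elif tag == "td" and self.cells is not None:
            self._end_cell()  # tolerate a missing </td>
            self.text = []

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def extract_columns(html, wanted=(2, 3)):
    """Return one string per data row, joining the wanted columns with a space."""
    parser = ColumnExtractor(wanted)
    parser.feed(html)
    parser.close()
    return [" ".join(row) for row in parser.rows]
```

To use this from a shell script, the file contents could be passed in with something like `extract_columns(sys.stdin.read())` and each returned line printed. Cells that contain only an image or whitespace come back as empty strings, so picking other columns may produce blank fields.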