-4

I am trying to write a script that turns a series of basic HTML tables, each describing the variations of a word across different countries, into a working spreadsheet for use in a database. Each table covers the translations of a single word across countries. In HTML, each table has this format:

<h5><a name="akas"> equivalent names in different countries </a> </h5>
<table border="0" cellpadding="2">

<tr>
<td>character string </td>

<td> country name / country name / country name</td>

</tr>

<tr>
<td>character string </td>

<td>country name</td>

</tr>

.................. this format continues until the table ends

</table>

Country names repeat across tables and should become the column headings of the spreadsheet, with each row holding the equivalent words for those countries. I am totally new to regex (which I'm finding really bewildering to get into) and also a beginner in JavaScript. I am looking for help on how to rearrange this type of data into a working spreadsheet for use in a larger database. Any help would be really appreciated.

Brian Tompsett - 汤莱恩
  • The question is how I would make a script recognize country names between the <td> tags, even when there are multiple countries as in the above example, and place the preceding contents of – user1309067 Apr 02 '12 at 22:51
  • You can write a sed script to extract data from this and create a CSV file. – Kashyap Apr 03 '12 at 18:03

2 Answers

1

You should look at DOM parsing and XPath. XPath lets you query the HTML file to get the content of whichever node you need.
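As a rough illustration of that idea, here is a minimal sketch in Python (not JavaScript) using the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. It assumes each table fragment is well-formed enough to parse as XML, which holds for the sample in the question but may not hold for real-world HTML. The `extract_rows` helper and the `sample` string are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample fragment copied from the question (assumed to be well-formed XML).
sample = """<table border="0" cellpadding="2">
<tr>
<td>character string </td>
<td> country name / country name / country name</td>
</tr>
<tr>
<td>character string </td>
<td>country name</td>
</tr>
</table>"""


def extract_rows(table_markup):
    """Return one list of stripped cell texts per <tr> in the table."""
    root = ET.fromstring(table_markup)
    rows = []
    # ".//tr" is an XPath-style query: every <tr> at any depth under the root.
    for tr in root.findall(".//tr"):
        # "./td" selects the <td> elements that are direct children of the row.
        cells = ["".join(td.itertext()).strip() for td in tr.findall("./td")]
        rows.append(cells)
    return rows


rows = extract_rows(sample)
for cells in rows:
    print(cells)
```

If the real pages are not well-formed, a more forgiving HTML parser would be needed in place of `ElementTree`, but the querying idea stays the same.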

viper
  • Parsing HTML with regex... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – aaaidan Apr 02 '12 at 22:58
0

You can copy and paste an HTML table into a spreadsheet.

Ruan Mendes
  • The problem is that the tables don't all match up perfectly, i.e. sometimes there is more than one country in between the <td> tags, and they can be in different groupings every time. I want the script to somehow recognize the names of the countries and place the preceding data in the correct spreadsheet cell. Also, there are several thousand of these tables I'm trying to compile. – user1309067 Apr 02 '12 at 22:48
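The regrouping described in the comment above could be sketched roughly as follows, in Python. This assumes the rows have already been extracted as (string, countries) pairs and that countries within a cell are always separated by "/". The words, names, and countries in `tables` are invented placeholders, and `pivot_tables` and `to_csv` are hypothetical helpers, not part of any library.

```python
import csv
import io


def pivot_tables(tables):
    """Turn {word: [(string, "A / B"), ...]} into country columns and row records."""
    columns = []   # country names, in the order they are first seen
    records = []   # one dict per table (i.e. per word)
    for word, rows in tables.items():
        record = {"word": word}
        for text, countries in rows:
            # Assumption: "/" never appears inside a country name.
            for country in countries.split("/"):
                country = country.strip()
                if not country:
                    continue
                if country not in columns:
                    columns.append(country)
                # If a country repeats within one table, the last string wins.
                record[country] = text
        records.append(record)
    return columns, records


def to_csv(columns, records):
    """Write the records as CSV text, leaving blanks for missing countries."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["word"] + columns,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder data standing in for two parsed tables.
tables = {
    "example word": [("Name A", "France / Germany"), ("Name B", "Spain")],
    "another word": [("Name C", "Germany / Spain / Italy")],
}
columns, records = pivot_tables(tables)
csv_text = to_csv(columns, records)
print(csv_text)
```

Because the column list grows as new countries are seen, the groupings can differ from table to table and each string still lands under the right heading. The resulting CSV text can be saved to a file and opened in any spreadsheet program.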