I have a huge HTML Table (about 500,000 rows) that I need to transform into a JSON file. The table looks something like this:
<table>
<tr>
<th>Id</th>
<th>Timestamp</th>
<th>Artist_Name</th>
<th>Tweet_Id</th>
<th>Created_at</th>
<th>Tweet</th>
<th>User_name</th>
<th>User_Id</th>
<th>Followers</th>
</tr>
<tr>
<td>1</td>
<td>2013-06-07 16:00:17</td>
<td>Kelly Rowland</td>
<td>343034567793442816</td>
<td>Fri Jun 07 15:59:48 +0000 2013</td>
<td>So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...</td>
<td>Nicole Barrett</td>
<td>33831594</td>
<td>62</td>
</tr>
<tr>
<td>2</td>
<td>2013-06-07 16:00:17</td>
<td>Kelly Rowland</td>
<td>343034476395368448</td>
<td>Fri Jun 07 15:59:27 +0000 2013</td>
<td>RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht…</td>
<td>A.J.</td>
<td>24193447</td>
<td>340</td>
</tr>
I would like to create a JSON file that looks sth like that:
{'data': [
{
'text': 'So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...',
'id': 1,
'tweet_id': 343034567793442816
},
{
'text': 'RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht…',
'id': 2,
'tweet_id': 343034476395368448
}
]}
Maybe with some more of the variables included but that should be self explaining.
I have already looked into several options but mostly I have the problem that my HTML Table is so big. I saw a lot of people recommending jQuery. Does that make sense for me considering the size of my table? If there is a suitable Python option I would be pretty much in favor as I have been writing most of my code so far in Python.