19

I'm trying to convert a table I have extracted via BeautifulSoup into JSON.

So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.

[<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>, 
<tr><td>Card name</td><td>Name</td></tr>, 
<tr><td>Account holder</td><td>NAME</td></tr>, 
<tr><td>Card number</td><td>1234</td></tr>, 
<tr><td>Status</td><td>Active</td></tr>]

(Line breaks mine for readability)

This was my attempt:

result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)

which gave me the following result:

[
 [u'Card balance', u'$18.30'],
 [u'Card name', u'NAMEn'],
 [u'Account holder', u'NAME'],
 [u'Card number', u'1234'],
 [u'Status', u'Active']
]
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
declanjscott
  • 351
  • 1
  • 2
  • 10

1 Answers1

36

Probably your data is something like:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

From which we can get your result as a list using this code:

from bs4 import BeautifulSoup
table_data = [[cell.text for cell in row("td")]
                         for row in BeautifulSoup(html_data)("tr")]

To convert the result to JSON, if you don't care about the order:

import json
print json.dumps(dict(table_data))

Result:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder":
    "NAME", "Card number": "1234",
    "Card balance": "$18.30"
}

If you need the same order, use this:

from collections import OrderedDict
import json
print json.dumps(OrderedDict(table_data))

Which gives you:

{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
László Papp
  • 51,870
  • 39
  • 111
  • 135
H.D.
  • 4,168
  • 1
  • 18
  • 15
  • 1
    Thanks a lot, I was getting an error which was due to the encoding of some characters in the server's response, once I figured that out your answer worked perfectly. Thanks again, and have a great day. – declanjscott Aug 31 '13 at 05:30