I'm using BeautifulSoup to turn the data in this table into JSON. How do I get the data between the tags?

<table>
<tr>
    <th>Montag</th>
    <td>
     09:00 &ndash; 00:30
    </td>
</tr>
<tr>
  <th>Dienstag</th>
  <td>
   geschlossen
  </td>
</tr>
<tr>
  <th>Mittwoch</th>
  <td>
  12:00 &ndash; 00:30
  </td>
</tr>
<tr>
  <th>Donnerstag &ndash; Sonntag</th>
  <td>
    09:00 &ndash; 00:30
  </td>
</tr>
</table>

Unfortunately, this is not working:

datesTable = BeautifulSoup(mytable)

for row in datesTable: 
   print(row['th'])
Alex

2 Answers

1

Here is an example. See this question on how to decode the escaped HTML entities for your version of Python.

table = """
<table>
<tr>
    <th>Montag</th>
    <td>
     09:00 &ndash; 00:30
    </td>
</tr>
<tr>
  <th>Dienstag</th>
  <td>
   geschlossen
  </td>
</tr>
<tr>
  <th>Mittwoch</th>
  <td>
  12:00 &ndash; 00:30
  </td>
</tr>
<tr>
  <th>Donnerstag &ndash; Sonntag</th>
  <td>
    09:00 &ndash; 00:30
  </td>
</tr>
</table>"""

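import json
from bs4 import BeautifulSoup

# html5lib must be installed separately; the built-in 'html.parser' also works here
soup = BeautifulSoup(table, 'html5lib')

data = {}

# Each row holds the day label in <th> and the opening hours in <td>
for row in soup.find_all('tr'):
    th = row.find('th')
    td = row.find('td')
    # .text returns the string between the tags; strip() drops the padding whitespace
    data[th.text.strip()] = td.text.strip()

print(json.dumps(data))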
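
On the entity decoding mentioned above: in Python 3, BeautifulSoup already turns `&ndash;` into the real character when you read `.text`, so no extra step is needed in the loop. If you ever hold the raw, still-escaped strings, the standard library's `html.unescape()` does the same job. The sketch below uses a hypothetical `raw` dictionary as a stand-in for such strings, and also shows that `json.dumps()` escapes non-ASCII characters unless you pass `ensure_ascii=False`.

```python
import html
import json

# Hypothetical stand-in for strings that still contain HTML entities
raw = {"Donnerstag &ndash; Sonntag": "09:00 &ndash; 00:30"}

# html.unescape() converts entities such as &ndash; into the actual character
decoded = {html.unescape(k): html.unescape(v) for k, v in raw.items()}

# By default json.dumps() writes non-ASCII characters as \uXXXX escapes
escaped = json.dumps(decoded)

# ensure_ascii=False keeps the en dash readable in the output
readable = json.dumps(decoded, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings are valid JSON and load back to the same dictionary; the second is just easier to read.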
Alden
  • Note: you are using BeautifulSoup version 3, which is very outdated and no longer maintained. – alecxe Jan 16 '17 at 18:11
0

Taking into account your actual problem statement of converting an HTML table to JSON, you can use pandas.read_html() to read the HTML into a DataFrame and then convert it to a dictionary:

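from io import StringIO

import pandas as pd

data = """
your HTML abbreviated to save space
"""

# Recent pandas versions expect a file-like object rather than a literal HTML string
df = pd.read_html(StringIO(data))[0]
df.columns = ["label", "value"]
print(dict(zip(df.label, df.value)))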

Prints:

{'Montag': '09:00 – 00:30', 
 'Dienstag': 'geschlossen', 
 'Mittwoch': '12:00 – 00:30', 
 'Donnerstag – Sonntag': '09:00 – 00:30'}

You can then use json.dumps() to further dump the dictionary into a JSON string.

There is also a .to_json() method that can dump a DataFrame directly to JSON, but I have not figured out how to use it in this particular case.
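
One possible way to use `.to_json()` here, offered as a sketch rather than a definitive answer: make the label column the index and serialize the remaining value column as a Series, which by default produces a flat `{label: value}` object. The small `df` below is a hypothetical stand-in for the frame with `label` and `value` columns, built by hand so the snippet runs without an HTML parser installed.

```python
import json

import pandas as pd

# Hypothetical stand-in for the two-column frame with "label" and "value"
df = pd.DataFrame(
    {
        "label": ["Montag", "Dienstag"],
        "value": ["09:00 – 00:30", "geschlossen"],
    }
)

# Use the labels as the index, then keep only the value column (a Series)
hours = df.set_index("label")["value"]

# A Series serializes as {index: value}; force_ascii=False keeps the en dash readable
as_json = hours.to_json(force_ascii=False)

print(as_json)
```

This assumes the labels are unique, since they become the keys of the JSON object.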

alecxe
  • Even though pandas will probably do the job, it is a bit too much for such a simple task, and I'm already using BeautifulSoup in this project (to find the table in an HTML file). Thanks anyway. – Alex Jan 16 '17 at 19:42
  • @lxer Sure, no problem, it's just another instrument to solve the problem. – alecxe Jan 16 '17 at 19:59