I read a DataFrame with pandas in Python. It looks like this:

<style type="text/css">
 table.tableizer-table {
  font-size: 12px;
  border: 1px solid #CCC; 
  font-family: Arial, Helvetica, sans-serif;
 } 
 .tableizer-table td {
  padding: 4px;
  margin: 3px;
  border: 1px solid #CCC;
 }
 .tableizer-table th {
  background-color: #104E8B; 
  color: #FFF;
  font-weight: bold;
 }
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>5.6</td><td>5.6</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>5.6</td><td>5.6</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>5.6</td><td>6</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>6</td><td>6</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>6</td><td>6</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
</tbody></table>

I want to create the following dataframe:

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
</tbody></table>

My idea is to insert several empty columns alongside 'Time', 'FUEL_1', 'FUEL_2' and 'Speed', stack these columns one by one, and then merge them. Is there an easier way?

1 Answer

I'm fairly sure there is an easy way to do this with pandas.read_html, but I'm less familiar with it than with BeautifulSoup, so this answer uses BeautifulSoup.

html = """<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>5.6</td><td>5.6</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>5.6</td><td>5.6</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>5.6</td><td>6</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>6</td><td>6</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>6</td><td>6</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
</tbody></table>"""

import pandas as pd
from bs4 import BeautifulSoup

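def read_table(html):
  """Parse an HTML table's header and data rows into a DataFrame of strings."""
  header, matrix = [], []
  bs = BeautifulSoup(html, "html.parser")
  for row in bs.find_all("tr"):
    if row.find("th"):
      # Header row: collect the column names.
      header = [cell.get_text().strip() for cell in row.find_all("th")]
    else:
      # Data row: &nbsp; cells become empty strings after strip().
      matrix.append([cell.get_text().strip() for cell in row.find_all("td")])

  # Every value is kept as a string; convert dtypes afterwards if needed.
  return pd.DataFrame(matrix, columns=header)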

Passing the HTML you've given to this function returns a pandas DataFrame, from which you can select the columns you want.

df = read_table(html)
df[["Time","FUEL_1","FUEL_2","Speed"]]
      Time FUEL_1 FUEL_2 Speed
0  3:06:38   1150         1328
1  3:06:39                1328
2  3:06:40          1150  1344
3  3:06:41                1392
4  3:06:42   1160         1456
5  3:06:43                1520
6  3:06:44          1160  1600
7  3:06:45                1696
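
For the reshaping itself, here is a minimal sketch rather than a definitive solution. It assumes the frame looks like the one returned by read_table above: every cell is a string, blanks are empty strings, and the repeated columns are all named Angle. The helper name expand_angles and the small four-row sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for read_table(html): all cells are strings,
# blank cells are "". Only the first four rows of the question's table.
header = ["Time", "Angle", "Angle", "Angle", "Angle", "FUEL_1", "FUEL_2", "Speed"]
rows = [
    ["3:06:38", "5.3", "5.3", "5.3", "5.3", "1150", "", "1328"],
    ["3:06:39", "5.3", "5.3", "5.3", "5.3", "", "", "1328"],
    ["3:06:40", "5.3", "5.3", "5.3", "5.3", "", "1150", "1344"],
    ["3:06:41", "5.3", "5.6", "5.6", "5.6", "", "", "1392"],
]
df = pd.DataFrame(rows, columns=header)


def expand_angles(df, angle_col="Angle",
                  other_cols=("Time", "FUEL_1", "FUEL_2", "Speed")):
    """Stack the repeated angle columns into one column, row by row.

    Values from other_cols are written only on the first of the
    sub-rows created for each original row; the rest stay blank.
    """
    # Select every column whose name equals angle_col (names are duplicated).
    angles = df.loc[:, df.columns == angle_col].to_numpy()
    n_rows, n_angles = angles.shape

    # Row-major flattening keeps each original row's angles together.
    out = pd.DataFrame({angle_col: angles.ravel()})

    # Positions of the first sub-row belonging to each original row.
    first = np.arange(n_rows) * n_angles
    for col in other_cols:
        values = np.full(n_rows * n_angles, "", dtype=object)
        values[first] = df[col].to_numpy()
        out[col] = values

    # Put the first "other" column (Time) in front of the angle column.
    return out[[other_cols[0], angle_col, *other_cols[1:]]]


long_df = expand_angles(df)
print(long_df.head(8))
```

The values stay as strings here so that blank cells print as blanks, as in the desired table. If numbers are needed, pd.to_numeric(long_df["Angle"]) converts the angle column, and passing errors="coerce" to pd.to_numeric turns the blank strings in the other columns into NaN.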
Tony