While writing a Python script, I came across a strange HTML table that looks like this:
<tr>
<td x:str="2020-09-27 18:36:05"></td>
<td x:str="SMS"></td>
<td x:str="AAA"></td>
<td x:str="10658139"></td>
</tr>
I know I can use MS Excel to convert it to a normal .xls or .xlsx file, but I have too many files of this kind to convert by hand, so I need a script to do the job. I have tried pandas, but it cannot read the data from these files correctly.
I guess VBA could handle this well, but Python is the only language I am familiar with. Can anybody tell me which Python library can handle this kind of HTML-based data table?
Any advice would be much appreciated.
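A likely reason pandas struggles: `pandas.read_html` reads the text inside each `<td>`, and here that text is empty because the values live in the `x:str` attribute. Below is a minimal sketch using only the standard library's `html.parser` module, assuming each value sits in `x:str` and falling back to the cell text when the attribute is missing or has no value. The names `XStrTableParser` and `read_xstr_table` are hypothetical, not part of any library.

```python
from html.parser import HTMLParser


class XStrTableParser(HTMLParser):
    """Collect table rows, taking each cell's value from its x:str attribute.

    If a cell has no x:str value, the text inside the cell is used instead.
    """

    def __init__(self):
        super().__init__()
        self.rows = []            # finished rows, each a list of strings
        self._row = None          # row being built, or None outside <tr>
        self._in_cell = False
        self._attr_value = None   # x:str value of the current cell, if any
        self._text = []           # text chunks seen inside the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            # HTMLParser lowercases names and unescapes values like &amp;
            self._attr_value = dict(attrs).get("x:str")
            self._text = []
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            if self._attr_value is not None:
                self._row.append(self._attr_value)
            else:
                self._row.append("".join(self._text).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def read_xstr_table(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = XStrTableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sample = """<tr>
<td x:str="2020-09-27 18:36:05"></td>
<td x:str="SMS"></td>
<td x:str="AAA"></td>
<td x:str="10658139"></td>
</tr>"""
    print(read_xstr_table(sample))
    # [['2020-09-27 18:36:05', 'SMS', 'AAA', '10658139']]
```

The resulting rows can then be passed to `pandas.DataFrame(rows)` or written out with `csv.writer`. Using a real parser rather than a regex means attribute order, quoting and entities such as `&amp;` are handled by the parser instead of by hand.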
In fact, I have found a hacky way to solve the problem with re, using code like:
import re
f = re.sub(r'\sx:str="([^"]*)">', r'>\1', f)  # f holds the file's HTML text
But this feels too crude.
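If the regex route is kept, a pattern that stops at the closing quote (`[^"]*` rather than a greedy `.+`) avoids swallowing several cells when they share one line. The sketch below applies such a substitution to every file in a folder. It assumes, as in the sample, that `x:str` is the last attribute in each tag and that the files are UTF-8; the folder layout, the `*.html` default pattern and the names `inline_xstr` and `convert_folder` are hypothetical.

```python
import re
from pathlib import Path

# Matches ' x:str="VALUE">' at the end of an opening tag. [^"]* stops at the
# first closing quote, so several cells on one line are handled separately.
# Assumes x:str is the last attribute of the tag, as in the sample table.
XSTR_ATTR = re.compile(r'\sx:str="([^"]*)">')


def inline_xstr(html_text):
    """Move each x:str value into the body of its tag."""
    return XSTR_ATTR.sub(r'>\1', html_text)


def convert_folder(src_dir, dst_dir, pattern="*.html"):
    """Write a cleaned copy of every matching file in src_dir into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob(pattern):
        # Encoding is an assumption; Excel exports may use another code page.
        text = path.read_text(encoding="utf-8")
        (dst / path.name).write_text(inline_xstr(text), encoding="utf-8")
```

The cleaned files keep the values as ordinary cell text, so `pandas.read_html` (which needs lxml, or BeautifulSoup with html5lib, installed) should then pick them up.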