I need to import content from WordPress into Plone, a Python-based CMS, and I have a dump of the posts table as a huge plain CSV file using ";" as the delimiter.
The problem is that the standard reader from the csv module is not smart enough to parse the HTML content inside a row (the post_content field).
For instance, when the parser encounters something like <p>&nbsp;</p>, it interprets the semicolon as a field delimiter, and I end up with more items than fields and with fields containing the wrong content.
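Here is a minimal sketch that reproduces what I am seeing. The sample row is made up (the real dump has more columns), and it assumes the HTML field is not quoted in the dump and contains an entity such as &nbsp;.

```python
import csv
import io

# Hypothetical one-row sample with three fields: id, title, post_content.
# The post_content field is unquoted and contains an HTML entity.
sample = "1;Hello world;<p>&nbsp;</p>\n"

reader = csv.reader(io.StringIO(sample), delimiter=";")
row = next(reader)

# The semicolon that ends the entity is treated as a delimiter,
# so the HTML is split across two items.
print(len(row))
print(row)
```

The row comes back with four items instead of three, with the HTML split in the middle of the entity.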
Is there any other way to solve this kind of issue? Processing each row with a regex seems pretty scary to me.