2

I have XML data that also contains HTML. I'm trying to write this XML into a single cell of a CSV file that also has other columns. Right now the XML is split across several adjacent cells, so reading the CSV with pandas throws an error:

Error tokenizing data. C error: Expected 94 fields in line 3, saw 221

I also looked into a similar scenario, but it didn't help because that data came from a database, so the workarounds are different.

I am not looking to parse the XML. I just want to save the entire XML document in one cell of a CSV file.

I cannot share a snapshot of the data for confidentiality reasons, but I hope the issue is clear.

Any help is appreciated.

Eswar

2 Answers

2

You can use the built-in csv module. Try wrapping the XML string in a list, so that it is written as a single field:

import csv

xml = ["""<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
</catalog>"""]

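# newline="" is what the csv docs recommend for files passed to csv.writer;
# without it, embedded newlines can be mangled on some platforms.
with open("test.csv", "w", encoding="utf8", newline="") as out_file:
    writer = csv.writer(out_file)
    # The one-element list is one row with one cell; the writer quotes it.
    writer.writerow(xml)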

You should then be able to read it with pandas.
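To check that the XML really stays in one cell when other columns are present, here is a minimal round-trip sketch using only the csv module. The `id` and `name` columns and the sample XML are made up for illustration, and an in-memory buffer stands in for the file.

```python
import csv
import io

# Hypothetical sample: XML with embedded HTML, a comma, quotes and newlines.
xml_text = '<?xml version="1.0"?>\n<note>\n  <body>Hello, <b>world</b></body>\n</note>'

# In-memory stand-in for a file opened with newline="".
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["id", "name", "xml"])      # hypothetical header
writer.writerow([1, "example", xml_text])   # the XML is one field of the row

# Read everything back and confirm the XML came back as a single field.
buffer.seek(0)
rows = list(csv.reader(buffer))
print(len(rows), len(rows[1]))
```

The writer quotes any field that contains the delimiter, a quote character or a line break, which is what keeps the XML from spilling into neighbouring cells.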

Token Joe
1
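import pandas as pd

# Read the whole XML file into a single string.
with open('note.xml', 'r') as f:
    data = f.read()

# One row, one column: the entire XML document is a single cell.
df = pd.DataFrame(data={'xml_file': [data]})

# to_csv quotes the field as needed; pass index=False to omit the index column.
df.to_csv('xml_as_csv.csv')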
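As a quick sanity check of the pandas route, the sketch below writes a frame with one extra column and reads it back. The `id` column and the sample XML are assumptions for illustration, and an in-memory buffer is used instead of a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample: XML with embedded HTML, a comma, quotes and newlines.
xml_text = '<?xml version="1.0"?>\n<note>\n  <body>Hello, <b>world</b></body>\n</note>'

# One extra column next to the XML column, as in the question's setup.
df = pd.DataFrame({"id": [1], "xml_file": [xml_text]})

# Write to an in-memory buffer, leaving out the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back; the quoted multi-line field should parse as one cell.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.shape)
```

If `restored` has one row and two columns, the XML survived as a single cell and the tokenizing error from the question should not occur for a file written this way.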
Dariusz Krynicki