In pandas, I have a DataFrame read from a CSV file. My end goal is to generate an XML schema from that CSV, because each item in the CSV corresponds to a schema variable. The only solution I could think of is to read each item from the DataFrame and write it to a text file, with each value surrounded by a string. Here is a sample of the CSV:
TableName  Variable     Interpretation  Col4   Col5
CRASH      CRASH_ID     integer         1
CRASH      SER_NO       range           0
CRASH      SER_NO       code            99999
CRASH      CRASH_MO_NO  code            1      January
CRASH      CRASH_MO_NO  code            2      February
From this, I would like to generate a text file that looks something like the following (using the first row as an example):
<table = "CRASH">
<name = "CRASH_ID">
<type = "integer">
<value = "1">
Here, <table = >, <name = >, and so on are all fixed strings. They don't have to be formatted exactly that way (although that would be nice); I just need a faster way to generate this schema than typing it all out by hand from the CSV file.
It seems like the best way to do that would be to read through each row, build a string from it, and write that string to the output file. I've looked at the .iterrows() method, but it yields (index, row) tuples, and I couldn't concatenate those with strings. I've also looked at some posts from other users, but they focus on calculating things within DataFrames or changing the data itself, rather than generating a string from each row.
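A minimal sketch of the row-by-row approach: unpack the (index, Series) tuple that iterrows() yields directly in the for statement, look up each value by column name, and let str.format() turn it into text, so no string is ever concatenated with a tuple. It assumes the column names from the sample header above, inlines the sample rows as CSV text via io.StringIO instead of reading oregon_2013_var_list.csv, and uses a hypothetical TAGS mapping inferred from the desired output.

```python
import io

import pandas as pd

# Sample rows from the post, written as CSV (column names assumed from the sample header)
CSV = """TableName,Variable,Interpretation,Col4,Col5
CRASH,CRASH_ID,integer,1,
CRASH,SER_NO,range,0,
CRASH,SER_NO,code,99999,
CRASH,CRASH_MO_NO,code,1,January
CRASH,CRASH_MO_NO,code,2,February
"""

df = pd.read_csv(io.StringIO(CSV))

# Hypothetical mapping from CSV column names to the tag names in the desired output
TAGS = {"TableName": "table", "Variable": "name", "Interpretation": "type", "Col4": "value"}

lines = []
# iterrows() yields (index, Series) tuples; unpacking them makes `row` a Series
for idx, row in df.iterrows():
    for col, tag in TAGS.items():
        # format() converts the cell to text, so no str + tuple concatenation happens
        lines.append('<{} = "{}">'.format(tag, row[col]))

# First row's block: the four lines shown in the desired output above
print("\n".join(lines[:4]))
```

To write the result instead of printing it, open the output file in a with block and pass file=... to print(), or call write() with "\n".join(lines).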
My current code is below. I understand that pandas is built on NumPy arrays, and that Python-level loops such as "for i in df" are not an efficient approach, but I am not really sure where to start.
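If loop speed is the concern, one option is to build every row's text block with column-wise string operations: adding a str to an object-dtype Series (or two such Series together) concatenates element by element, with no explicit Python loop over rows. This is a sketch under the same assumptions as before: sample data inlined via io.StringIO, column names taken from the sample header, and tag names chosen to match the desired output.

```python
import io

import pandas as pd

# Sample rows from the post (column names assumed from the sample header)
CSV = """TableName,Variable,Interpretation,Col4,Col5
CRASH,CRASH_ID,integer,1,
CRASH,SER_NO,range,0,
CRASH,SER_NO,code,99999,
CRASH,CRASH_MO_NO,code,1,January
CRASH,CRASH_MO_NO,code,2,February
"""

df = pd.read_csv(io.StringIO(CSV))

# Column-wise concatenation: each `+` works on whole Series at once.
# Col4 is numeric, so it is converted with astype(str) before joining.
blocks = (
    '<table = "' + df["TableName"] + '">\n'
    + '<name = "' + df["Variable"] + '">\n'
    + '<type = "' + df["Interpretation"] + '">\n'
    + '<value = "' + df["Col4"].astype(str) + '">'
)

# One blank line between row blocks; write `text` to the output file as needed
text = "\n\n".join(blocks)
print(blocks.iloc[0])
```

For a codebook of a few hundred or a few thousand rows, either this or a plain iterrows() loop should finish almost instantly, so readability may matter more than speed here.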
EDIT: Some rows need to be processed together to display a certain way. For instance, the schema has variables with multiple value codes, each with a label attached:
<values>
<value code = "01">January</value>
<value code = "02">February</value>
<value code = "03">March</value>
</values>
I am thinking I could group the rows by "Interpretation", and then, for rows with the "code" interpretation, iterate through the group so that all the codes are output.
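A sketch of that grouping idea, with one change named plainly: rather than grouping on Interpretation alone, it filters to the "code" rows and then groups them by Variable, so each variable gets its own <values> block. Assumptions: column names from the sample header, Col4 holds the code and Col5 its label, and codes are zero-padded to two digits with zfill(2) to mimic the "01" style above. xml.sax.saxutils.escape guards labels that contain &, <, or >.

```python
import io
from xml.sax.saxutils import escape

import pandas as pd

# Sample rows from the post (column names assumed from the sample header)
CSV = """TableName,Variable,Interpretation,Col4,Col5
CRASH,CRASH_ID,integer,1,
CRASH,SER_NO,range,0,
CRASH,SER_NO,code,99999,
CRASH,CRASH_MO_NO,code,1,January
CRASH,CRASH_MO_NO,code,2,February
"""

df = pd.read_csv(io.StringIO(CSV))

# Keep only rows whose Interpretation is "code"
codes = df[df["Interpretation"] == "code"]

values_blocks = {}
# sort=False keeps variables in the order they first appear in the CSV
for variable, group in codes.groupby("Variable", sort=False):
    lines = ["<values>"]
    # itertuples() yields namedtuples whose fields are the column names
    for rec in group.itertuples(index=False):
        # zfill(2) pads single-digit codes ("1" -> "01"); longer codes are unchanged
        code = str(rec.Col4).zfill(2)
        # a missing label (NaN in Col5) becomes an empty string
        label = "" if pd.isna(rec.Col5) else escape(str(rec.Col5))
        lines.append('<value code = "{}">{}</value>'.format(code, label))
    lines.append("</values>")
    values_blocks[variable] = "\n".join(lines)

print(values_blocks["CRASH_MO_NO"])
```

Printing values_blocks["CRASH_MO_NO"] should give the same shape as the January/February lines in the example above.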
Here is my current code, for reference. I have updated it to reflect Randy's excellent suggestion below. I have also edited the above post to reflect some updated concerns.
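import pandas as pd

df = pd.read_csv(r'oregon_2013_var_list.csv')

# select only CRASH variables (the table-name column is TableName in the sample above)
df_crash = df[df['TableName'] == 'CRASH']

# value which will be populated with code values from the codebook
code_fill = " "

# replace NaN values with code_fill; fillna without inplace returns a new
# DataFrame, which avoids a SettingWithCopyWarning on the filtered subset
df_crash = df_crash.fillna(code_fill)

# the with block closes the output file automatically
with open(r'oregon_output.txt', 'w') as text_file:
    # loop over df_crash (not df) so only CRASH rows are written
    for row_id, row in df_crash.iterrows():
        print('<variable>', file=text_file)
        # Series.items() replaces the removed iterkv(); NaNs were already filled above
        for k, v in row.items():
            print('<{} = "{}">'.format(k, v), file=text_file)
        print('</variable>', file=text_file)
        print(file=text_file)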