- I have a csv that may or may not always have the same column order or headers (hence, the need to call columns by name).
- I use pandas to convert the spreadsheet into a dataframe
- i modify the dataframe based upon (hopefully) modular, scalable booleans
i reconstruct the csv file:
df.to_csv('csv/csv_out.csv')
I have been able to load the csv into a dataframe and conditionally change a column header (using the following:
df = pd.read_csv('csv/csv_in.csv', encoding='utf-8', engine='python', na_values=['.'])
if 'Column Title A' in df.columns:
df.rename(columns = {'Column Title A':'Column Title B'}, inplace = False)
but i cant seem to understand the syntax involved when chaining multiple if/then statements involving both text and numerical values.
Ive attempted df.ix;
dfmod_swheader = df.ix[(df['Column A'] == '12345') | (df['Column A'] == '54321') & 'Column Title B' in df.columns:]
but consistently receive "invalid type comparison" that I can't quite understand enough to get around.
So, the ask:
can I get assistance on the proper syntax for a pandas dataframe transform that achieves the following:
take the loaded dataframe (from csv_in.csv):
Column Name A,Column Name B,Column Name Red,Column Name Blue
foo,234,bar,foobar
foo,Box,bar,foobar
foo,12,bar,foobar
foo,9,bar,foobar
foo,1,bar,foobar
foo,Frog,bar,foobar
foo,5567,bar,foobar
foo,12,bar,foobar
foo,a,bar,foobar
and transform it (into csv_out.csv):
Column Name A,Column Name C,Column Name B,Column Name Red,Column Name Blue
foo,,234,bar,foobar
foo,,Box,bar,foobar
foo,,12,bar,foobar
foo,,9,bar,foobar
,foo,1,bar,foobar
foo,,Frog,bar,foobar
foo,,5567,bar,foobar
foo,,12,bar,foobar
,foo,a,bar,foobar
using the following logic:
if 'column name a' exists and
'column name b has cell value' = 'numerical value 1' or 'text value a'
then move 'column name a cell value' to (new?) 'column name c',
placing 'column name c' directly to the right of 'column name a'