python: chaining complex conditional dataframe changes using pandas

Question

I have a csv that may or may not always have the same column order or headers (hence, the need to call columns by name).
I use pandas to convert the spreadsheet into a dataframe
i modify the dataframe based upon (hopefully) modular, scalable booleans
i reconstruct the csv file:

df.to_csv('csv/csv_out.csv')

I have been able to load the csv into a dataframe and conditionally change a column header (using the following:

df = pd.read_csv('csv/csv_in.csv', encoding='utf-8', engine='python', na_values=['.'])

if 'Column Title A' in df.columns:
df.rename(columns = {'Column Title A':'Column Title B'}, inplace = False)

but i cant seem to understand the syntax involved when chaining multiple if/then statements involving both text and numerical values.

Ive attempted df.ix;

dfmod_swheader = df.ix[(df['Column A'] == '12345') | (df['Column A'] == '54321') & 'Column Title B' in df.columns:]

but consistently receive "invalid type comparison" that I can't quite understand enough to get around.

So, the ask:

can I get assistance on the proper syntax for a pandas dataframe transform that achieves the following:

take the loaded dataframe (from csv_in.csv):

Column Name  A,Column Name  B,Column Name Red,Column Name Blue
foo,234,bar,foobar
foo,Box,bar,foobar
foo,12,bar,foobar
foo,9,bar,foobar
foo,1,bar,foobar
foo,Frog,bar,foobar
foo,5567,bar,foobar
foo,12,bar,foobar
foo,a,bar,foobar

and transform it (into csv_out.csv):

Column Name  A,Column Name C,Column Name  B,Column Name Red,Column Name Blue
foo,,234,bar,foobar
foo,,Box,bar,foobar
foo,,12,bar,foobar
foo,,9,bar,foobar
,foo,1,bar,foobar
foo,,Frog,bar,foobar
foo,,5567,bar,foobar
foo,,12,bar,foobar
,foo,a,bar,foobar

using the following logic:

if 'column name a' exists and
'column name b has cell value' = 'numerical value 1' or 'text value a'
then move 'column name a cell value' to (new?) 'column name c',
placing 'column name c' directly to the right of 'column name a'

Maybe you could add some sample data and expected output to this question. — Scott Boston, May 16 '18 at 18:19
Could you print out like 10 lines of a dataframe including headers and index, then print out your expected output dataframe in the question. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples?answertab=oldest#tab-top — Scott Boston, May 16 '18 at 18:44

Scott Boston · Accepted Answer · 2018-05-17T21:33:58.493

0

I think you can use something like this to move column A to a new column under a condition.

cond  = df_in['Column Name B'].isin(['1','a'])

df_in['Column Name C'] = df_in.loc[cond,'Column Name A']
df_in['Column Name A'] = df_in['Column Name A'].mask(cond)

Output:

  Column Name A Column Name B Column Name C
0           foo             1           NaN
1           bar             a           NaN
2           NaN             2        foobar
3           NaN             b        barfoo

My output:

  Column Name  A Column Name  B Column Name Blue Column Name Red Column Name  C
0            foo            234           foobar             bar            NaN
1            foo            Box           foobar             bar            NaN
2            foo             12           foobar             bar            NaN
3            foo              9           foobar             bar            NaN
4            NaN              1           foobar             bar            foo
5            foo           Frog           foobar             bar            NaN
6            foo           5567           foobar             bar            NaN
7            foo             12           foobar             bar            NaN
8            NaN              a           foobar             bar            foo

edited May 17 '18 at 21:33

answered May 17 '18 at 17:56

Scott Boston

147,308
15
139
187

nice. it was actually; `Column A, C, B` `nan,foo,1` `nan,bar,a` `foobar, nan, 2` `barfoo, nan, b` but youve given me the syntax. assuming im smart enough to figure it out... tnx! – macphail May 17 '18 at 20:12
interesting. as written, the output creates Column C (at the far end of the table, but I'm ok with that), however none of Column A's data transfers. Thoughts? (and thanks again) – macphail May 17 '18 at 20:42
Yeah it is doing alphabetical ordering. you can force ordering using 'df_in[['Col A','Col C','Col B']].to_csv()'. And, sure to populate column C first then column A reduction. – Scott Boston May 17 '18 at 20:45
i dont mind the ordering, i suppose i can deal with end-of-table... but the data not moving is bad. i modified the post csv examples of in and out. based on those examples, the script changes gave me: `Column Name A,Column Name B,Column Name Red,Column `Name Blue,Column Name C` `foo,234,bar,foobar,` `foo,Box,bar,foobar,` `foo,12,bar,foobar,` `foo,9,bar,foobar,` `foo,1,bar,foobar,` `foo,Frog,bar,foobar,` `foo,5567,bar,foobar,` `foo,12,bar,foobar,` `foo,a,bar,foobar,` ... and i apologize for not being able to format that right. – macphail May 17 '18 at 21:27
Run that script now against your data. – Scott Boston May 17 '18 at 21:36
i cant seem to get past a key error, even when running on test csv as you actually wrote it. `Traceback (most recent call last):` ... `KeyError: 'Column Name B'` ...thanks again. – macphail May 18 '18 at 21:24
Does your "Column name B' have one space or two spacaes between 'name' and' 'b'? – Scott Boston May 18 '18 at 22:22
one. but let me audit everything just to be sure. tnx – macphail May 18 '18 at 23:58
KeyError means that your column name is not matching. The error message is saying that it can't not find a column named 'Column Name B'. – Scott Boston May 29 '18 at 15:58
1

that did it. found my problem. thank you for your effort. – macphail May 29 '18 at 15:59

python: chaining complex conditional dataframe changes using pandas

1 Answers1