0
  1. I have a csv that may or may not always have the same column order or headers (hence, the need to call columns by name).
  2. I use pandas to convert the spreadsheet into a dataframe
  3. i modify the dataframe based upon (hopefully) modular, scalable booleans
  4. i reconstruct the csv file:

    df.to_csv('csv/csv_out.csv')

I have been able to load the csv into a dataframe and conditionally change a column header (using the following:

df = pd.read_csv('csv/csv_in.csv', encoding='utf-8', engine='python', na_values=['.'])

if 'Column Title A' in df.columns:
df.rename(columns = {'Column Title A':'Column Title B'}, inplace = False)

but i cant seem to understand the syntax involved when chaining multiple if/then statements involving both text and numerical values.

Ive attempted df.ix;

dfmod_swheader = df.ix[(df['Column A'] == '12345') | (df['Column A'] == '54321') & 'Column Title B' in df.columns:]

but consistently receive "invalid type comparison" that I can't quite understand enough to get around.

So, the ask:

can I get assistance on the proper syntax for a pandas dataframe transform that achieves the following:

take the loaded dataframe (from csv_in.csv):

Column Name  A,Column Name  B,Column Name Red,Column Name Blue
foo,234,bar,foobar
foo,Box,bar,foobar
foo,12,bar,foobar
foo,9,bar,foobar
foo,1,bar,foobar
foo,Frog,bar,foobar
foo,5567,bar,foobar
foo,12,bar,foobar
foo,a,bar,foobar

and transform it (into csv_out.csv):

Column Name  A,Column Name C,Column Name  B,Column Name Red,Column Name Blue
foo,,234,bar,foobar
foo,,Box,bar,foobar
foo,,12,bar,foobar
foo,,9,bar,foobar
,foo,1,bar,foobar
foo,,Frog,bar,foobar
foo,,5567,bar,foobar
foo,,12,bar,foobar
,foo,a,bar,foobar

using the following logic:

if 'column name a' exists and
'column name b has cell value' = 'numerical value 1' or 'text value a'
then move 'column name a cell value' to (new?) 'column name c',
placing 'column name c' directly to the right of 'column name a'
macphail
  • 55
  • 1
  • 7
  • Maybe you could add some sample data and expected output to this question. – Scott Boston May 16 '18 at 18:19
  • Could you print out like 10 lines of a dataframe including headers and index, then print out your expected output dataframe in the question. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples?answertab=oldest#tab-top – Scott Boston May 16 '18 at 18:44
  • @Scott : I've cleaned for clarity, thanks for the input – macphail May 17 '18 at 17:34

1 Answers1

0

I think you can use something like this to move column A to a new column under a condition.

cond  = df_in['Column Name B'].isin(['1','a'])

df_in['Column Name C'] = df_in.loc[cond,'Column Name A']
df_in['Column Name A'] = df_in['Column Name A'].mask(cond)

Output:

  Column Name A Column Name B Column Name C
0           foo             1           NaN
1           bar             a           NaN
2           NaN             2        foobar
3           NaN             b        barfoo

My output:

  Column Name  A Column Name  B Column Name Blue Column Name Red Column Name  C
0            foo            234           foobar             bar            NaN
1            foo            Box           foobar             bar            NaN
2            foo             12           foobar             bar            NaN
3            foo              9           foobar             bar            NaN
4            NaN              1           foobar             bar            foo
5            foo           Frog           foobar             bar            NaN
6            foo           5567           foobar             bar            NaN
7            foo             12           foobar             bar            NaN
8            NaN              a           foobar             bar            foo
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • nice. it was actually; `Column A, C, B` `nan,foo,1` `nan,bar,a` `foobar, nan, 2` `barfoo, nan, b` but youve given me the syntax. assuming im smart enough to figure it out... tnx! – macphail May 17 '18 at 20:12
  • interesting. as written, the output creates Column C (at the far end of the table, but I'm ok with that), however none of Column A's data transfers. Thoughts? (and thanks again) – macphail May 17 '18 at 20:42
  • Yeah it is doing alphabetical ordering. you can force ordering using 'df_in[['Col A','Col C','Col B']].to_csv()'. And, sure to populate column C first then column A reduction. – Scott Boston May 17 '18 at 20:45
  • i dont mind the ordering, i suppose i can deal with end-of-table... but the data not moving is bad. i modified the post csv examples of in and out. based on those examples, the script changes gave me: `Column Name A,Column Name B,Column Name Red,Column `Name Blue,Column Name C` `foo,234,bar,foobar,` `foo,Box,bar,foobar,` `foo,12,bar,foobar,` `foo,9,bar,foobar,` `foo,1,bar,foobar,` `foo,Frog,bar,foobar,` `foo,5567,bar,foobar,` `foo,12,bar,foobar,` `foo,a,bar,foobar,` ... and i apologize for not being able to format that right. – macphail May 17 '18 at 21:27
  • Run that script now against your data. – Scott Boston May 17 '18 at 21:36
  • i cant seem to get past a key error, even when running on test csv as you actually wrote it. `Traceback (most recent call last):` ... `KeyError: 'Column Name B'` ...thanks again. – macphail May 18 '18 at 21:24
  • Does your "Column name B' have one space or two spacaes between 'name' and' 'b'? – Scott Boston May 18 '18 at 22:22
  • one. but let me audit everything just to be sure. tnx – macphail May 18 '18 at 23:58
  • KeyError means that your column name is not matching. The error message is saying that it can't not find a column named 'Column Name B'. – Scott Boston May 29 '18 at 15:58
  • 1
    that did it. found my problem. thank you for your effort. – macphail May 29 '18 at 15:59