-1

I have a DataFrame df that contains 15 columns (see below):

A_phase_1,A_phase_2,A_phase_3,A_phase_4,A_phase_5, B_phase_1,B_phase_2,B_phase_3,B_phase_4,B_phase_5, C_phase_1,C_phase_2,C_phase_3,C_phase_4,C_phase_5.

I want to keep the phase_1 columns (A_phase_1, B_phase_1 and C_phase_1) and programmatically do the following:

  1. Delete the columns containing phase_2, that is, A_phase_2, B_phase_2 and C_phase_2.

  2. Rename the remaining columns from phase_n to phase_(n-1). For example:

all phase_3 become phase_2;

phase_4 --> phase_3;

phase_5 --> phase_4

So the desired output should look like this:

A_phase_1,A_phase_2,A_phase_3,A_phase_4,

B_phase_1,B_phase_2,B_phase_3,B_phase_4,

C_phase_1,C_phase_2,C_phase_3,C_phase_4.
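
A minimal sketch of a reproducible setup for the frame described above. The single row of values (A1, A2, ...) is made up for illustration so that each value records which original column it came from; only the column names come from the question.

```python
import pandas as pd
from itertools import product

# Hypothetical sample data: one row whose values name their original column
letters, phases = "ABC", range(1, 6)
df = pd.DataFrame(
    {f"{letter}_phase_{n}": [f"{letter}{n}"] for letter, n in product(letters, phases)}
)

# Column names the result should have after dropping phase_2
# and shifting phase_3..phase_5 down by one
expected_columns = [f"{letter}_phase_{n}" for letter, n in product(letters, range(1, 5))]

print(list(df.columns))
print(expected_columns)
```

With values like these, a correct result has A3 under A_phase_2, A4 under A_phase_3, and so on, which makes it easy to confirm that the data moved with the renamed columns.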

Thanks in advance!

@SeaBean I got this error:

[screenshot of the error]

RayX500
    Can you provide an example of what you want? https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – ifly6 Jun 29 '21 at 21:14

3 Answers

2

You can use .filter() to select the columns to delete. Then use df.columns.map() with a lambda function to rename the remaining columns, as follows:

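# Drop every column whose name contains 'phase_2'
df = df.drop(df.filter(like='phase_2').columns, axis=1)

# Lower a trailing digit greater than 1 by one; leave all other names unchanged
df.columns = df.columns.map(lambda x: (x[:-1] + str(int(x[-1]) -1)) if (x[-1].isdigit() and (int(x[-1]) > 1)) else x)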

Input:

print(df)

  A_phase_1 A_phase_2 A_phase_3 A_phase_4 A_phase_5 B_phase_1 B_phase_2 B_phase_3 B_phase_4 B_phase_5 C_phase_1 C_phase_2 C_phase_3 C_phase_4 C_phase_5 X_phase_t
0        A1        A2        A3        A4        A5        B1        B2        B3        B4        B5        C1        C2        C3        C4        C5        Xt

Output:

print(df)

  A_phase_1 A_phase_2 A_phase_3 A_phase_4 B_phase_1 B_phase_2 B_phase_3 B_phase_4 C_phase_1 C_phase_2 C_phase_3 C_phase_4 X_phase_t
0        A1        A3        A4        A5        B1        B3        B4        B5        C1        C3        C4        C5        Xt
SeaBean
  • I need to use .filter more, the .drop line on your solution is very concise and readable! – Clay Shwery Jun 29 '21 at 21:25
  • @ClayShwery Yes, it is a very handy function to filter columns with various methods. It supports regex filtering too! Really powerful. – SeaBean Jun 29 '21 at 21:26
  • @SeaBean, the second piece of code (the one with the lambda function) returns the error `ValueError: invalid literal for int() with base 10: 't'` – RayX500 Jun 29 '21 at 21:43
  • @SeaBean, I have added the screenshot of the error to the initial post. Thanks – RayX500 Jun 29 '21 at 21:49
  • @RickyTricky Do your actual column names include some column(s) ending with 't' rather than a number? – SeaBean Jun 29 '21 at 21:55
  • @SeaBean, thanks for the observation. Works like magic! – RayX500 Jun 29 '21 at 22:01
  • @RickyTricky I have edited the solution so that column names whose last character is non-numeric no longer raise an error. Those column names are simply retained. – SeaBean Jun 29 '21 at 22:04
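
The lambda above inspects only the last character, so it assumes single-digit phase numbers. The following is a sketch of a regex-based variant, assuming every phase column follows the <prefix>_phase_<n> pattern. The helper name shift_phase and the sample column A_phase_10 are made up for illustration and are not part of the original answer.

```python
import re
import pandas as pd

def shift_phase(name: str) -> str:
    """Lower a trailing phase number by one; leave any other name untouched."""
    match = re.fullmatch(r"(.*_phase_)(\d+)", name)
    if match is None or int(match.group(2)) <= 1:
        return name  # non-numeric suffix (e.g. X_phase_t) or phase_1
    return f"{match.group(1)}{int(match.group(2)) - 1}"

# Hypothetical columns, including a two-digit phase and a non-numeric suffix
cols = ["A_phase_1", "A_phase_2", "A_phase_3", "A_phase_10", "X_phase_t"]
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

# The $ anchor stops names such as A_phase_20 from matching as phase_2
df = df.drop(columns=df.filter(regex=r"_phase_2$").columns)
df = df.rename(columns=shift_phase)

print(list(df.columns))
# ['A_phase_1', 'A_phase_2', 'A_phase_9', 'X_phase_t']
```

Matching the whole numeric suffix instead of a single character is the only design change here; the drop-then-rename order is the same as in the answer.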
0
# 1.) Delete the phase 2 columns
#     (more precisely, keep only the other columns)
df = df[[col for col in df.columns if col[-1] != '2']]
# 2.) Shift the remaining suffixes down by one
#     (this could be done inline, but is kept separate here)
rename_dict = {col: col[:-1] + str(int(col[-1]) - 1) for col in df.columns if col[-1] in ['3', '4', '5']}
df = df.rename(columns=rename_dict)
Clay Shwery
0
from pandas import DataFrame
from itertools import product

def ren(col):
    prefix, sufix = col.rsplit('_', maxsplit=1)
    sufix = int(sufix)
    return f'{prefix}_{max(sufix-1, 1)}'

# some dummy df
df = DataFrame({f'{letter}_phase_{n}':[i, i, i] 
                for i, (letter, n) in 
                enumerate(product('ABC', range(1, 6)), start=1)})

df = df.drop(df.filter(like='phase_2').columns, axis=1)
# df = df.drop([f'{letter}_phase_2' for letter in 'ABC'], axis=1) # alternative
df = df.rename(mapper=ren, axis=1)
print(df)
buran
  • this only works for your dummy df, but not my df – RayX500 Jun 29 '21 at 21:57
  • @RickyTricky, from the comments on the accepted answer and the screenshot of the error you added, it became clear that the information in your question is incorrect: there were column names ending with `t`. Next time don't waste the time of people trying to help you - [make good reproducible pandas examples](https://stackoverflow.com/q/20109391/4046632). – buran Jun 30 '21 at 04:08