I have a dataset with a very large number of columns, with names formatted like a.b.c.d.e.
My goal is to do two things. First, I would like to change each column name to 'c'. Second, I would like to generate a dictionary for later use where 'c' maps to 'b.c', so I can change the names back at a later point. The full function I am using is below.
def trim_col_names(df):
    cols = []
    string_matches = {}
    for col in df.columns[3:]:
        tokens = col.split('.')
        trimmed = tokens[2]
        cols.append(trimmed)
        colname = '.'.join(tokens[1:3])
        string_matches[trimmed] = colname
    df.columns = list(df.columns)[:3] + cols

df_p = trim_col_names(df_p)
tokens prints as expected: ['a', 'b', 'c', 'd', 'e']. However, I am getting the following error:
    trimmed = tokens[2]
IndexError: list index out of range
Interestingly, when I switched the order of the lines trimmed = tokens[2] and colname = '.'.join(tokens[1:3]) so that colname was executed first, the error still appeared on trimmed, which makes me think the problem is isolated to this one line. I also use very similar lines in other functions within this code with no issue. What am I missing?
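To narrow this down, a check along the following lines could list any column names that split into fewer than three tokens, since those are the only ones for which tokens[2] can fail. This is only a minimal sketch: short_columns is a hypothetical helper that is not part of my code, and the example list is made up from the sample column names plus one name without dots.

```python
def short_columns(columns, min_tokens=3, sep='.'):
    """Return (name, tokens) pairs for names with fewer than min_tokens parts."""
    bad = []
    for col in columns:
        tokens = str(col).split(sep)
        if len(tokens) < min_tokens:
            bad.append((col, tokens))
    return bad

# Illustrative names only (assumption): the last one contains no dots.
example = [
    'tpm.293SLAM_rinderpest_infection_00hr.CH123.bhg.gh',
    'tpm.293SLAM_rinderpest_infection_01hr.CH124.byl.gw',
    'CH125',
]
print(short_columns(example))
```

On the real data this would be called as short_columns(df_p.columns[3:]).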
Here is a sample dataset. The real data has thousands of columns, so I have only given a very small subset. If this is not sufficient, I can provide a larger dataset.
X Y Z tpm.293SLAM_rinderpest_infection_00hr.CH123.bhg.gh tpm.293SLAM_rinderpest_infection_01hr.CH124.byl.gw tpm.293SLAM_rinderpest_infection_02hr.CH125.lmg.ge
x y z 2 2 4
x1 y1 z1 3 8 2
x2 y2 z2 4 5 7
I am trying to keep CH123 as the column name and 293SLAM_rinderpest_infection_00hr.CH123 as the value it maps to in the dictionary.
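To make the expected result concrete, here is a small sketch of what I would want for the three sample columns above. It uses plain Python lists rather than the DataFrame, and the names sample_cols, expected_names and expected_map are only for illustration.

```python
# The three data column names from the sample dataset.
sample_cols = [
    'tpm.293SLAM_rinderpest_infection_00hr.CH123.bhg.gh',
    'tpm.293SLAM_rinderpest_infection_01hr.CH124.byl.gw',
    'tpm.293SLAM_rinderpest_infection_02hr.CH125.lmg.ge',
]

# New column names: the third dot-separated token of each name.
expected_names = [col.split('.')[2] for col in sample_cols]

# Lookup for restoring names later: third token -> "second.third" tokens.
expected_map = {
    col.split('.')[2]: '.'.join(col.split('.')[1:3])
    for col in sample_cols
}

print(expected_names)
print(expected_map['CH123'])
```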