0

I've loaded data from a tab deliminated file into a DF. The Tab data is a form filled out with a template.

A critical concept is that a variable number of rows makes up one entry in the form. In DF1 below, every time the index is "A", a new record is starting. So the code will need to iterate through the rows to rebuild each record in DF2. Each record will be represented as one row in DF2.

Based on the fact that each "A" row in DF1 starts a new form entry (and corresponding row in DF2), we can see in DF1 below there are just two entries in my example, and will be just two rows in DF2. Also imortant: there are a different number of pieces of data (columns) in each row. Z has 2 (then NAs), A has 3, B has 4.

All of this needs to be mapped to DF2 depending on the index letters Z, A, B (note there are more index letters but this is simplified for this example).

DF 1

-  A    B  C   D
Z xyz  5   NA  NA
A COA  aa  bb  NA
B RE   01  02  03
B DE   04  05  06
A COB  dd  ee  NA
B RE   01  02  03
B DE   04  05  06

In the past i've done this type of thing in VBA and would have used a CASE statement to transform the data. I've found a good start using dictionaries in this thread:

Replacements for switch statement in Python?

One code example at the above thread suggests using a dictionary type case statement:

return{
    'a': 1,
    'b': 2,
}[x]

This seems like it would work although i'm not certain how to execute in practice. In addition for each A, B, etc above, I need to output multiple instructions, depending on the index letter. For the most part, the instructions are where to map in DF2. For example, in my:

Index A:
Map column A to DF2.iloc[1]['B']
Map column B to DF2.iloc[1]['C']
Map column C to DF2.iloc[1]['D']

Index B:
Would have four instructions, similar to above.

DF2 would end up looking like so

-    A    B     C    D    E   F   G    H    I    J    K    L
1   xyz  COA   aa   bb   RE   01  02  03    DE   04   05   06
2   xyz  COB   dd   ee   RE   01  02  03    DE   04   05   06

So for each row in DF1, a different number of instructions is being performed depending on the "index letter." All instructions are telling the code where to put the data in DF2. The mapping instruction for each different index letter will always be the same for the columns, only the row will be changing (some type of counter as you move from one record group to the next in DF2).

How can I handle the different number of instructions for each type of index letter in a switch/case type format?

Thank you

thesimplevoodoo
  • 153
  • 3
  • 11

1 Answers1

1

I think you can use:

#filter only 2,3 index rows
df1 = df[df.index.isin([2,3])].copy()
#create new column for same value if 2 in index
df1['new'] = np.where(df1.index == 2, 'Z', df1.A)
#create groups by compare 2
df1['g'] = (df1.index == 2).cumsum()
#convert columns to index and reshape, then change order
df1 = (df1.set_index(['g','new']).unstack()
          .swaplevel(0,1, axis=1)
          .sort_index(axis=1, ascending=[False, True]))
#default columns names
df1.columns = range(len(df1.columns))
print (df1)
     0   1   2   3  4   5   6    7  8    9  10  11
g                                                 
1  ABC  aa  bb  cc  R  01  02  NaN  D  NaN  03  04
2  DEF  dd  ee  ff  R  01  02  NaN  D  NaN  03  04
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Apologies, I've done a poor job of asking this question by trying to simplify the content of the DFs. I'm going to try and edit the question and if that doesn't work I will probably delete it. Sorry to waste your time on the first iteration – thesimplevoodoo Apr 17 '20 at 14:37