0

I have the following table structure.

0 1 2 3 4 5 6 7 8 9
Baseline Baseline Baseline V1 V1 V1 V2 V2 V2
UID test1 test2 test3 test1 test2 test3 test1 test2 test3
xyzarf date1 date2 date3 date4 date5 date2 date6 date7 date8
abcwer date9 date10 date11 date1 date3 date5 date6 date9 date8
defdsg date1 date2 date3 date4 date5 date2 date6 date7 date8

I want to transform like this

UID    visit    type_of_test  test_date
xyzarf baseline    test1       date1
xyzarf baseline    test2       date2
xyzarf baseline    test3       date3
xyzarf V1          test1       date4
xyzarf V1          test2       date5
xyzarf V1          test3       date2
xyzarf V2          test1       date6
xyzarf V2          test2       date7
xyzarf V2          test3       date8
abcwer baseline    test1       date9
abcwer baseline    test2       date10
abcwer baseline    test3       date11
abcwer V1          test1       date1
abcwer V1          test2       date3
abcwer V1          test3       date5
abcwer V2          test1       date6
abcwer V2          test2       date9
abcwer V2          test3       date8
defdsg baseline    test1       date1
defdsg baseline    test2       date2
defdsg baseline    test3       date3
defdsg V1          test1       date4
defdsg V1          test2       date5
defdsg V1          test3       date2
defdsg V2          test1       date6
defdsg V2          test2       date7
defdsg V2          test3       date8

I have tried many combinations of pandas pivot table and groupby but I couldn't make it work. Please kindly help.

Edit: Please find the sample converted as dictionary

        [{'0': nan,
  '1': 'Baseline',
  '2': 'Baseline',
  '3': 'Baseline',
  '4': 'V1',
  '5': 'V1',
  '6': 'V1',
  '7': 'V2',
  '8': 'V2',
  '9': 'V2'},
 {'0': 'UID',
  '1': 'test1',
  '2': 'test2',
  '3': 'test3',
  '4': 'test1',
  '5': 'test2',
  '6': 'test3',
  '7': 'test1',
  '8': 'test2',
  '9': 'test3'},
 {'0': 'xyzarf',
  '1': 'date1',
  '2': 'date2',
  '3': 'date3',
  '4': 'date4',
  '5': 'date5',
  '6': 'date2',
  '7': 'date6',
  '8': 'date7',
  '9': 'date8'},
 {'0': 'abcwer',
  '1': 'date9',
  '2': 'date10',
  '3': 'date11',
  '4': 'date1',
  '5': 'date3',
  '6': 'date5',
  '7': 'date6',
  '8': 'date9',
  '9': 'date8'},
 {'0': 'defdsg',
  '1': 'date1',
  '2': 'date2',
  '3': 'date3',
  '4': 'date4',
  '5': 'date5',
  '6': 'date2',
  '7': 'date6',
  '8': 'date7',
  '9': 'date8'}]

2 Answers2

1

You can combine both header rows into a header first then uses pandas.melt method to convert your table into a long format; lastly, split the header column into 2 columns.

## clean null values
df = df.fillna("")

## combine 2 rows to 1 as a header
df.columns = (df.iloc[0] + " " + df.iloc[1]).str.strip()
df.columns.name = ""
df = df.drop([df.index[0],df.index[1]]) ## drop both rows
df = df.reset_index(drop=True) ## reset index

## unpivot data frame
df = pd.melt(df, id_vars=["UID"], var_name = "mash", value_name="test_date")

## expand 1 column into 2
df[["visit","type_of_test"]] = df["mash"].str.split(expand=True)

## rearrange and reorder data frame
df = df[["UID", "visit", "type_of_test", "test_date"]].sort_values(
    by = ["UID","visit", "type_of_test"]).reset_index(drop=True)

## Output []:
       UID     visit type_of_test test_date
0   abcwer  Baseline        test1     date9
1   abcwer  Baseline        test2    date10
2   abcwer  Baseline        test3    date11
3   abcwer        V1        test1     date1
4   abcwer        V1        test2     date3
5   abcwer        V1        test3     date5
6   abcwer        V2        test1     date6
7   abcwer        V2        test2     date9
8   abcwer        V2        test3     date8
9   defdsg  Baseline        test1     date1
10  defdsg  Baseline        test2     date2
11  defdsg  Baseline        test3     date3
12  defdsg        V1        test1     date4
13  defdsg        V1        test2     date5
14  defdsg        V1        test3     date2
15  defdsg        V2        test1     date6
16  defdsg        V2        test2     date7
17  defdsg        V2        test3     date8
18  xyzarf  Baseline        test1     date1
19  xyzarf  Baseline        test2     date2
20  xyzarf  Baseline        test3     date3
21  xyzarf        V1        test1     date4
22  xyzarf        V1        test2     date5
23  xyzarf        V1        test3     date2
24  xyzarf        V2        test1     date6
25  xyzarf        V2        test2     date7
26  xyzarf        V2        test3     date8

I believe there is a better way to work on this. Thus, let's see what others got.

JK Chai
  • 121
  • 6
1

A simple solution with Multiindex and stack:

new_df = df.iloc[2:, 1:]   # getting date* content
new_df.columns = pd.MultiIndex.from_tuples(zip(df.iloc[0, 1:], df.iloc[1, 1:]))
new_df.index = df.iloc[2:, 0]   # setting UID as index
new_df = new_df.stack(level=[0, 1]).reset_index()
new_df.columns = ["UID", "visit", "type_of_test", "test_date"]
 
print(new_df)

Output:

       UID     visit type_of_test test_date
0   xyzarf  Baseline        test1     date1
1   xyzarf  Baseline        test2     date2
2   xyzarf  Baseline        test3     date3
3   xyzarf        V1        test1     date4
4   xyzarf        V1        test2     date5
5   xyzarf        V1        test3     date2
6   xyzarf        V2        test1     date6
7   xyzarf        V2        test2     date7
8   xyzarf        V2        test3     date8
9   abcwer  Baseline        test1     date9
10  abcwer  Baseline        test2    date10
11  abcwer  Baseline        test3    date11
12  abcwer        V1        test1     date1
13  abcwer        V1        test2     date3
14  abcwer        V1        test3     date5
15  abcwer        V2        test1     date6
16  abcwer        V2        test2     date9
17  abcwer        V2        test3     date8
18  defdsg  Baseline        test1     date1
19  defdsg  Baseline        test2     date2
20  defdsg  Baseline        test3     date3
21  defdsg        V1        test1     date4
22  defdsg        V1        test2     date5
23  defdsg        V1        test3     date2
24  defdsg        V2        test1     date6
25  defdsg        V2        test2     date7
26  defdsg        V2        test3     date8
Tranbi
  • 11,407
  • 6
  • 16
  • 33