I have a dataframe as follows

Cycle  A_0  A_1  A_2  A_3  B_0  B_1  B_2  B_3
1      3    4    5    6    NaN  1    4    5
1      8    5    3    1    0    8    6    4
2      7    9    1    6    1    0    2    3
3      5    NaN  9    1    0    3    8    3

The A_* columns and the B_* columns have to be combined into two columns, A and B.

Expected output

Cycle  A    B
1      3    NaN
1      4    1
1      5    4
1      6    5
1      8    0
1      5    8
1      3    6
1      1    4
2      7    1
2      9    0
2      1    2
2      6    3
3      5    0
3      NaN  3
3      9    8
3      1    3

What I tried:

A = [f"A_{i}" for i in range(20)]
B = [f"B_{i}" for i in range(20)]

df2['A'] = df[A].bfill(axis=1).iloc[:, 0]
df2['B'] = df[B].bfill(axis=1).iloc[:, 0]

These lines give me an output dataframe that skips the NaN values instead of keeping them. How can I get the expected output?

ADDON

Added a new column to the initial data and the expected output.

mathew
  • [Combine Columns in Pandas - Stack Overflow](https://stackoverflow.com/questions/72233876/combine-columns-in-pandas/72233966) – Ynjxsjmh May 14 '22 at 06:06

2 Answers

code part

import numpy as np
import pandas as pd

columns = pd.Index(['A_0', 'A_1', 'A_2', 'A_3', 'B_0', 'B_1', 'B_2', 'B_3'], dtype='string')
values = np.array([[ 3.,  4.,  5.,  6., np.nan,  1.,  4.,  5.],
                   [ 8.,  5.,  3.,  1.,  0.,  8.,  6.,  4.],
                   [ 7.,  9.,  1.,  6.,  1.,  0.,  2.,  3.],
                   [ 5., np.nan,  9.,  1.,  0.,  3.,  8.,  3.]],
                  dtype=float)
## Or retrieve them from the raw DataFrame if it already exists
# columns = df_raw.columns
# values = df_raw.values

## Construct MultiIndex
mi = pd.MultiIndex.from_tuples((s.split("_") for s in columns))

## Construct DataFrame
df = pd.DataFrame(values, columns=mi)

## Reshape: move level 1 of the columns (the numeric suffix) into the row index
df_result = df.stack(level=1)

>>> df_result
       A    B
0 0  3.0  NaN
  1  4.0  1.0
  2  5.0  4.0
  3  6.0  5.0
1 0  8.0  0.0
  1  5.0  8.0
  2  3.0  6.0
  3  1.0  4.0
2 0  7.0  1.0
  1  9.0  0.0
  2  1.0  2.0
  3  6.0  3.0
3 0  5.0  0.0
  1  NaN  3.0
  2  9.0  8.0
  3  1.0  3.0
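
The result above is indexed by the original row position, so the Cycle column from the question is not carried through. A minimal sketch of one way to keep it, assuming the raw frame has a Cycle column as in the question (the name df_raw and the Cycle values are taken from the question's sample, not from the code above): move Cycle into the index before splitting the column names, then drop the helper levels after stacking.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Cycle as an ordinary column (assumption).
values = np.array([[3., 4., 5., 6., np.nan, 1., 4., 5.],
                   [8., 5., 3., 1., 0., 8., 6., 4.],
                   [7., 9., 1., 6., 1., 0., 2., 3.],
                   [5., np.nan, 9., 1., 0., 3., 8., 3.]], dtype=float)
df_raw = pd.DataFrame(
    values, columns=['A_0', 'A_1', 'A_2', 'A_3', 'B_0', 'B_1', 'B_2', 'B_3']
)
df_raw.insert(0, 'Cycle', [1, 1, 2, 3])

# Keep the row position as index level 0 and add Cycle as index level 1,
# so only the A_*/B_* columns are left to split.
wide = df_raw.set_index('Cycle', append=True)
wide.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in wide.columns])

# Stack the numeric suffix into the index, then drop the row position
# (level 0) and the suffix (level 2) so that only Cycle remains.
df_result = wide.stack(level=1).droplevel([0, 2])
print(df_result)
```

Depending on the pandas version, stack may emit a FutureWarning about its newer implementation; for this data the result should be the same either way.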

Explanation

Steps:

  1. Construct a MultiIndex from the flat Index

    Pandas provides four built-in methods to construct a MultiIndex; here from_tuples is used. Docs: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_tuples.html

    • from_arrays :: input [[x1, x2, ...], [y1, y2, ...]], output [(x1, y1), (x2, y2), ...]

    • from_tuples :: input [(x1, y1), (x2, y2), ...], output the same

    • from_frame :: turns each row of a DataFrame into one entry of the MultiIndex

    • from_product :: input is a list of iterables like from_arrays, but the output is their Cartesian product rather than an element-wise pairing, e.g. input [[x1, x2], [y1, y2, y3]] gives

    MultiIndex([('x1', 'y1'), ('x1', 'y2'), ('x1', 'y3'), ('x2', 'y1'), ('x2', 'y2'), ('x2', 'y3')], )

  2. Construct a new DataFrame and reshape it with stack

    See the User Guide on reshaping and pivoting: https://pandas.pydata.org/docs/user_guide/reshaping.html
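
As a rough illustration of the constructors listed in step 1, the sketch below builds the same small MultiIndex four ways. The column names are a shortened version of the question's, and the level names stub and num are made up for the example.

```python
import pandas as pd

cols = ['A_0', 'A_1', 'B_0', 'B_1']

# from_tuples: one (stub, suffix) tuple per column.
pairs = [tuple(c.split('_')) for c in cols]
mi_tuples = pd.MultiIndex.from_tuples(pairs)

# from_arrays: one array per level, paired element-wise.
mi_arrays = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], ['0', '1', '0', '1']])

# from_product: Cartesian product of the unique values of each level.
mi_product = pd.MultiIndex.from_product([['A', 'B'], ['0', '1']])

# from_frame: each DataFrame row becomes one entry; column names become level names.
mi_frame = pd.MultiIndex.from_frame(pd.DataFrame(pairs, columns=['stub', 'num']))

for mi in (mi_tuples, mi_arrays, mi_product, mi_frame):
    print(mi.tolist())
```

from_product only matches the other three here because every stub has the same set of suffixes; with ragged column names, from_tuples is the safer choice.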

Allen Paul

You can use pandas.wide_to_long:

(pd.wide_to_long(df.reset_index(), stubnames=['A', 'B'], i=['index','Cycle'], j='x', sep='_')
   .droplevel(['index', 'x'])
 )

Output:

         A    B
Cycle          
1      3.0  NaN
1      4.0  1.0
1      5.0  4.0
1      6.0  5.0
1      8.0  0.0
1      5.0  8.0
1      3.0  6.0
1      1.0  4.0
2      7.0  1.0
2      9.0  0.0
2      1.0  2.0
2      6.0  3.0
3      5.0  0.0
3      NaN  3.0
3      9.0  8.0
3      1.0  3.0
mozway
  • got this error: *KeyError: "['index'] not in index"* – mathew May 14 '22 at 11:49
  • What is the full error? Does your index have a name in the real dataset? Then you need to adapt to use this name instead of "index" – mozway May 14 '22 at 11:55
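
One way to sidestep the index-name issue raised in the comments, sketched under the assumption that Cycle is an ordinary column: add an explicit helper column (called row here, a hypothetical name) instead of relying on reset_index to produce a column named index. The sample data and the index name row_id are only there to reproduce the situation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Cycle': [1, 1, 2, 3],
    'A_0': [3, 8, 7, 5], 'A_1': [4, 5, 9, np.nan],
    'A_2': [5, 3, 1, 9], 'A_3': [6, 1, 6, 1],
    'B_0': [np.nan, 0, 1, 0], 'B_1': [1, 8, 0, 3],
    'B_2': [4, 6, 2, 8], 'B_3': [5, 4, 3, 3],
})
# Simulate a real dataset whose index has a name, so that
# reset_index() would not create a column called 'index'.
df.index.name = 'row_id'

# Helper column that uniquely identifies each original row.
tmp = df.assign(row=np.arange(len(df)))

long_df = (
    pd.wide_to_long(tmp, stubnames=['A', 'B'], i=['row', 'Cycle'], j='x', sep='_')
      .sort_index(level=['row', 'x'])   # make the row order explicit
      .droplevel(['row', 'x'])          # keep only Cycle in the index
)
print(long_df)
```

If Cycle is itself the index in the real data, it would first need to be turned into a column (for example with reset_index) before this applies.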