reshaping the dataframe values to a single base in pandas/numpy

Question

I have a dataframe as below.

+----------+-------+-----+-------+-------+-----+
|   Date   |   A   |  B  |   C   |   D   |  E  |
+----------+-------+-----+-------+-------+-----+
| 1-Jan-21 | 1,089 |     |       |  195  |     |
+----------+-------+-----+-------+-------+-----+
| 2-Jan-21 |  609  | 547 |  491  | 1,091 |     |
+----------+-------+-----+-------+-------+-----+
| 3-Jan-21 | 1,440 | 824 |  184  |  512  | 665 |
+----------+-------+-----+-------+-------+-----+
| 4-Jan-21 |  704  | 632 |  345  |       | 969 |
+----------+-------+-----+-------+-------+-----+
| 5-Jan-21 |  297  |     | 1,426 |       | 555 |
+----------+-------+-----+-------+-------+-----+

df = pd.DataFrame({
    'Date' : ['1-Jan-21','2-Jan-21','3-Jan-21','4-Jan-21','5-Jan-21'],
    'A': [1089, 609, 1440, 704, 297],
    'B': [np.nan, 547, 824, 632, np.nan],
    'C': [np.nan, 491, 184, 345, 1426],
    'D': [195, 1091, 512, np.nan, np.nan],
    'E': [np.nan, np.nan, 665, 969, 555]
})

I want to ignore the Nan values and bring down all the values to a single base/floor and rename the Date values as t, t-1,t-2 and so on.

my desired output is as below.

+------+-------+-----+-------+-------+-----+
| time |   A   |  B  |   C   |   D   |  E  |
+------+-------+-----+-------+-------+-----+
|  t-4 | 1,089 |     |       |       |     |
+------+-------+-----+-------+-------+-----+
|  t-3 |  609  |     |  491  |       |     |
+------+-------+-----+-------+-------+-----+
|  t-2 | 1,440 | 547 |  184  |  195  | 665 |
+------+-------+-----+-------+-------+-----+
|  t-1 |  704  | 824 |  345  | 1,091 | 969 |
+------+-------+-----+-------+-------+-----+
|   t  |  297  | 632 | 1,426 |  512  | 555 |
+------+-------+-----+-------+-------+-----+

how should I approach to solve this in pandas or numpy?

jezrael · Accepted Answer · 2021-02-12T07:58:01.337

You can sorting values by sorted per columns with keys parameter and then change index by list comprehension:

df = df.set_index('Date').apply(lambda x: pd.Series(sorted(x, key=pd.notna)))
df.index = [f't-{x}' if x!=0 else 't' for x in range(len(df)-1, -1, -1)]

print (df)
        A      B       C       D      E
t-4  1089    NaN     NaN     NaN    NaN
t-3   609    NaN   491.0     NaN    NaN
t-2  1440  547.0   184.0   195.0  665.0
t-1   704  824.0   345.0  1091.0  969.0
t     297  632.0  1426.0   512.0  555.0

In numpy is possible use very nice @Divakar function:

df = df.drop('Date', axis=1)
#https://stackoverflow.com/a/44559180/2901002
df = pd.DataFrame(justify(df.to_numpy(), invalid_val=np.nan, axis=0, side='down'), 
                  columns=df.columns)

df.index = [f't-{x}' if x!=0 else 't' for x in range(len(df)-1, -1, -1)]

print (df)
          A      B       C       D      E
t-4  1089.0    NaN     NaN     NaN    NaN
t-3   609.0    NaN   491.0     NaN    NaN
t-2  1440.0  547.0   184.0   195.0  665.0
t-1   704.0  824.0   345.0  1091.0  969.0
t     297.0  632.0  1426.0   512.0  555.0

If need first column Time use DataFrame.insert:

L = [f't-{x}' if x!=0 else 't' for x in range(len(df)-1, -1, -1)]

df.insert(0, 'Time', L)
print (df)
  Time       A      B       C       D      E
0  t-4  1089.0    NaN     NaN     NaN    NaN
1  t-3   609.0    NaN   491.0     NaN    NaN
2  t-2  1440.0  547.0   184.0   195.0  665.0
3  t-1   704.0  824.0   345.0  1091.0  969.0
4    t   297.0  632.0  1426.0   512.0  555.0

This works like a charm, Accepted and upvoted. But, I'm unable to comprehend the `sorted` method in the `pandas` answer. even though we're sorting, the final result is not sorted. understood correct? — Tommy, Feb 12 '21 at 08:00
@Tommy - It should be sorted by `key` parameter, which is testing not missing values, so it should working correct, more info [here](https://docs.python.org/3/howto/sorting.html#key-functions) — jezrael, Feb 12 '21 at 08:03
Hi, what is the advantage of having `pd.Series()` method in the first line? answer is same without it as well. — Tommy, Feb 19 '21 at 15:05
@Tommy Be free omit it, I think necesarry only for oldier pandas versions. — jezrael, Feb 19 '21 at 17:54

score 2 · Answer 2 · answered Feb 12 '21 at 08:09

Use simple approach

per series dropna()
concat to start to make series consistent length

df.Date = pd.to_datetime(df.Date)
df = df.assign(**{c:np.concatenate([np.full(df[c].isna().sum(), np.nan) , df[c].dropna()])
             for c in df.select_dtypes(include=['int64',"float64"]).columns})

	Date	A	B	C	D	E
0	2021-01-01 00:00:00	1089	nan	nan	nan	nan
1	2021-01-02 00:00:00	609	nan	491	nan	nan
2	2021-01-03 00:00:00	1440	547	184	195	665
3	2021-01-04 00:00:00	704	824	345	1091	969
4	2021-01-05 00:00:00	297	632	1426	512	555

reshaping the dataframe values to a single base in pandas/numpy

2 Answers2