justify data from right to left

Question

I have a dataset whose rows each contain a series of integers of varying length. I want to split each series so that every integer has its own column, with the values aligned to the right-most column. The resulting DataFrame should resemble the upper triangle of a matrix.

Currently I have a dataset like:

    variable    value
0   0   [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
1   1   [1, 2, 3, 4, 5, 6, 7, 8, 9]
2   2   [1, 2, 3, 4, 5, 6, 7, 8]
3   3   [1, 2, 3, 4, 5, 6, 7]
4   4   [1, 2, 3, 4, 5, 6]
5   5   [1, 2, 3, 4, 5]
6   6   [1, 2, 3, 4]
7   7   [1, 2, 3]
8   8   [1, 2]
9   9   [1]

I apply this code:

df = pd.DataFrame([pd.Series(x) for x in df2.value])
df.columns = ['{}'.format(x+1) for x in df.columns]

and I get this:

    1   2   3   4   5   6   7   8   9   10
0   1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 0.0
1   1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 NaN
2   1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 NaN NaN
3   1.0 2.0 3.0 4.0 5.0 6.0 7.0 NaN NaN NaN
4   1.0 2.0 3.0 4.0 5.0 6.0 NaN NaN NaN NaN
5   1.0 2.0 3.0 4.0 5.0 NaN NaN NaN NaN NaN
6   1.0 2.0 3.0 4.0 NaN NaN NaN NaN NaN NaN
7   1.0 2.0 3.0 NaN NaN NaN NaN NaN NaN NaN
8   1.0 2.0 NaN NaN NaN NaN NaN NaN NaN NaN
9   1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN

But what I want is this:

    1   2   3   4   5   6   7   8   9   10
0   1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 0.0
1   NaN 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 
2   NaN NaN 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 
3   NaN NaN NaN 1.0 2.0 3.0 4.0 5.0 6.0 7.0 
4   NaN NaN NaN NaN 1.0 2.0 3.0 4.0 5.0 6.0 
5   NaN NaN NaN NaN NaN 1.0 2.0 3.0 4.0 5.0 
6   NaN NaN NaN NaN NaN NaN 1.0 2.0 3.0 4.0
7   NaN NaN NaN NaN NaN NaN NaN 1.0 2.0 3.0 
8   NaN NaN NaN NaN NaN NaN NaN NaN 1.0 2.0 
9   NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0

Here's a very good answer by Divakar using NumPy: [`justify numpy array`](https://stackoverflow.com/a/44559180/12416453) – Ch3steR, Jul 30 '20 at 14:32
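
For readers who already have the left-justified frame from the question, a mask-based NumPy approach can shift the values to the right without rebuilding the lists. The sketch below is only an illustration of that idea, not a copy of the linked answer; the function name `right_justify` and the shortened sample data are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def right_justify(a):
    """Push the non-NaN values of each row of a 2-D float array to the right."""
    mask = ~np.isnan(a)
    # Sorting booleans puts False before True, so the True flags
    # (slots that will hold values) end up at the right of each row.
    justified_mask = np.sort(mask, axis=1)
    out = np.full(a.shape, np.nan)
    # Both masks select the same number of cells per row, in row-major
    # order, so the values keep their original left-to-right order.
    out[justified_mask] = a[mask]
    return out

# Left-justified frame built the same way as in the question (smaller data).
left = pd.DataFrame([pd.Series(x) for x in [[1, 2, 3], [1, 2], [1]]])
right = pd.DataFrame(right_justify(left.to_numpy(dtype=float)),
                     columns=range(1, left.shape[1] + 1))
print(right)
```

This assumes NaN only ever marks missing trailing slots; a NaN that is part of the data itself would be treated as padding as well.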

Shubham Sharma · Accepted Answer · 2020-07-30T17:06:56.537

4

One possible approach is to use `Series.str.len` to compute the maximum list length in the `value` column (`lmax`), then left-pad each list with NaN up to `lmax` in a list comprehension:

import numpy as np
import pandas as pd

lmax = df['value'].str.len().max()
df1 = pd.DataFrame([[np.nan] * (lmax - len(s)) + s
                    for s in df['value']], columns=range(1, lmax + 1))

Result:

print(df1)
     1    2    3    4    5    6    7    8    9  10
0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0   0
1  NaN  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0   9
2  NaN  NaN  1.0  2.0  3.0  4.0  5.0  6.0  7.0   8
3  NaN  NaN  NaN  1.0  2.0  3.0  4.0  5.0  6.0   7
4  NaN  NaN  NaN  NaN  1.0  2.0  3.0  4.0  5.0   6
5  NaN  NaN  NaN  NaN  NaN  1.0  2.0  3.0  4.0   5
6  NaN  NaN  NaN  NaN  NaN  NaN  1.0  2.0  3.0   4
7  NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0  2.0   3
8  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0   2
9  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   1
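
To try the approach end to end, here is a minimal self-contained sketch. The sample frame is a shortened stand-in for the question's data (an assumption for this example), and it relies on every entry of `value` being a plain Python list.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data: lists of decreasing length.
df = pd.DataFrame({"value": [[1, 2, 3, 4], [1, 2, 3], [1, 2], [1]]})

# Longest list length; int() turns the NumPy integer into a plain Python int.
lmax = int(df["value"].str.len().max())

# Left-pad each list with NaN so every row has exactly lmax entries.
padded = [[np.nan] * (lmax - len(s)) + s for s in df["value"]]
df1 = pd.DataFrame(padded, columns=range(1, lmax + 1))
print(df1)
```

The last column never receives a NaN, so pandas keeps it as an integer column, which matches the mixed float and integer display in the result above.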

edited Jul 30 '20 at 17:06

answered Jul 30 '20 at 13:45

Shubham Sharma


Nice answer. Check this [answer too](https://stackoverflow.com/a/44559180/12416453). +1 – Ch3steR Jul 30 '20 at 14:33
Hey, thank you for your solution. When I try to implement it I get an error saying "operands could not be broadcast together with shapes (0,) (10,)". I am unsure what this means or how to fix it. – Tessd Jul 30 '20 at 15:34
Because my arrays were NumPy arrays, we modified the code as follows: `lmax = df['value'].str.len().max(); df1 = pd.DataFrame([np.hstack([[np.nan] * (lmax - len(s)), s]) for s in df['value']], columns=range(1, lmax + 1))` – Tessd Jul 30 '20 at 16:56
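
The broadcast error reported in the comments most likely arises because `list + ndarray` is element-wise addition in NumPy rather than list concatenation. For the longest row the padding list is empty, so NumPy tries to add shapes `(0,)` and `(10,)`. The sketch below reproduces this on smaller data and shows one way to pad that works for both lists and arrays; `pad_left` is a hypothetical helper named for this example only.

```python
import numpy as np
import pandas as pd

values = [np.array([1, 2, 3]), np.array([1, 2]), np.array([1])]

# list + ndarray is element-wise addition, not concatenation, so the
# empty padding list for the longest row cannot broadcast against it.
try:
    [] + values[0]
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

def pad_left(seq, width):
    """Hypothetical helper: left-pad a list or array with NaN to `width`."""
    seq = np.asarray(seq, dtype=float)
    return np.concatenate([np.full(width - len(seq), np.nan), seq])

lmax = max(len(v) for v in values)
df1 = pd.DataFrame([pad_left(v, lmax) for v in values],
                   columns=range(1, lmax + 1))
print(df1)
```

Concatenating explicitly also avoids a quieter failure: for shorter rows, a padding list added to a one-element array can broadcast without raising and silently produce wrong values.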

score 1 · Answer 2 · answered Jul 30 '20 at 14:42

You could also use `np.pad`, but each array must first be cast to float so that it can be padded with NaN:

s = pd.DataFrame([np.pad(np.array(a).astype(float), (10 - len(a), 0), mode="constant",
                         constant_values=np.nan) for a in df["value"]])
print (s)

     0    1    2    3    4    5    6    7    8     9
0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0  10.0
1  NaN  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0   9.0
2  NaN  NaN  1.0  2.0  3.0  4.0  5.0  6.0  7.0   8.0
3  NaN  NaN  NaN  1.0  2.0  3.0  4.0  5.0  6.0   7.0
4  NaN  NaN  NaN  NaN  1.0  2.0  3.0  4.0  5.0   6.0
5  NaN  NaN  NaN  NaN  NaN  1.0  2.0  3.0  4.0   5.0
6  NaN  NaN  NaN  NaN  NaN  NaN  1.0  2.0  3.0   4.0
7  NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0  2.0   3.0
8  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0   2.0
9  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   1.0
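
The snippet above hardcodes a width of 10. A variant that derives the width from the data might look like the following sketch; the shortened sample frame and the 1-based column labels are assumptions chosen to match the question's desired output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [[1, 2, 3], [1, 2], [1]]})

# Width of the widest row, computed instead of hardcoded.
lmax = max(len(a) for a in df["value"])

# pad_width=(before, after): pad only on the left, with NaN.
rows = [np.pad(np.asarray(a, dtype=float), (lmax - len(a), 0),
               mode="constant", constant_values=np.nan)
        for a in df["value"]]
s = pd.DataFrame(rows, columns=range(1, lmax + 1))
print(s)
```

Because every row is cast to float before padding, all columns come out as floats, including the last one.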

I also thought of using `np.pad`, but I guess using `np.pad` adds extra overhead here ;). What's your opinion? – Shubham Sharma, Jul 30 '20 at 14:53
Yeah, it was a lot of work to get the expected output using `np.pad`. I think yours is more straightforward :) – Henry Yik, Jul 30 '20 at 15:00
Yeah, that's what I thought too when I was experimenting with `np.pad` here ;). BTW good answer +1. – Shubham Sharma, Jul 30 '20 at 15:16
