I agree with @EdChum that this would be simplest given a Series object named s:
d = pd.DataFrame(s.values.reshape(1000000, -1))
which would reshape your Series into a DataFrame of shape (1,000,000, len(s) / 1,000,000).
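As a quick sanity check, here's a minimal sketch of that reshape on a toy Series of length 12 split into 4-row chunks (the sizes 12 and 4 here are just for illustration):

import numpy as np
import pandas as pd

s = pd.Series(np.arange(12))
# reshape to 4 rows; -1 lets numpy infer the 3 columns (12 / 4)
d = pd.DataFrame(s.values.reshape(4, -1))
print(d.shape)   # (4, 3)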
However, the above only works if you have a Series whose length is an exact multiple of 1,000,000. Alternatively, you could do something like:
# note with python3, you need to use integer division // here
s.index = pd.MultiIndex.from_tuples([(x/1000000,x%1000000) for x in s.index])
# or an alternative below which does the same thing
#s.index = pd.MultiIndex.from_tuples(s.index.map(lambda x: (x/1000000, x%1000000)))
s.unstack(0)
which will give you several columns of the same length, with the last column padded with NaNs.
Here's an example with a Series of length 55 which I want split into columns of length 10. Note the last column has the last 5 values set to NaN:
In [42]: s = pd.Series(np.arange(55))
In [43]: s
Out[43]:
0      0
1      1
2      2
...
53    53
54    54
dtype: int64
# with python3 x//10, x%10
In [44]: s.index = pd.MultiIndex.from_tuples(s.index.map(lambda x: (x/10, x%10)))
In [45]: s.unstack(0)
Out[45]:
    0   1   2   3   4    5
0   0  10  20  30  40   50
1   1  11  21  31  41   51
2   2  12  22  32  42   52
3   3  13  23  33  43   53
4   4  14  24  34  44   54
5   5  15  25  35  45  NaN
6   6  16  26  36  46  NaN
7   7  17  27  37  47  NaN
8   8  18  28  38  48  NaN
9   9  19  29  39  49  NaN
Note two things:

1. Using s.index.map(lambda ...) should be faster than the list comprehension for very large arrays.
2. If using Python 3, make sure to use integer division in the lambda function: lambda x: (x // N, x % N), as in the sketch below.
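For reference, here's a minimal self-contained sketch of the same example written for Python 3, where N is the desired column length (10 here, matching the example above):

import numpy as np
import pandas as pd

N = 10  # desired column length
s = pd.Series(np.arange(55))

# Python 3: // keeps the first index level as integers rather than floats
s.index = pd.MultiIndex.from_tuples(s.index.map(lambda x: (x // N, x % N)))
print(s.unstack(0))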