You can use `pivot_table`, though you'll need to define the index as the `cumcount` of the grouped `count` column; `pivot_table` can't figure it out all on its own :)
(df.pivot_table(index=df.groupby('count').cumcount(),
                columns='count',
                values='num'))
count    1    2    3
0      1.0  3.0  7.0
1      2.0  5.0  NaN
2      4.0  6.0  NaN
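For anyone who wants to reproduce this end to end, here is a minimal sketch. The input frame is an assumption reconstructed from the output above (`num` running from 1 to 7, `count` being the number of set bits), and the names `pos` and `wide` are only illustrative.

```python
import pandas as pd

# Assumed input: the original df is not shown, so it is rebuilt from the output above.
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7]})
df['count'] = [bin(n).count('1') for n in df['num']]  # number of set bits

# Position of each row within its 'count' group: 0, 1, 2, ...
pos = df.groupby('count').cumcount()

# Each (pos, count) pair holds a single value, so the default mean aggregation
# just returns that value (as a float).
wide = df.pivot_table(index=pos, columns='count', values='num')
print(wide)
```

Without the `cumcount` index there is nothing to tell rows of the same group apart, which is why it has to be passed explicitly.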
You also have the parameter `fill_value`, though I wouldn't recommend using it, since you'll get mixed types. NumPy looks like a good option from here: assign the result to `new_df` and you can easily obtain an array with `new_df.to_numpy()`.
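A rough sketch of the NumPy route, again assuming the reconstructed 1 to 7 input (the real data may differ). `new_df` is the pivoted frame, and `groups` is just an illustrative name for the per-column values with the padding removed.

```python
import numpy as np
import pandas as pd

# Assumed input, rebuilt from the output shown earlier.
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7]})
df['count'] = [bin(n).count('1') for n in df['num']]

new_df = df.pivot_table(index=df.groupby('count').cumcount(),
                        columns='count', values='num')

# One homogeneous float array; missing cells stay as NaN.
arr = new_df.to_numpy()
print(arr.dtype, arr.shape)

# If the padding gets in the way, drop the NaNs column by column.
groups = {c: new_df[c].dropna().to_numpy() for c in new_df.columns}
print(groups)
```

Leaving the gaps as NaN keeps everything in a single float dtype, so `np.isnan` and the `nan`-aware reductions such as `np.nansum` work directly on `arr`.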
Also, focusing on the logic in `ones`, we can vectorise this with (based on this answer):
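# itemsize is the element size in bytes (8 for int64), so only the lowest m bits
# are tested; that is enough for values below 2**m
m = df.num.to_numpy().itemsize
df['count'] = (df.num.to_numpy()[:,None] & (1 << np.arange(m)) > 0).view('i1').sum(1)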
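To make the bit trick easier to follow, here is a small self-contained sketch. It is not the code from the question: `expected` uses `bin(n).count('1')` as a stand-in for `ones` (which isn't shown here), and the sample values, including 255 and 256, are assumptions chosen to show what happens once a value needs more than `itemsize` bits. The `wide` variant tests every bit by working on unsigned 64-bit integers, and assumes non-negative inputs.

```python
import numpy as np
import pandas as pd

# Assumed sample data; 256 needs a ninth bit.
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 255, 256]})
arr = df['num'].to_numpy()

# Reference result from plain Python (stand-in for the question's `ones`).
expected = [bin(n).count('1') for n in arr.tolist()]

# Same idea as above: broadcast each value against the masks 1, 2, 4, ...
# and count how many masks hit. Only `itemsize` bits are tested here.
m = arr.itemsize
narrow = ((arr[:, None] & (1 << np.arange(m))) > 0).view('i1').sum(1)

# Variant testing all 64 bits, using unsigned integers so the top bit is an
# ordinary mask rather than the sign bit.
u = arr.astype(np.uint64)
masks = np.uint64(1) << np.arange(64, dtype=np.uint64)
wide = ((u[:, None] & masks) != 0).sum(axis=1)

print(expected)
print(narrow.tolist())
print(wide.tolist())
```

For the small values in this question both variants agree; the wider masks only matter once the numbers reach 256 or more.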
Here's a check on both approaches' performance:
df_large = pd.DataFrame({'num':np.random.randint(0,10,(10_000))})
def vect(df):
    m = df.num.to_numpy().itemsize
    return (df.num.to_numpy()[:,None] & (1 << np.arange(m)) > 0).view('i1').sum(1)
%timeit vect(df_large)
# 340 µs ± 5.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df_large.apply(lambda row : ones(row['num']), axis = 1)
# 103 ms ± 2.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
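If you're not in IPython, roughly the same comparison can be run with the standard `timeit` module. This is only a sketch: `ones` is a hypothetical stand-in for the function from the question, the repeat counts are arbitrary, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd


def ones(n):
    # Hypothetical stand-in for the question's `ones`: number of set bits.
    return bin(n).count('1')


def vect(df):
    arr = df['num'].to_numpy()
    m = arr.itemsize  # bytes per element, so only the lowest m bits are tested
    return ((arr[:, None] & (1 << np.arange(m))) > 0).view('i1').sum(1)


rng = np.random.default_rng(0)
df_large = pd.DataFrame({'num': rng.integers(0, 10, 10_000)})

# Total seconds for the given number of calls.
t_vect = timeit.timeit(lambda: vect(df_large), number=20)
t_apply = timeit.timeit(
    lambda: df_large.apply(lambda row: ones(row['num']), axis=1), number=3)

print(f"vect:  {t_vect / 20 * 1e3:.3f} ms per call")
print(f"apply: {t_apply / 3 * 1e3:.3f} ms per call")
```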