Say I have a Pandas dataframe with 4 rows and 5 columns. For simplicity I will convert it into a Numpy array, which looks like this:
import numpy as np
A = np.array([[23, 43, 23, 110, 5],
[83, 32, 12, 123, 4],
[58, 41, 59, 189, 1],
[93, 77, 22, 170, 3]])
For each row, I want to insert some similar rows right after the row, with only the column 4 decreasing by 1 each time until 0. The expected output should look like this:
np.array([[23, 43, 23, 110, 5],
[23, 43, 23, 110, 4],
[23, 43, 23, 110, 3],
[23, 43, 23, 110, 2],
[23, 43, 23, 110, 1],
[23, 43, 23, 110, 0],
[83, 32, 12, 123, 4],
[83, 32, 12, 123, 3],
[83, 32, 12, 123, 2],
[83, 32, 12, 123, 1],
[83, 32, 12, 123, 0],
[58, 41, 59, 189, 1],
[58, 41, 59, 189, 0],
[93, 77, 22, 170, 3],
[93, 77, 22, 170, 2],
[93, 77, 22, 170, 1],
[93, 77, 22, 170, 0]])
Below is the code that I have come up with:
new_rows = []
for i, row in enumerate(A):
new = A[i, 4] - 1
while new >= 0:
new_row = row.copy()
new_row[4] = new
new_rows.append(new_row)
new -= 1
new_A = np.vstack([A, np.array(new_rows)])
print(new_A)
Output
[[ 23 43 23 110 5]
[ 83 32 12 123 4]
[ 58 41 59 189 1]
[ 93 77 22 170 3]
[ 23 43 23 110 4]
[ 23 43 23 110 3]
[ 23 43 23 110 2]
[ 23 43 23 110 1]
[ 23 43 23 110 0]
[ 83 32 12 123 3]
[ 83 32 12 123 2]
[ 83 32 12 123 1]
[ 83 32 12 123 0]
[ 58 41 59 189 0]
[ 93 77 22 170 2]
[ 93 77 22 170 1]
[ 93 77 22 170 0]]
Obviously, the code is not efficient since it doesn't use any Numpy vectorization. In reality, I have more than 4,000 original rows so a speed-up is definitely needed. Moreover, I cannot insert new rows right after each row. Is there any efficient way to do this in Numpy or Pandas?