Here is the data I have in a Pandas DataFrame:
ID | Min | Max
--------------
1 | 1 | 10
2 | 54 | 105
3 | 24 | 0
. | . | .
. | . | .
. | . | .
N | X | Y
Here is the output DataFrame I'm trying to get:
ID | Min | Max | All Numbers in Range
---------------------------------------
1 | 1 | 10 | [1,2,3,4,5,6,7,8,9,10]
2 | 54 | 105 | [54,55,56,...,104,105]
3 | 24 | 0 | [1,2,3,...,22,23,24]
. | . | . | .
. | . | . | .
. | . | . | .
N | X | Y | [X, ...............,Y]
I can do this with a loop and generate the lists (or NumPy arrays) row by row, but it's very slow and will take two hours to complete with the amount of data I have. I can also do this with apply, but it's no faster than the loop. I can't figure out how to vectorize this operation so that it runs faster.
Here is one of the ways I've tried to vectorize it that didn't work:
def create_list(min, max):
    if max != 0:
        num_list = np.arange(min, max + 1, 1)
    else:
        num_list = np.arange(1, min + 1, 1)
    return num_list

df["num_list"] = create_list(df["min"], df["max"])
This gives me the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
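The error most likely comes from `if max != 0:`, which compares a whole Series at once and then asks Python for a single True/False. A minimal sketch of one way around it, assuming lowercase column names `min` and `max`, integer values, and a small made-up sample frame: the branch is moved into `np.where` so the start and end of every range are chosen in one vectorized step, and only the variable-length ranges themselves are built per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the first three rows of the table.
df = pd.DataFrame({"ID": [1, 2, 3], "min": [1, 54, 24], "max": [10, 105, 0]})

lo = df["min"].to_numpy()
hi = df["max"].to_numpy()

# Vectorized version of the if/else: choose start and stop for all rows at once.
start = np.where(hi != 0, lo, 1)
stop = np.where(hi != 0, hi, lo)

# The ranges have different lengths, so they are still created one per row,
# but with a plain list comprehension instead of DataFrame.apply.
df["num_list"] = [np.arange(a, b + 1) for a, b in zip(start, stop)]
```

Whether this is fast enough depends on the data; it only removes the per-row pandas overhead, not the per-row `np.arange` calls.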
Any help would be appreciated.
Edit: My current solution before posting (no faster than a loop using iterrows):
def create_list(min, max):
    if max != 0:
        num_list = np.arange(min, max + 1, 1)
    else:
        num_list = np.arange(1, min + 1, 1)
    return num_list

df["num_list"] = df.apply(lambda row: create_list(row["min"], row["max"]), axis=1)
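For comparison, here is a sketch of an approach that avoids calling `np.arange` once per row: build one flat array holding every number for every row, then cut it into per-row pieces with `np.split`. It assumes lowercase column names `min` and `max`, integer values, `min <= max` whenever `max` is nonzero, and a small made-up sample frame; it has not been timed against the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the first three rows of the table.
df = pd.DataFrame({"ID": [1, 2, 3], "min": [1, 54, 24], "max": [10, 105, 0]})

lo = df["min"].to_numpy()
hi = df["max"].to_numpy()

# Start and stop of each row's range, chosen without a Python-level if/else.
start = np.where(hi != 0, lo, 1)
stop = np.where(hi != 0, hi, lo)

# Number of values in each row's range, and where each range ends in the flat array.
lengths = stop - start + 1
ends = np.cumsum(lengths)
offsets = ends - lengths  # position where each row's range begins

# One flat array: a global counter, shifted so each segment restarts at its own start.
flat = np.arange(ends[-1]) - np.repeat(offsets, lengths) + np.repeat(start, lengths)

# Cut the flat array back into one array per row.
df["num_list"] = np.split(flat, ends[:-1])
```

If the per-row arrays are not strictly needed, keeping `flat` together with `np.repeat(df["ID"].to_numpy(), lengths)` as a long two-column table may be cheaper than storing array objects in a column.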