The method you mentioned will indeed be quite slow for large data sets, because of how append() interacts with memory: DataFrame.append() does not extend the frame in place but returns a new object, so the existing data is copied on every call. Essentially you are rewriting the same data ~360,000 times, each time extended by a single entry. You can speed this up significantly by converting to NumPy arrays and searching for the edges in a single vectorized operation. I wrote a minimal example to demonstrate, using a random set of binary data.
import time

import numpy as np
import pandas as pd

# Random binary test data
binaries = np.random.randint(0, 2, 200000)
Binary = pd.DataFrame(binaries)

# Loop-based method, as in the question.
# Note: DataFrame.append() returns a new frame instead of modifying in place,
# so its result is discarded here; it was also removed in pandas 2.0.
t1 = time.time()
startedge, endedge = pd.DataFrame([]), pd.DataFrame([])
for i in range(0, len(Binary) - 1):
    if Binary[0][i] == 0 and Binary[0][i+1] == 1:
        startedge.append([i])
    elif Binary[0][i] == 1 and Binary[0][i+1] == 0:
        endedge.append([i])
t2 = time.time()
print(f"Looping through took {t2-t1} seconds")

# Numpy based method, including conversion of the dataframe
# (search_sequence_numpy is listed further down; define it before running this part)
t1 = time.time()
binary_array = np.array(Binary[0])
startedges = search_sequence_numpy(binary_array, np.array([0,1]))
stopedges = search_sequence_numpy(binary_array, np.array([1,0]))
t2 = time.time()
print(f"Converting to a numpy array and looping through required {t2-t1} seconds")
Output:
Looping through took 56.22933220863342 seconds
Converting to a numpy array and looping through required 0.029932022094726562 seconds
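If you would rather keep an explicit loop, a minimal sketch along these lines collects the indices in plain Python lists, so no new DataFrame is created on every match. The helper name `edges_with_loop` is my own and not part of the question, and I have not timed this variant.

```python
import numpy as np

def edges_with_loop(values):
    """Loop over the data once and collect edge indices in plain lists."""
    values = np.asarray(values)
    start, end = [], []
    for i in range(len(values) - 1):
        if values[i] == 0 and values[i + 1] == 1:
            start.append(i)  # list.append extends the list in place
        elif values[i] == 1 and values[i + 1] == 0:
            end.append(i)
    return start, end

print(edges_with_loop([0, 0, 1, 1, 0, 1, 0, 0]))  # ([1, 4], [3, 5])
```

Lists can be converted to a DataFrame once at the end if you need one.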
For the sequence search function I used the code from this answer: "Searching a sequence in a NumPy array".
def search_sequence_numpy(arr, seq):
    """ Find sequence in an array using NumPy only.

    Parameters
    ----------
    arr : input 1D array
    seq : input 1D array

    Output
    ------
    Output : 1D Array of indices in the input array that satisfy the
    matching of input sequence in the input array.
    In case of no match, an empty list is returned.
    """
    # Store sizes of input array and sequence
    Na, Nseq = arr.size, seq.size

    # Range of sequence
    r_seq = np.arange(Nseq)

    # Create a 2D array of sliding indices across the entire length of input array.
    # Match up with the input sequence & get the matching starting indices.
    M = (arr[np.arange(Na-Nseq+1)[:,None] + r_seq] == seq).all(1)

    # Get the range of those indices as final output
    # (every position covered by a match, not only the first one)
    if M.any():
        return np.where(np.convolve(M, np.ones(Nseq, dtype=int)) > 0)[0]
    else:
        return []  # No match found
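Note that because of the np.convolve step, the function above returns the indices of every element that belongs to a match, so for a two-element pattern you get both i and i+1. If you only want the index i right before each edge, as the loop records it, a shorter sketch based on np.diff may be enough. This is an alternative I am suggesting rather than the method timed above, and the helper name `find_edges` is hypothetical.

```python
import numpy as np

def find_edges(binary_array):
    """Return (rising, falling): indices i where the value changes between i and i+1."""
    # Cast to a signed type so that 0 - 1 gives -1 instead of wrapping around.
    steps = np.diff(np.asarray(binary_array).astype(np.int8))
    rising = np.flatnonzero(steps == 1)    # 0 at i, 1 at i+1
    falling = np.flatnonzero(steps == -1)  # 1 at i, 0 at i+1
    return rising, falling

rising, falling = find_edges([0, 0, 1, 1, 0, 1, 0, 0])
print(rising)   # [1 4]
print(falling)  # [3 5]
```

This assumes the data contains only 0 and 1; for other values you would compare neighbouring elements explicitly.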