I am working with a large text file containing the decimal digits of pi in the format shown below. Note that the header consists only of numbers and contains no strings.
Header format: Number_of_sequences Total_Pi_Digits File_Version_Number
550 10000 5
*Pi Sequence Part 1
1415926535897932384
*Pi Sequence Part 2
6264338327950288419
*Pi Sequence Part 3
1693993751058209749
I need to write a sliding window that crops the file using three arguments: window_size, step_size, and lastwindow_start, where lastwindow_start is the position at which the last window starts.
The number of output files is determined by dividing Total_Pi_Digits by window_size.
If the file had 99 Total_Pi_Digits, a window_size of 10, and a step_size of zero, there would be a total of 10 windows, since 99//10=9 full windows and 99%10 leaves 9 digits for window 10.
lastwindow_start should be 90, I guess, for this example. I am not sure that I even need lastwindow_start.
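To make the window arithmetic concrete, here is a minimal sketch. The function name window_starts is hypothetical, and it assumes (this is only one possible reading of the requirement) that a step_size of 0 means "advance by one full window", so that consecutive windows are adjacent and do not overlap.

```python
def window_starts(total_digits, window_size, step_size=0):
    """Return the start index of every window over total_digits digits.

    Assumption (not confirmed by the requirements): step_size == 0 means
    "advance by one full window", i.e. adjacent, non-overlapping windows.
    """
    if window_size <= 0 or step_size < 0:
        raise ValueError("window_size must be positive and step_size non-negative")
    stride = step_size if step_size > 0 else window_size
    # range() stops before total_digits, so a shorter final window is kept.
    return list(range(0, total_digits, stride))


starts = window_starts(99, 10, 0)
print(len(starts))   # 10 windows
print(starts[-1])    # 90, the start of the last window
```

Under this reading the number of windows is simply the number of start positions, so lastwindow_start falls out of the calculation and does not have to be passed in separately.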
For each window, a file will be created with the name PiSubsection#, where # is the window number.
Every output file should get a new header in the same format: Number_of_sequences Total_Pi_Digits File_Version_Number.
Number_of_sequences and Total_Pi_Digits will change based on window_size and step_size, but File_Version_Number must not change.
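A rough sketch of the header rewrite might look like this. The helper name make_window_header is hypothetical, and the rule for the new Total_Pi_Digits (window_size multiplied by Number_of_sequences) is an assumption taken from my own code further down; Number_of_sequences is left unchanged here only because I have not pinned down how it should change.

```python
def make_window_header(original_header, window_size):
    """Build the header line for one output file.

    Assumes the header is three whitespace-separated integers and that the
    new Total_Pi_Digits is window_size * Number_of_sequences. Both points
    are assumptions, so adjust them if the real rule differs.
    """
    n_sequences, _old_total, version = original_header.split()
    new_total = window_size * int(n_sequences)
    # File_Version_Number is copied through untouched.
    return "{} {} {}".format(n_sequences, new_total, version)


print(make_window_header("550 10000 5", 10))  # 550 5500 5
```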
My problem is that my sliding window algorithm does not handle a step_size of 0 and does not produce the right number of files. So far it produces twice as many files as expected, and I am not sure why.
Additionally, I am not even sure that I understand the math for the number of windows in a sliding window algorithm.
How do I fix my sliding window algorithm so that it accepts a step_size of 0 and produces the right number of output files?
import shlex

inputFileName = 'example.txt'

def sliding_window(windows_size, step_size, lastwindow_start):
    # Yield (start, end) index pairs; xrange rejects a step of 0
    for i in xrange(0, lastwindow_start, step_size):
        yield (i, i + windows_size)

def PiCrop(windows_size, step_size):
    with open(inputFileName, 'r') as input:
        # Parse the numeric header: sequences, total digits, version
        first_line = shlex.split(input.readline().strip())
        PiNumber = int(first_line[1])
        lastwindow_start = PiNumber-(PiNumber%windows_size)
        # One flag per output file, set once its header has been written
        flags = [False for i in range(lastwindow_start)]
        first_line[1] = str(windows_size * int(first_line[0]))
        first_line = " ".join(first_line)
        for line in input:
            if line.startswith(first_line[0]):
                pass
            elif line.startswith('*'):
                # Remember the most recent '*Pi Sequence Part N' label
                Indiv = line
            else:
                for counter, window in enumerate(sliding_window(windows_size,step_size,lastwindow_start)):
                    newline = line[window[0]:window[1]]
                    with open('PiSection{}.txt'.format(counter), 'a') as output:
                        if (flags[counter] == False):
                            flags[counter] = True
                            output.write(first_line + '\n')
                        output.write(Indiv)
                        output.write(newline + '\n')
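
For comparison, here is a minimal sketch of a window generator that tolerates a step_size of 0. It is not a confirmed fix. It assumes that 0 should be read as "move by window_size", it takes the length of the text being sliced instead of lastwindow_start, and it uses range rather than xrange so that it runs on both Python 2 and Python 3. The helper crop_line is a hypothetical name used only for illustration.

```python
def sliding_window(window_size, step_size, total_length):
    """Yield (start, end) index pairs covering total_length characters.

    Assumption: a step_size of 0 is read as "move by window_size".
    """
    stride = step_size if step_size > 0 else window_size
    for start in range(0, total_length, stride):
        # Clamp the end so the final window may be shorter than window_size.
        yield start, min(start + window_size, total_length)


def crop_line(line, window_size, step_size):
    """Split one line of digits into its window substrings."""
    digits = line.strip()
    return [digits[a:b] for a, b in sliding_window(window_size, step_size, len(digits))]


print(crop_line("1415926535897932384\n", 10, 0))
# ['1415926535', '897932384']
```

In this sketch the number of windows depends only on the length of the sliced text and the stride, so the length of the returned list is the number of output files one line would touch.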