I'm trying to implement a sliding (moving) window over the lines of a CSV file using Python. Each line has a column with a binary value, `yes` or `no`. Basically, I want to remove rare `yes` noise. That means if, say, we have 3 `yes` lines in a window of 5 (a maximum of 5 lines), keep them. But if there are only 1 or 2, change them to `no`. How can I do that?
For instance, both `yes` values below should become `no`:
...
1,a1,b1,no,0.75
2,a2,b2,no,0.45
3,a3,b3,yes,0.98
4,a4,b4,yes,0.22
5,a5,b5,no,0.46
6,a6,b6,no,0.20
...
But in the following, we keep everything as is (there is a window of 5 lines in which 3 are `yes`):
...
1,a1,b1,no,0.75
2,a2,b2,no,0.45
3,a3,b3,yes,0.98
4,a4,b4,yes,0.22
5,a5,b5,no,0.46
6,a6,b6,yes,0.20
...
I attempted writing something, having a window of 5, but got stuck (it is not complete):
window_size = 5
filename='C:\\Users\\username\\v3\\And-'+v3file.split("\\")[5]
with open(filename) as fin:
    with open('C:\\Users\\username\\v4\\And2-'+v3file.split("\\")[5],'w') as finalout:
        line= fin.readline()
        index = 0
        sequence= []
        accs=[]
        while line:
            print(line)
            for i in range(window_size):
                line = fin.readline()
                sequence.append(line)
            index = index + 1
            fin.seek(index)
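
To make the rule I'm after concrete, here is a minimal sketch of how I understand it, not a finished solution. It assumes the `yes`/`no` label is in the fourth column (index 3), that the file is small enough to read into memory, and that a `yes` is kept whenever at least one window of 5 consecutive lines containing it has 3 or more `yes` values. The names `smooth_labels`, `filter_rare_yes`, `label_col` and `min_yes` are made up for illustration.

```python
import csv


def smooth_labels(labels, window_size=5, min_yes=3):
    """Turn 'yes' into 'no' unless some window of window_size consecutive
    labels containing it holds at least min_yes 'yes' values."""
    n = len(labels)
    keep = [False] * n
    # If there are fewer labels than window_size, use one window over all of them.
    last_start = max(n - window_size, 0)
    for start in range(last_start + 1):
        window = range(start, min(start + window_size, n))
        if sum(labels[i] == "yes" for i in window) >= min_yes:
            # Every position in a dense-enough window is protected.
            for i in window:
                keep[i] = True
    # Only unprotected 'yes' labels are changed; everything else is untouched.
    return [lab if lab != "yes" or keep[i] else "no"
            for i, lab in enumerate(labels)]


def filter_rare_yes(fin, fout, label_col=3, window_size=5, min_yes=3):
    """Read CSV rows from fin, smooth the label column, write rows to fout.

    Assumption: label_col=3 matches the sample rows in the question.
    """
    rows = [row for row in csv.reader(fin) if row]  # skip blank lines
    labels = smooth_labels([row[label_col] for row in rows],
                           window_size, min_yes)
    writer = csv.writer(fout, lineterminator="\n")
    for row, label in zip(rows, labels):
        row[label_col] = label
        writer.writerow(row)
```

With real files, `fin` and `fout` would be the two handles opened in my attempt above (opening them with `newline=''` is what the `csv` module documentation recommends). Because all rows are collected in a list first, there is no need for `fin.seek` at all.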