I am using Pig to generate output, and I would like to randomly assign each output row to one of 2 groups. As some of you are aware, Pig writes its output as files named like part-m-00000 or part-r-00000 inside a folder. I would like to iterate through all the files in an output folder and randomly assign every row a 0 or 1.
I already have the code that does the assignment for a single file:
import csv
from random import randint

with open('part-r-00000', 'r') as csvinput:
    with open('output2.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)
        rows = []
        for row in reader:
            # Append a random group label (0 or 1) to the end of each row
            row.append(randint(0, 1))
            rows.append(row)
        writer.writerows(rows)
This definitely works. I also have sample input and output:
Sample input:
0,1,2,1,4,3,3,4,1,1
2,3,4,1,0,0,1,2,1,1
0,2,3,1,0,2,3,1,1,1
Sample output:
0,1,2,1,4,3,3,4,1,1,0
2,3,4,1,0,0,1,2,1,1,0
0,2,3,1,0,2,3,1,1,1,1
However, I need to find out how many files are in the folder and add an outer loop that processes each file. How do I do this?
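For reference, here is a rough sketch of the kind of outer loop I have in mind. It assumes Python 3, that every data file in the folder matches the pattern part-*, and that all rows should end up in a single output file. The function name assign_groups and its two parameters are made up for illustration. It uses glob.glob to list the files, so the file count is just the length of that list.

```python
import csv
import glob
import os
from random import randint


def assign_groups(input_dir, output_path):
    """Append a random 0/1 label to every row of every part-* file in input_dir.

    Returns the number of part files that were processed.
    """
    # Assumption: data files are named part-*; other files in the folder
    # (for example a _SUCCESS marker) do not match and are skipped.
    # Sorting gives a stable order: part-r-00000, part-r-00001, ...
    part_files = sorted(glob.glob(os.path.join(input_dir, 'part-*')))

    with open(output_path, 'w', newline='') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        for path in part_files:
            with open(path, 'r', newline='') as csvinput:
                for row in csv.reader(csvinput):
                    # Random group label: 0 or 1
                    row.append(randint(0, 1))
                    writer.writerow(row)

    return len(part_files)
```

It would be called as something like assign_groups('pig_output', 'output2.csv'), where 'pig_output' is a placeholder for the real folder name. Writing each row as it is read also avoids holding every row in memory, which may matter if the Pig output is large.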