0

I am using Pig to generate an output. I would like to randomly assign the output to 2 groups. As some of you are aware, Pig outputs files in the format part-m-00000 or part-r-00000 into a folder. I would like to iterate through all the files in an output folder and assign every row a 0 or 1 randomly.

I have the portion of code for the assignment:

    import csv
    from random import randint

    with open('part-r-00000', 'r') as csvinput:
        with open('output2.csv', 'w') as csvoutput:
            writer = csv.writer(csvoutput, lineterminator='\n')
            reader = csv.reader(csvinput)

            rows = []
            for row in reader:
                # append a random group label, 0 or 1
                row.append(randint(0, 1))
                rows.append(row)

            writer.writerows(rows)

This works for a single file. Here is sample input and output:

    Sample input:
    0,1,2,1,4,3,3,4,1,1
    2,3,4,1,0,0,1,2,1,1
    0,2,3,1,0,2,3,1,1,1

    Sample output:
    0,1,2,1,4,3,3,4,1,1,0
    2,3,4,1,0,0,1,2,1,1,0
    0,2,3,1,0,2,3,1,1,1,1

However, I need to find out how many files are in the folder and add another loop to loop through each file. How do I do this?

Michal

4 Answers

1

You can just iterate over all the files (os.listdir) in the current directory (os.getcwd):

import os

for filename in os.listdir(os.getcwd()):
    print(filename)  # replace with your per-file processing
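Combined with the assignment code from the question, the loop might look like the following. This is a minimal sketch, assuming Python 3 and assuming that only files whose names start with `part-` should be processed; the function name `assign_groups` and its parameters are hypothetical, chosen for illustration.

```python
import csv
import os
import random


def assign_groups(input_dir, output_path, rng=random):
    """Append a random 0 or 1 to every row of every part-* file in input_dir.

    Hypothetical helper: writes all rows to one CSV and returns the row count.
    """
    total_rows = 0
    with open(output_path, "w", newline="") as csvoutput:
        writer = csv.writer(csvoutput, lineterminator="\n")
        # os.listdir returns names in arbitrary order, so sort for a stable result
        for filename in sorted(os.listdir(input_dir)):
            if not filename.startswith("part-"):
                continue  # skip anything that is not a part file
            with open(os.path.join(input_dir, filename), newline="") as csvinput:
                for row in csv.reader(csvinput):
                    row.append(rng.randint(0, 1))  # random group label
                    writer.writerow(row)
                    total_rows += 1
    return total_rows
```

Calling `assign_groups('/path/to/directory', 'output2.csv')` would then write every labelled row to a single output file. Writing rows as they are read, rather than collecting them in a list first, keeps memory use flat when the folder holds many part files.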
enrico.bacis
1
import os

for f in os.listdir('/path/to/directory'):
    print(f)  # do something with f
chrisaycock
0

You can use os.listdir() to list everything in the current directory (files and subdirectories alike), or pass a path if you want to scan a different directory. You can then iterate through the returned list of names:

import os

filelist = os.listdir()

for filename in filelist:
    print(filename)  # replace with your per-file processing
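Since the question also asks how many files are in the folder, the list returned by os.listdir can be filtered and counted. The sketch below assumes the files of interest are regular files whose names start with `part-`; the helper name `list_part_files` is hypothetical.

```python
import os


def list_part_files(directory="."):
    """Return the sorted names of regular files in directory that start with 'part-'."""
    return sorted(
        name
        for name in os.listdir(directory)
        # os.path.isfile filters out subdirectories that happen to match the prefix
        if name.startswith("part-") and os.path.isfile(os.path.join(directory, name))
    )
```

With this, `len(list_part_files('/path/to/directory'))` gives the number of part files, and the same list can drive the per-file loop.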
MattDMo
0

If you want it to work with subdirectories:

import os

for subdir, dirs, files in os.walk(root):
    for file in files:
        # os.path.join(subdir, file) is the full path of each file
        print(os.path.join(subdir, file))

edit: root would be the full path to the folder holding these files
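As a rough sketch of how the walk could be narrowed to the Pig output files, the generator below yields only paths whose file name starts with `part-`. The name `iter_part_files` and the prefix filter are assumptions made for illustration, not part of the answer above.

```python
import os


def iter_part_files(root):
    """Yield the full path of every part-* file under root, including subdirectories."""
    for subdir, dirs, files in os.walk(root):
        for name in sorted(files):
            if name.startswith("part-"):
                # os.path.join builds the path with the right separator for the OS
                yield os.path.join(subdir, name)
```

Each yielded path can be passed straight to open(), so the per-file logic from the question needs no changes beyond the file name.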

Iterating through directories with Python

rocktheartsm4l