
I need a function that can take multiple text files as arguments and process each one. I figured using *args in the function's parameter list would work, but I get an error about tuples and strings.

def open_file(*filename): 
   file = open(filename,'r')
   text = file.read().strip(punctuation).lower()  
   print(text)

open_file('Strawson.txt','BigData.txt')
ERROR: expected str, bytes or os.PathLike object, not tuple 

How do I do this the right way?

  • How are you calling that function? If you want to pass multiple file names, then you'll have to loop over the tuple, and open & process each file. – PM 2Ring Dec 09 '17 at 14:38
  • I edited the question to show how I'm calling the function. How would I go about doing what you just said? @PM2Ring –  Dec 09 '17 at 14:42
  • BTW, the `.strip(chars)` method only removes characters from the beginning and end of the string: as soon as it encounters a char that's not in the nominated chars, it goes no further. So `file.read().strip(punctuation)` will only remove the punctuation chars from the very start and end of the file. Is that what you want? Or do you want to remove all punctuation from the whole file? – PM 2Ring Dec 09 '17 at 14:47
  • I want to remove the punctuation from the whole file. But my more important question is how to import each text file into the function. It's simply not doing that. –  Dec 09 '17 at 14:52
  • Understood, but I figured I might as well fix the punctuation thing at the same time. ;) I'll post an answer shortly. – PM 2Ring Dec 09 '17 at 14:53
  • Please see here for more info: https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters – PM 2Ring Dec 09 '17 at 15:09

1 Answer


When you use the *args syntax in a function's parameter list, you can call the function with any number of positional arguments, and they arrive inside the function as a single tuple. That's why open() complained: it was handed the whole tuple instead of one file name. So to process each of those arguments you need to loop over the tuple, like this:

from string import punctuation

# Make a translation table to delete punctuation
no_punct = dict.fromkeys(map(ord, punctuation))

def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        print(text)
        print()

#test

open_file('Strawson.txt', 'BigData.txt')

I've also included a dictionary, no_punct, which str.translate uses as a translation table to remove all punctuation from the text. And I've used a with statement so each file gets closed automatically.
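
To see the difference between the two approaches, here is a minimal sketch comparing `str.strip` with `str.translate` on a short string. The `sample` text is made up for illustration; `no_punct` is built the same way as in the code above.

```python
from string import punctuation

# Translation table mapping each punctuation code point to None (meaning "delete")
no_punct = dict.fromkeys(map(ord, punctuation))

# Hypothetical sample text, not taken from the question's files
sample = '"Hello, world!" she said.'

# strip() only trims matching characters from the two ends of the string
stripped = sample.strip(punctuation).lower()

# translate() deletes every character whose code point is in the table
translated = sample.translate(no_punct).lower()

print(stripped)    # → hello, world!" she said
print(translated)  # → hello world she said
```

Only the leading quote and the trailing full stop are removed by `strip`, while `translate` removes punctuation wherever it occurs.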


If you want the function to "return" the processed contents of each file, you can't just put return into the loop, because that makes the function exit immediately, after processing only the first file. You could save the file contents into a list and return that list after the loop finishes. But a better option is to turn the function into a generator. The Python yield keyword makes that simple. Here's an example to get you started.

def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        yield text

def create_tokens(*filenames):
    tokens = [] 
    for text in open_file(*filenames):
        tokens.append(text.split())
    return tokens

files = '1.txt','2.txt','3.txt'
tokens = create_tokens(*files)
print(tokens)

Note that I removed the word.strip(punctuation).lower() call from your version of create_tokens: it's not needed because we're already removing all punctuation and folding the text to lower-case inside open_file.

We don't really need two functions here. We can combine everything into one:

def create_tokens(*filenames):
    for filename in filenames:
        #print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        yield text.split()

tokens = list(create_tokens('1.txt','2.txt','3.txt'))
print(tokens)
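
For comparison, here is a rough sketch of the list-based alternative mentioned earlier: collect each file's cleaned text in a list and return it once, after the loop. The function name `read_clean_texts` and the temporary sample files are assumptions made up for this illustration, so the snippet can run without the real input files.

```python
import os
import tempfile
from string import punctuation

# Translation table that deletes punctuation (same idea as no_punct above)
no_punct = dict.fromkeys(map(ord, punctuation))

def read_clean_texts(*filenames):
    """Return a list holding the cleaned text of each file (hypothetical helper)."""
    texts = []
    for filename in filenames:
        with open(filename) as file:
            text = file.read()
        texts.append(text.translate(no_punct).lower())
    # Return once, after the loop has processed every file
    return texts

# Build two throwaway files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for name, content in [('a.txt', 'Hello, World!'), ('b.txt', "It's big data.")]:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as f:
            f.write(content)
        paths.append(path)
    texts = read_clean_texts(*paths)

print(texts)  # → ['hello world', 'its big data']
```

The list version holds every file's text in memory at once, which is one reason a generator can be the better fit for large files.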
PM 2Ring
  • Wow, you rock, thank you. Works perfectly. But I'm having trouble using that function in another function. I added return(text) in order to create tokens for each file but it only returns the first file. How do I get the first function to return each file manipulation?? def create_tokens(): text = open_file('1.txt','2.txt','3.txt') tokens = [] for word in text.split(): tokens.append(word.strip(punctuation).lower()) print(tokens) return tokens –  Dec 09 '17 at 15:27
  • @vvv12309 If you put a `return` statement inside a loop, then the function will return as soon as control hits that `return`, and the loop will stop executing. It sounds like you need a generator. I'll add some more code to my answer. – PM 2Ring Dec 09 '17 at 15:56