
We just learned about regular expressions in my first Python course (I am extremely new to programming). One of the homework problems I am struggling with requires us to use a regular expression to find all the words of length n or longer, and then use that regular expression to find the longest word in a text file.

I have no problem when I test a specific length, but I get an empty list when I use a variable n:

import re
with open('shakespeare.txt') as file:
    shakespeare = file.read()

n = 10 #if I take this out and put an actual number in the curly bracket below, it works just fine.

words = re.findall('^[A-Za-z\'\-]{n,}', shakespeare, re.M)
print(words)
len(words)

I'm not sure what I did wrong or how to resolve this. Any help is greatly appreciated!

For more context... To find the longest word, I used:

# for words that may contain special characters such as - and '
longest_word = max(re.findall(r'\S+', shakespeare, re.M), key=len)

#for word without special characters:
longest_pure_word = max(re.findall('[A-Za-z]+ ', shakespeare, re.M), key = len)

output1(special char): tragical-comical-historical-pastoral
output2(pure word): honorificabilitudinitatibus

I didn't use n because I couldn't get the first part of the question to work.

A.Far
  • You need to mark strings as `r''` or use double backslashes for escapes instead. Also to put the value of `n` into the string you need to format it, using `'{{{n}}}'.format(n=n)`. The extra `'{{...}}'` are required to "survive" the formatting. Or you can use `'{%d}' % n` instead. – a_guest Mar 16 '19 at 21:39
  • Why would you expect `n` to be replaced by a variable's contents, but no other character in the expression to be so replaced? – Scott Hunter Mar 16 '19 at 21:39
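
The formatting options mentioned in the first comment can be sketched roughly as follows. This is only an illustration: the variable names and the sample text are made up, and the f-string variant is an addition that assumes Python 3.6 or later.

```python
import re

n = 10  # minimum word length, as in the question

# All three lines build the same pattern string: ^[A-Za-z'\-]{10,}
# Doubled braces {{ and }} stand for literal braces; {n} is replaced by the value of n.
pattern_format = r"^[A-Za-z'\-]{{{n},}}".format(n=n)
pattern_percent = r"^[A-Za-z'\-]{%d,}" % n
pattern_fstring = rf"^[A-Za-z'\-]{{{n},}}"  # assumes Python 3.6+

# Hypothetical sample text standing in for shakespeare.txt
sample = "tragical-comical\nshort\nhonorificabilitudinitatibus rest"
matches = re.findall(pattern_fstring, sample, re.M)
print(matches)
```

Because of the `^` anchor together with `re.M`, only words at the start of a line are matched, which is the same behaviour as the pattern in the question.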

1 Answer


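Inside a string literal, `n` is just the letter n, not your variable, so the regex engine sees `{n,}` as literal characters rather than a repetition count. Put the value of `n` into the pattern string yourself. Try this: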

import re
with open('shakespeare.txt') as file:
    shakespeare = file.read()

n = 10

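# Splice the value of n into the quantifier; with n = 10 the pattern is ^[A-Za-z'\-]{10,}
words = re.findall(r"^[A-Za-z'\-]{" + str(n) + ",}", shakespeare, re.M)
print(words)
print(len(words))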
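
For the second half of the homework (using the same regular expression to find the longest word), one possible sketch is below. The helper names `words_at_least` and `longest_word` and the sample text are hypothetical, and the pattern drops the `^` anchor on the assumption that words anywhere on a line should count.

```python
import re

def words_at_least(text, n):
    """Return every run of letters, apostrophes, or hyphens that is n or more characters long."""
    # Build the quantifier {n,} from the value of n (hypothetical helper, not from the post)
    pattern = r"[A-Za-z'\-]{" + str(n) + r",}"
    return re.findall(pattern, text)

def longest_word(text, n=1):
    """Return the longest match of length n or more, or None if there is no match."""
    return max(words_at_least(text, n), key=len, default=None)

# Hypothetical sample text standing in for shakespeare.txt
sample = "to be or not to be, tragical-comical-historical-pastoral"
print(words_at_least(sample, 10))
print(longest_word(sample))
```

Raising `n` only shrinks the list of candidates, so any starting value that is no larger than the longest word's length gives the same result from `longest_word`.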
Nathaniel