
I have been trying to get this piece of Python code to open a directory on my computer and read its contents so I can generate output for an assignment, but I keep getting "invalid \x escape". Is there something wrong with my syntax, or are my forward slashes and backslashes all mixed up?

import sys,os,re
import time

# define global variables used as counters

tokens = 0
documents = 0
terms = 0
termindex = 0
docindex = 0 

# initialize list variables

alltokens = []
alldocs = []

# Capture the start time of the routine so that we can determine the total running
# time required to process the corpus

t2 = time.localtime() 

# set the name of the directory for the corpus

dirname = "C:\Users\xhenr\Documents\cs3308\cacm"

# For each document in the directory read the document into a string

all = [f for f in os.listdir(dirname)]
for f in all:
    documents+=1
    with open('C:\Users\xhenr\Documents\cs3308\cacm/f', 'r') as myfile:
        alldocs.append(f)
        data=myfile.read().replace('\n', '')  
        for token in data.split():
            alltokens.append(token)
        tokens+=1

# Open for write a file for the document dictionary

documentfile = open('C:/Users/xhenr/Documents/cs3308/cacm/documents.dat', 'w')
alldocs.sort()
for f in alldocs:
  docindex += 1
  documentfile.write(f+','+str(docindex)+os.linesep)
documentfile.close()

# Sort the tokens in the list

alltokens.sort()

# Define a list for the unique terms

g=[]

# Identify unique terms in the corpus

for i in alltokens:    
    if i not in g:
       g.append(i)
       terms+=1

terms = len(g)

# Output Index to disk file. As part of this process we assign an 'index' number to each unique term.

indexfile = open('C:/Users/xhenr/Documents/cs3308/cacm/index.dat', 'w')
for i in g:
  termindex += 1
  indexfile.write(i+','+str(termindex)+os.linesep)
indexfile.close()

# Print metrics on corpus

print 'Processing Start Time: %.2d:%.2d' % (t2.tm_hour, t2.tm_min)
print "Documents %i" % documents
print "Tokens %i" % tokens
print "Terms %i" % terms

t2 = time.localtime()   
print 'Processing End Time: %.2d:%.2d' % (t2.tm_hour, t2.tm_min)
xhenier
  • Where does your error occur? – BernardL Nov 29 '18 at 02:59
  • Possible duplicate of [Windows path in Python](https://stackoverflow.com/questions/2953834/windows-path-in-python) – Mark Nov 29 '18 at 03:03
  • It may be worth pointing out that it is preferred if questions can be reduced to their simplest form. More information can be found here https://stackoverflow.com/help/mcve. – The Matt Dec 01 '18 at 17:07

1 Answer


Here:

dirname = "C:\Users\xhenr\Documents\cs3308\cacm"

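Python is interpreting the backslashes as the start of escape sequences, when they are actually just separators in a Windows path. `\x`, for example, must be followed by two hexadecimal digits, so `\xhenr` is an invalid escape. You can fix this by escaping the backslashes, but there is a much easier method: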

dirname = r"C:\Users\xhenr\Documents\cs3308\cacm"

By putting an r in front, you tell Python to treat the backslashes in the string literally instead of as escape characters. (The r stands for raw.) This also means you must change this line, too:

with open('C:\Users\xhenr\Documents\cs3308\cacm/f', 'r') as myfile:

Changed to:

with open(r'C:\Users\xhenr\Documents\cs3308\cacm\f', 'r') as myfile:

(I also changed the inconsistent use of forward and back slashes.)
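Two further details about that line are worth noting. Inside a string literal, `f` is just the letter f, so `...cacm\f` names a file literally called `f` rather than the loop variable. Also, if the path ends up pointing at a directory instead of a file, `open()` fails; on Windows this typically surfaces as `[Errno 13] Permission denied`. A minimal sketch, written for Python 3 rather than the Python 2 used in the question, that builds each path with `os.path.join` and skips anything that is not a regular file. `read_all_tokens` is a hypothetical helper name, not part of the original code:

```python
import os
import tempfile

def read_all_tokens(dirname):
    """Return (sorted file names, whitespace-separated tokens) for files in dirname."""
    alldocs = []
    alltokens = []
    for f in sorted(os.listdir(dirname)):
        # Join the directory and the loop variable; the literal 'cacm\f'
        # would always refer to a file named "f".
        path = os.path.join(dirname, f)
        # open() on a directory fails (PermissionError on Windows,
        # IsADirectoryError on Linux/macOS), so skip anything that is not a file.
        if not os.path.isfile(path):
            continue
        with open(path, "r") as myfile:
            alldocs.append(f)
            # split() with no argument also splits on newlines.
            alltokens.extend(myfile.read().split())
    return alldocs, alltokens

# Usage with a throwaway directory standing in for the cacm corpus.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "hello world\nfoo"), ("b.txt", "bar")]:
        with open(os.path.join(tmp, name), "w") as out:
            out.write(text)
    os.mkdir(os.path.join(tmp, "subdir"))
    docs, tokens = read_all_tokens(tmp)

print(docs, tokens)  # ['a.txt', 'b.txt'] ['hello', 'world', 'foo', 'bar']
```

On the asker's machine the call would be `read_all_tokens(r"C:\Users\xhenr\Documents\cs3308\cacm")`.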

iz_
  • The issue can also be fixed by just using double backslashes. "\\" will make a literal "\" character. So the line could be changed to "C:\\Users\\xhenr\\Documents\\cs3308\\cacm" – The Matt Nov 29 '18 at 03:06
  • It's far easier just to add an `r` in front, but that works too. In fact, even though it's Windows, forward slashes work. – iz_ Nov 29 '18 at 03:06
  • Both methods worked but now I am getting this error: IOError: [Errno 13] Permission denied: 'C:\\Users\\xhenr\\Documents\\cs3308\\cacm'. – xhenier Nov 29 '18 at 03:14
  • A few things can cause this. It means exactly what it says: you don't have sufficient permissions to create a file there. Make sure there are no other open programs that are accessing the file. If there are not, it means the permissions for that file don't allow you to make changes. Just go to the folder, right-click, go to "Properties" -> "Security" and make changes there. – iz_ Nov 29 '18 at 03:23
  • Yes, that was it. Thank you for the insight. – xhenier Nov 29 '18 at 03:32
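The three spellings discussed in this thread (a raw string, doubled backslashes, and forward slashes) can be compared directly. A minimal sketch, written for Python 3; note that Python 3 rejects the unescaped literal with a different message than Python 2's "invalid \x escape", because in Python 3 `\U` also starts an escape sequence. `compiles` is a hypothetical helper used only for this demonstration:

```python
def compiles(source):
    """Return True if source is valid Python, False on a SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

raw = r"C:\Users\xhenr\Documents\cs3308\cacm"          # backslashes kept as-is
doubled = "C:\\Users\\xhenr\\Documents\\cs3308\\cacm"  # each \\ is one backslash
forward = "C:/Users/xhenr/Documents/cs3308/cacm"       # also accepted by Windows

# The raw and doubled spellings build the identical string.
print(raw == doubled)                      # True
# The forward-slash spelling differs only in the separator character.
print(raw.replace("\\", "/") == forward)   # True

# The question's original, unescaped literal is rejected at compile time,
# while the raw-string version compiles fine.
bad = r'dirname = "C:\Users\xhenr\Documents\cs3308\cacm"'
good = r'dirname = r"C:\Users\xhenr\Documents\cs3308\cacm"'
print(compiles(bad), compiles(good))       # False True
```

The raw string is usually the least error-prone choice for paths copied from Explorer, since nothing has to be edited by hand.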