I want to detect capital letters at any position in a string. If a string contains a capital letter at any position, "1" should be printed next to that string; if it contains no capital letter at all, "0" should be printed next to it. I wrote the following Python code for this, but it does not work properly:

import re

file='C:/Python26/test.txt'
f=open('letters.txt','w')
pattern='[A-Z+]'
with open(file, 'r') as rf:

    for word in rf:
        for i in word.split():
            if word[0].isupper():               ## finding letters starting with uppercase letters
                  f.write(word.strip("\n")+"\t"'1'"\n");
            elif word.isupper():                ## finding string containing all capital letters
                  f.write(word.strip("\n")+"\t"'1'"\n");
            elif re.search(pattern, word):      ## finding string containing capital letter at any position 
                  f.write(word.strip("\n")+"\t"'1'"\n");
            else:
                  f.write(word.strip("\n")+"\t"'0'"\n");
    f.close()

My sample data looks like this:
Src
mAB
32DC32
P50
The
activation
fan
.

NFKappaB
IL23RE
cat
.

But my output looks like this:

Src 1
mAB 1
32DC32 1
P50 1
The 1
activation 0
fan 0
. 0
1
NFKappaB 1
IL23RE 0
cat 0
.
This result is wrong. The code does not handle the blank line: it gives it the label "1", and because of this the period (.) gets no label at all, neither "0" nor "1".

Shaheen Gul

1 Answer

Just use re.search instead of re.match because re.match tries to match from the beginning of the string.
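
To illustrate the difference, here is a small sketch using a few tokens from the question. The names `pattern`, `at_start` and `anywhere` are only illustrative and not part of the original code.

```python
import re

pattern = re.compile(r'[A-Z]')

# re.match only tests at the start of the string;
# re.search scans the whole string for the first match.
for token in ["Src", "mAB", "32DC32", "fan", "."]:
    at_start = bool(pattern.match(token))
    anywhere = bool(pattern.search(token))
    print(token, at_start, anywhere)
```

For a token such as mAB, `match` finds nothing because the first character is lowercase, while `search` finds the capital later in the string.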

import re
file='infile'
f=open('outfile','w')
pattern='[A-Z]'
with open(file, 'r') as rf:
    for word in rf:
        if re.search(pattern, word):
            f.write(word.strip() + " 1\n")
        else:
            f.write(word.strip() + " 0\n")
f.close()
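
If the input also contains blank lines and lone periods, as described in the question, one possible variant is sketched below. It assumes that a blank line should stay blank and receive no label; the question does not state this explicitly. The helper name `label_lines` is hypothetical. Note that `[A-Z]` only covers ASCII capitals.

```python
import re

CAPITAL = re.compile(r'[A-Z]')

def label_lines(lines):
    """Return each token followed by a tab and '1' or '0'; blank lines stay blank."""
    labelled = []
    for line in lines:
        token = line.strip()
        if not token:
            # Assumption: a blank line separates sentences and gets no label.
            labelled.append("")
        elif CAPITAL.search(token):
            labelled.append(token + "\t1")
        else:
            labelled.append(token + "\t0")
    return labelled

# Sample tokens taken from the question, one per line.
sample = ["Src\n", "mAB\n", "fan\n", ".\n", "\n", "NFKappaB\n", "IL23RE\n", "cat\n", ".\n"]
for row in label_lines(sample):
    print(row)
```

The same function can be applied to an open file object, since iterating over a file yields one line at a time.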
Avinash Raj
  • This code generates "0" for every lowercase letter and "1" for every capital letter, whereas I want "1" when a token contains a capital letter at any position and "0" when the token contains no capital letter. – Shaheen Gul Feb 21 '15 at 15:18
  • The above code will display 1 if the line has at least one capital letter. Did you mean a single character as a token? What if the character is not a letter? – Avinash Raj Feb 21 '15 at 15:25
  • A token is a whole string, e.g. "NFKappaB" is a token (string), and N, F, K, a, p, p, a, B are its letters. – Shaheen Gul Feb 21 '15 at 16:01
  • You need to split the word again, like `for token in word: if re.search(pattern, token):` – Avinash Raj Feb 21 '15 at 16:07
  • Same result, no change. – Shaheen Gul Feb 21 '15 at 16:19
  • Your update works well, but my data is already arranged so that each string/token is on its own line. When I tried your suggestion on a short sample it gave the correct result, but on the whole data set it failed in one case: it did not handle the full stop (.) or white space on a line of its own. – Shaheen Gul Feb 21 '15 at 17:38
  • Actually, my data was originally a paragraph in a txt file. I converted it so that each token is on its own line, and the full stops and white space are also present in my data file. Now this code skips the period (.) and the white space. – Shaheen Gul Feb 21 '15 at 17:42
  • Edit your question with sample input along with the expected output. – Avinash Raj Feb 21 '15 at 17:43
  • My data is like this Src mAB 32DC32 P50 The activation fan cooling CAT HELLO IL23RE NFKappaB mBa . as u wish . my – Shaheen Gul Feb 21 '15 at 17:59