
I have this code to read a Unicode string from the user:

import locale
import sys
print "Enter a nepali string"
split_string=raw_input().decode(sys.stdin.encoding or locale.getpreferredencoding(True))

I also have a file containing Unicode strings. If one of those strings occurs as a substring of the user's input, I want to split it off. For example, if the file contains "सुर" and the user enters "सुरक्षा", I want only "क्षा" in the output:

with codecs.open("prefixnepali.txt","rw","utf-8") as prefix:
    for line in prefix:
        line=ud.normalize('NFC',line)
        if line in split_string:
            prefixy=split_string[len(line):len(split_string)]
            print prefixy
        else:
            print line

But when I run the program, I get:

दि

सुर

रु

These are just the Unicode strings from the file, printed when I enter "सुरक्षा" in the terminal. What is wrong here?

deceze
Bishal Gautam

1 Answer


The problem might be simple: each line read from the file has a newline character at its end, so the test compares "सुर\n" rather than "सुर" against the input and never matches. Use splitlines, as advised in "Reading a file without newlines" and "Getting rid of \n when using .readlines()":

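# "r" instead of "rw": the file is only read, and "rw" is not a portable mode string
with codecs.open("prefixnepali.txt","r","utf-8") as prefix:
    # splitlines() drops the trailing newline from every line
    for line in prefix.read().splitlines():
        line=ud.normalize('NFC',line)
        if line in split_string:
            # keep everything after the first len(line) characters
            prefixy=split_string[len(line):]
            print prefixy
        else:
            print line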
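To see why the trailing newline matters, here is a minimal sketch. It is written for Python 3 (the code above is Python 2), and it uses an in-memory `io.StringIO` as a hypothetical stand-in for prefixnepali.txt, assuming the file holds the three strings shown in the question's output, one per line.

```python
import io
import unicodedata as ud

# Hypothetical stand-in for prefixnepali.txt: one prefix per line.
fake_file = io.StringIO(u"दि\nसुर\nरु\n")
user_input = ud.normalize("NFC", u"सुरक्षा")

# Iterating over a file yields lines that still end in "\n".
raw_lines = list(fake_file)
# splitlines() yields the same lines without the line terminator.
clean_lines = u"".join(raw_lines).splitlines()

raw_hits = [line for line in raw_lines if line in user_input]
clean_hits = [line for line in clean_lines if line in user_input]

print(raw_hits)    # []
print(clean_hits)  # ['सुर']
```

With the newline still attached, no line is ever found in the input, which is why the original program always fell through to the else branch.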

By the way, line in split_string looks for an occurrence of line anywhere within split_string, not only at the start. If you want an exact prefix match, use split_string.find(line) == 0 or split_string[0:len(line)] == line instead.
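The prefix check can also be written with `str.startswith`, which is equivalent to the two forms above. The following is only a sketch in Python 3; `strip_prefix` is a hypothetical helper name, not something from the question, and it assumes the prefixes arrive as a list of lines that may still carry their newline.

```python
import unicodedata as ud

def strip_prefix(text, prefixes):
    """Return text with the first matching prefix removed, or text unchanged."""
    text = ud.normalize("NFC", text)
    for prefix in prefixes:
        # strip() removes the trailing newline and surrounding whitespace.
        prefix = ud.normalize("NFC", prefix.strip())
        # Skip blank lines: an empty string is a prefix of every string.
        if prefix and text.startswith(prefix):
            return text[len(prefix):]
    return text

remainder = strip_prefix(u"सुरक्षा", [u"दि\n", u"सुर\n", u"रु\n"])
print(remainder)  # क्षा
```

Normalizing both the input and each prefix to the same form (NFC here) keeps the comparison consistent even if the terminal and the file encode a character differently.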

Community
Rishi