4

I don't have much experience programming in Python, so my question might be very simple. I am working on a school project in which we need to generate output from user input (we want to emulate Eliza, but in a very simple way). Right now, the only thing the program does is look for keywords and respond accordingly. However, since I go to a French school and everything is in French, I need to use accents (e.g. é, è, à, á). And this is where the problem arises: I cannot use accents in my predefined list of keywords (so the issue is not with the keywords from the user input, but with the ones in my own program :/ ).

Here you have the entire program:

#Greeting from bot
print("Salut!")

(Here are the predefined keywords we use to detect whether there is a negation and whether there are any demeaning adjectives in the user's sentence. We'll need to make the lists a lot bigger to make our bot more humanlike, but this shows the principle.)

negation = ("ne", "n'", "pas", "guere", "ne", "plus", "jamais")
AnD = ("artificiel", "cree", "mecanique", "invente", "concu", "construit", "programme", "innove", "fabrique")

But here is the problem: in order to run our program, we had to remove all the accents. So words like guère, créé, mécanique, inventé, programmé become guere, cree, mecanique, invente, programme, and in some cases it changes the meaning of the word (e.g. programme becomes a noun instead of an adjective)

adj_neg_decl = None
nega = None

#FONCTION: phrase declarative, pas de negations, adjectif pejoratif
def decl_non_neg_adj_neg(statement):
    if (statement.endswith('.')) and (nega == False) and (adj_neg_decl == True):
        print(AnD_ici + '? Mais moi je suis humain. Ca se voit pas?')
    else:
    #Only for testing
        print('tralalalala')


while True:

    statement = raw_input("> ").lower()

    if statement == "au revoir":
        print("Au revoir!")
        break

(Here we simply look for negations, and adjectives - if you have another (more efficient) way of doing it, I would love to hear your suggestions! I might as well learn some new stuff :) )

    for i in negation:
        if i in statement:
            nega = True
            break
        else:
            nega = False

    for i in AnD:
        if i in statement:
            adj_neg_decl = True
            AnD_ici = i
            break
        else:
            adj_neg_decl = False

    decl_non_neg_adj_neg(statement)

#Reset variables
    adj_neg_decl = nega = None

I had thought of maybe using Unicode or UTF-8, but I don't really know how to use them... Is that the right way to go? Or is there something else I can do?

Thank you so much for your help :)

  • 6
    Are you using Python 2 or 3? The unicode handling changed a lot between versions. – Aurora0001 Dec 01 '16 at 16:59
  • 2
    why do you have to remove all the accents ? Do you get error message ? Then show full error message in question. – furas Dec 01 '16 at 17:08
  • Yes, you have to use unicode for your program. In Python 3 it should work out of the box for normal text. In Python 2 prefix your strings with letter u, like u"mécanique". – Aidas Bendoraitis Dec 01 '16 at 17:58
  • 3
    Does your Python file define a [source encoding](https://www.python.org/dev/peps/pep-0263/)? If not: make it so. Always. The default is ASCII for Python 2 and [UTF-8 only for Python 3](https://docs.python.org/3/tutorial/interpreter.html#source-code-encoding). – dhke Dec 01 '16 at 18:16
  • _"we had to remove all the accents"_ - you don't have to remove the accents. Python 3 assumes that a script is utf-8 encoded unicode which does support French (and multiple hundreds of other languages). If this is a requirement for school, then let us know. Otherwise, just make sure you write your program in a utf-8 enabled editor and `"guère, créé, mécanique, inventé, programmé"` works. – tdelaney Dec 01 '16 at 18:40
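
The advice in these comments can be sketched roughly as follows. This is only a minimal illustration under a few assumptions, not the asker's finished program: the helper `contains_keyword` is a hypothetical name, the coding line assumes the file really is saved as UTF-8, and the accented spellings are taken from the words listed in the question. The `u` prefix matters on Python 2 and is accepted without effect on Python 3.3 and later.

```python
# -*- coding: utf-8 -*-
# PEP 263 source-encoding declaration: needed on Python 2 as soon as the
# file contains non-ASCII characters, harmless on Python 3.

# The u prefix makes these text (unicode) strings on Python 2.
negation = (u"ne", u"n'", u"pas", u"guère", u"plus", u"jamais")
AnD = (u"artificiel", u"créé", u"mécanique", u"inventé", u"conçu",
       u"construit", u"programmé", u"innové", u"fabriqué")


def contains_keyword(statement, keywords):
    """Return the first keyword found in statement, or None.

    Hypothetical helper; it uses the same substring test as the question.
    """
    for word in keywords:
        if word in statement:
            return word
    return None


# On Python 2, raw_input() returns bytes, so the input would have to be
# decoded before comparing (assumption: the terminal uses UTF-8), e.g.
#   statement = raw_input("> ").decode("utf-8").lower()
statement = u"Tu es mécanique.".lower()
found = contains_keyword(statement, AnD)
print(len(found))  # 9: the match is 9 characters of text, not 10 bytes
```

Keeping both the keywords and the input as text strings is what lets the `in` test compare like with like.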

1 Answer

-1

I'm using Python 2, and I think that lists might be a good way to store your keywords.

I could type in the accents directly. For the é, I held down the Alt key and typed 130 on the numeric keypad. (If you want to find other characters, search for "Windows Alt key symbols" in your favorite search engine.)

It might also be helpful to use the in operator to tell if a given word is within your list.

Here's an example.

>>> AnD = ['créé', 'mécanique', 'artificiel']
>>> 'jamais' in AnD
False
>>> 'créé' in AnD
True
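
One possible refinement, sketched under a few assumptions: the `in` test above checks whole list items, whereas the loop in the question tests substrings of the sentence, so a short keyword such as "ne" also matches inside "humaine". Splitting the input into words first avoids that. The names `tokenize` and `find_keywords` are hypothetical, the keyword sets are shortened, and the regular expression is only a rough tokenizer for French (it does not handle the typographic apostrophe, for instance).

```python
# -*- coding: utf-8 -*-
import re

# Sets give fast membership tests; u"" literals keep everything as text.
NEGATIONS = {u"ne", u"n'", u"pas", u"guère", u"plus", u"jamais"}
ADJECTIVES = {u"artificiel", u"créé", u"mécanique"}

# With re.UNICODE, \w also matches accented letters. The optional
# apostrophe keeps elided forms such as n' together as one token.
_TOKEN = re.compile(u"\\w+'?", re.UNICODE)


def tokenize(sentence):
    """Split a sentence into lowercase word tokens (rough sketch)."""
    return _TOKEN.findall(sentence.lower())


def find_keywords(sentence, keywords):
    """Return the tokens of sentence that are whole-word keywords."""
    return [word for word in tokenize(sentence) if word in keywords]


# "humaine" contains the letters "ne" but is not reported as a negation.
print(find_keywords(u"Tu n'es pas humaine.", NEGATIONS))
print(len(find_keywords(u"Tu es mécanique.", ADJECTIVES)))
```

The trade-off is that multi-word keywords would need extra handling, since each token is checked on its own.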

Bonne chance!

rajah9
  • 1
    That doesn't work in python 2. `'créé'` is still utf-8 encoded (`len('créé')` is 6, not 4) and won't compare against a unicode equivalent (`'créé' != u'créé'`). – tdelaney Dec 01 '16 at 18:42
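
The length mismatch described in this comment can be reproduced on any Python version by encoding explicitly. A small sketch, assuming UTF-8 and precomposed accented characters; on Python 2, a plain `'créé'` literal in a UTF-8 source file already is the byte form shown here as `raw`.

```python
# -*- coding: utf-8 -*-
text = u"créé"               # text string: 4 characters
raw = text.encode("utf-8")   # byte string: each é takes 2 bytes in UTF-8

print(len(text))                    # 4
print(len(raw))                     # 6
print(raw.decode("utf-8") == text)  # True: decoding restores the text
```

Decoding bytes to text at the program's input boundary, and comparing only text with text, sidesteps the mismatch.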