0

I'm stuck on a problem. Wherever an anchor appears in its corresponding sentence, I want to replace it with the corresponding lemma_id from lemma_ids. The expected output for this case is shown below.

lemma_ids = [['Mr_bn:00055346n',
  'President_bn:00064234n',
  'Mr_bn:00055346n',
  'speak_bn:00090943v',
  'policy_bn:00063330n'],
 ['genuine_bn:00101997a',
  'flaw_bn:00035142n',
  'democracy_bn:00021207n',
  'EU_bn:00021127n']]

anchors = [['Mr', 'President', 'Mr', 'spoke', 'policy'],
 ['genuine', 'flaw', 'democracy', 'EU']]

sentences = ['Finally , Mr President , Mr Santer among others spoke of taking a fresh look at institutional policy .',
 'This is a genuine flaw in European democracy .']

Output needed

output_need_is= [['Finally , Mr_bn:00055346n President_bn:00064234n , Mr_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n'], ['This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n']]

Here is what I have tried so far. It is nothing to write home about, and it doesn't give the expected result.

def lemma_ize(lemma_ids, anchors, sentences):
    new_sentences = []
    for sentence_no, sentence in enumerate(sentences):
        for anchoritem, item in enumerate(anchors[sentence_no]):
            sentence = sentence.replace(item, lemma_ids[sentence_no][anchoritem])
        new_sentences.append(sentence)
    return new_sentences
print(lemma_ize(lemma_ids, anchors, sentences))

Here is the result I get. One problem with it is that the first Mr after Finally comes out as Mr_bn:00055346n_bn:00055346n instead of Mr_bn:00055346n.

['Finally , Mr_bn:00055346n_bn:00055346n President_bn:00064234n , Mr_bn:00055346n_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n .', 'This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n .']

Also what if the lists don't have the same length? I'm not sure I am close to any solution. I need help.

ggorlen
Afolabi Olaoluwa

3 Answers

1
# These lemma IDs will replace the anchor words in the 1st and 2nd sentences
lemma_ids = [['Mr_bn:00055346n',
  'President_bn:00064234n',
  'speak_bn:00090943v',
  'policy_bn:00063330n'],
 ['genuine_bn:00101997a',
  'flaw_bn:00035142n',
  'democracy_bn:00021207n',
  'EU_bn:00021127n']]

# these are the words to be replaced in the two sentences with the above lemma_id
anchors = [['Mr', 'President', 'spoke', 'policy'],
 ['genuine', 'flaw', 'democracy', 'EU']]

# these are the sentences
sentences = ['Finally , Mr President , Mr Santer among others spoke of taking a fresh look at institutional policy .',
 'This is a genuine flaw in European democracy .']

def lemma_ize(lemma_ids, anchors, sentences):
    """Replace each anchor word in each sentence with its lemma ID."""
    new_sentences = []
    for sentence_no, sentence in enumerate(sentences):
        for anchoritem, item in enumerate(anchors[sentence_no]):
            sentence = sentence.replace(item, lemma_ids[sentence_no][anchoritem])
        new_sentences.append([sentence])
    return new_sentences
print(lemma_ize(lemma_ids, anchors, sentences))

output:

[['Finally , Mr_bn:00055346n President_bn:00064234n , Mr_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n .'], ['This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n .']]

If you want to leave the duplicates in the anchors list:

lemma_ids = [['Mr_bn:00055346n',
  'President_bn:00064234n',
  'Mr_bn:00055346n',
  'speak_bn:00090943v',
  'policy_bn:00063330n'],
 ['genuine_bn:00101997a',
  'flaw_bn:00035142n',
  'democracy_bn:00021207n',
  'EU_bn:00021127n']]

anchors = [['Mr', 'President', 'Mr', 'spoke', 'policy'],
 ['genuine', 'flaw', 'democracy', 'EU']]

sentences = ['Finally , Mr President , Mr Santer among others spoke of taking a fresh look at institutional policy .',
 'This is a genuine flaw in European democracy .']

def lemma_ize(lemma_ids, anchors, sentences):
    new_sentences = []
    for sentence_no, sentence in enumerate(sentences):
        # anchors already replaced in this sentence; reset for every sentence
        # so an anchor seen in an earlier sentence is not skipped in a later one
        anchors_check = []
        for anchoritem, item in enumerate(anchors[sentence_no]):
            # check whether this item has already been replaced
            item_is_duplicate = item in anchors_check
            # if not, add it to the checklist and replace it
            if not item_is_duplicate:
                anchors_check.append(item)
                sentence = sentence.replace(item, lemma_ids[sentence_no][anchoritem])
        new_sentences.append([sentence])
    return new_sentences
print(lemma_ize(lemma_ids, anchors, sentences))

output:

[['Finally , Mr_bn:00055346n President_bn:00064234n , Mr_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n .'], ['This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n .']]
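As a hedged alternative, here is a minimal sketch that works on whitespace-separated tokens instead of calling str.replace on the whole sentence, so each token is replaced at most once and the duplicates can stay in anchors. The name lemma_ize_tokens is made up for this example. It assumes the sentences are already tokenized with single spaces, that a repeated anchor maps to the same lemma_id within a sentence, and that silently ignoring unpaired items is acceptable when the lists differ in length (zip stops at the shortest input).

```python
lemma_ids = [['Mr_bn:00055346n', 'President_bn:00064234n', 'Mr_bn:00055346n',
              'speak_bn:00090943v', 'policy_bn:00063330n'],
             ['genuine_bn:00101997a', 'flaw_bn:00035142n',
              'democracy_bn:00021207n', 'EU_bn:00021127n']]

anchors = [['Mr', 'President', 'Mr', 'spoke', 'policy'],
           ['genuine', 'flaw', 'democracy', 'EU']]

sentences = ['Finally , Mr President , Mr Santer among others spoke of taking a fresh look at institutional policy .',
             'This is a genuine flaw in European democracy .']


def lemma_ize_tokens(lemma_ids, anchors, sentences):
    """Hypothetical helper: replace whole tokens only, one pass per sentence."""
    new_sentences = []
    # zip stops at the shortest list, so extra sentences or anchor lists are ignored
    for ids, words, sentence in zip(lemma_ids, anchors, sentences):
        # duplicate anchors collapse into one dict entry;
        # anchors without a matching lemma_id are dropped by zip
        mapping = dict(zip(words, ids))
        # look up each token exactly once, so a lemma_id is never replaced again
        tokens = [mapping.get(token, token) for token in sentence.split()]
        new_sentences.append([' '.join(tokens)])
    return new_sentences


print(lemma_ize_tokens(lemma_ids, anchors, sentences))
```

Because the lookup is on whole tokens, an anchor such as EU would not touch a longer word that merely contains it. If the same anchor could need different lemma_ids at different positions, the dict would not be enough and the anchors would have to be consumed in order instead.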

PythonProgrammi
  • Did you check the last part of the question? The first Mr after Finally should be `Mr_bn:00055346n` instead of `Mr_bn:00055346n_bn:00055346n`, and also in the result I have `speak` instead of `spoke` – Afolabi Olaoluwa May 21 '19 at 17:35
  • @AfolabiOlaoluwaAkinwumi I think I fixed the Mr problem – PythonProgrammi May 21 '19 at 17:58
  • What’s wrong with speak? Could you explain? – PythonProgrammi May 21 '19 at 18:00
  • So I have to change my list to make it work. What if I don't want to change the content of my anchors list and Mr occurs twice in it? – Afolabi Olaoluwa May 21 '19 at 18:01
  • You should create a list that collects all the anchors already handled and replace an item only if it is not yet in that list. That way the second Mr will not be applied to the Mr_...... produced by the first Mr, which is the reason you had that unwanted replacement – PythonProgrammi May 21 '19 at 18:57
1

Your code rebinds sentence in sentence = sentence.replace(item, lemma_ids[sentence_no][anchoritem]) on every pass of the inner loop, so each replacement runs on the result of the previous one.

Taking a copy with slicing, sentence[:], does not change that: sentence is a str, strings are immutable, and slicing a whole str gives back an equal string.

So if we change

sentence = sentence.replace(item, lemma_ids[sentence_no][anchoritem])

to

 sentence = sentence[:].replace(item, lemma_ids[sentence_no][anchoritem])

the code behaves the same as before. The updated code is

def lemma_ize(lemma_ids, anchors, sentences):
    new_sentences = []
    for sentence_no, sentence in enumerate(sentences):
        for anchoritem, item in enumerate(anchors[sentence_no]):
            #Modify a copy of sentence 
            sentence = sentence[:].replace(item, lemma_ids[sentence_no][anchoritem])
        new_sentences.append(sentence)
    return new_sentences

and the output will be

['Finally , Mr_bn:00055346n_bn:00055346n President_bn:00064234n , Mr_bn:00055346n_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n .', 'This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n .']
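The doubled suffix in this output can be reproduced in isolation. The short sketch below uses a shortened sentence and variable names (copy, once, twice) chosen only for this illustration. It shows that slicing a str yields an equal string, and that applying the same replace a second time hits the Mr at the start of the lemma_id that was just inserted.

```python
lemma = 'Mr_bn:00055346n'
sentence = 'Finally , Mr President , Mr Santer'

# slicing a whole str gives back an equal string, so the "copy" changes nothing
copy = sentence[:]
print(copy == sentence)

# first 'Mr' anchor: str.replace already replaces every occurrence
once = sentence.replace('Mr', lemma)
print(once)

# second 'Mr' anchor: the lemma_id itself starts with 'Mr', so it is replaced again
twice = once.replace('Mr', lemma)
print(twice)
```

So the second Mr entry in anchors is what produces Mr_bn:00055346n_bn:00055346n; skipping anchors that were already handled, or replacing whole tokens in a single pass, avoids it.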
Devesh Kumar Singh
-2

If I recall correctly, enumerate runs through all the items in a list; its optional start argument only changes the number the count begins at, not which items are visited. It would appear your enumerate loops are running through the whole list before going on to the next statement.
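To make the enumerate behaviour concrete, here is a small sketch using the first anchors list from the question. The variable names are picked just for this example.

```python
anchors_for_sentence = ['Mr', 'President', 'Mr', 'spoke', 'policy']

# enumerate yields an (index, item) pair for every item, counting from 0
pairs = list(enumerate(anchors_for_sentence))
print(pairs)

# start=1 only shifts the counter; all five items are still visited
pairs_from_one = list(enumerate(anchors_for_sentence, start=1))
print(pairs_from_one)
```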

It also appears that your print statement calls the function and passes the entire lists into it as arguments. I'd also like to see more code if you have it; I'm not sure whether the top part is code you still have in your program or not.

Edit: the way a function works is that you define parameter names, but the variables you pass in don't need to have the same names. Example:

varone=1
vartwo=1
def functionname(variableone, variabletwo):
    # does whatever the function does
    pass

functionname(varone, vartwo)

In this case varone is passed in for variableone, and vartwo is passed in for variabletwo. Any time you refer to variableone in the function, you are referring to whatever you passed into it, in this case varone. So wherever the function uses variableone, it will use the value of varone, since that's what we passed in with the call to functionname.

Romegypt