I'm stuck on a problem. Wherever an anchor from `anchors` appears in its corresponding sentence, I want to replace it with the lemma id at the same position in `lemma_ids`. The expected output for this case is shown below.
```python
lemma_ids = [['Mr_bn:00055346n',
              'President_bn:00064234n',
              'Mr_bn:00055346n',
              'speak_bn:00090943v',
              'policy_bn:00063330n'],
             ['genuine_bn:00101997a',
              'flaw_bn:00035142n',
              'democracy_bn:00021207n',
              'EU_bn:00021127n']]
anchors = [['Mr', 'President', 'Mr', 'spoke', 'policy'],
           ['genuine', 'flaw', 'democracy', 'EU']]
sentences = ['Finally , Mr President , Mr Santer among others spoke of taking a fresh look at institutional policy .',
             'This is a genuine flaw in European democracy .']
```
Output needed:

```python
output_need_is = [['Finally , Mr_bn:00055346n President_bn:00064234n , Mr_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n'], ['This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n']]
```
Here is what I have tried so far. It isn't very elegant, and it doesn't produce the expected result:
```python
def lemma_ize(lemma_ids, anchors, sentences):
    new_sentences = []
    for sentence_no, sentence in enumerate(sentences):
        # Replace each anchor of this sentence with the lemma id at the same index
        for anchoritem, item in enumerate(anchors[sentence_no]):
            sentence = sentence.replace(item, lemma_ids[sentence_no][anchoritem])
        new_sentences.append(sentence)
    return new_sentences

print(lemma_ize(lemma_ids, anchors, sentences))
```
Here is the result I get. The main problem is that each Mr comes out as `Mr_bn:00055346n_bn:00055346n` instead of `Mr_bn:00055346n`:
```python
['Finally , Mr_bn:00055346n_bn:00055346n President_bn:00064234n , Mr_bn:00055346n_bn:00055346n Santer among others speak_bn:00090943v of taking a fresh look at institutional policy_bn:00063330n .', 'This is a genuine_bn:00101997a flaw_bn:00035142n in European democracy_bn:00021207n .']
```
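The doubling can be reproduced in isolation. `str.replace` rewrites every occurrence of its first argument, so the first `Mr` anchor already converts both `Mr` tokens. When the loop reaches the second `Mr` anchor, `Mr` still matches the start of each `Mr_bn:00055346n`, and the suffix is appended a second time. A minimal sketch using only the built-in `str.replace` on a shortened sentence:

```python
# Minimal reproduction of the doubled suffix (illustrative only).
sentence = "Finally , Mr President , Mr Santer"

# First "Mr" anchor: str.replace rewrites *every* "Mr" in the string.
once = sentence.replace("Mr", "Mr_bn:00055346n")
print(once)
# Finally , Mr_bn:00055346n President , Mr_bn:00055346n Santer

# Second "Mr" anchor: "Mr" still matches the start of each rewritten token.
twice = once.replace("Mr", "Mr_bn:00055346n")
print(twice)
# Finally , Mr_bn:00055346n_bn:00055346n President , Mr_bn:00055346n_bn:00055346n Santer
```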
Also, how should I handle the case where the lists don't have the same length? I'm not sure I'm anywhere close to a solution, so any help is appreciated.
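One possible approach, sketched under two assumptions: the sentences are already tokenized with single spaces (as in the sample), and a given anchor maps to the same lemma id everywhere in its sentence. It compares whole tokens instead of substrings, so a token that has already been replaced is never matched again, and it checks the list lengths explicitly, because `zip` on its own silently stops at the shortest list. The helper name `lemmatize_tokens` is hypothetical. Like the original attempt, it keeps the trailing ` .` and returns a flat list of strings rather than the nested one-element lists in `output_need_is`.

```python
def lemmatize_tokens(lemma_ids, anchors, sentences):
    """Hypothetical helper: replace each whitespace-separated token that exactly
    equals an anchor with the lemma id at the same position in lemma_ids."""
    # Fail loudly instead of letting zip() silently drop unmatched items.
    if not (len(lemma_ids) == len(anchors) == len(sentences)):
        raise ValueError("lemma_ids, anchors and sentences must have the same length")

    new_sentences = []
    for sent_ids, sent_anchors, sentence in zip(lemma_ids, anchors, sentences):
        if len(sent_ids) != len(sent_anchors):
            raise ValueError(f"anchor/lemma_id count mismatch in: {sentence!r}")
        # Map surface form -> lemma id; duplicate anchors such as 'Mr' collapse to one entry.
        mapping = dict(zip(sent_anchors, sent_ids))
        # Each token is looked up exactly once, so replaced tokens are never re-matched.
        tokens = [mapping.get(token, token) for token in sentence.split()]
        new_sentences.append(" ".join(tokens))
    return new_sentences


lemma_ids = [['Mr_bn:00055346n', 'President_bn:00064234n', 'Mr_bn:00055346n',
              'speak_bn:00090943v', 'policy_bn:00063330n'],
             ['genuine_bn:00101997a', 'flaw_bn:00035142n',
              'democracy_bn:00021207n', 'EU_bn:00021127n']]
anchors = [['Mr', 'President', 'Mr', 'spoke', 'policy'],
           ['genuine', 'flaw', 'democracy', 'EU']]
sentences = ['Finally , Mr President , Mr Santer among others spoke of taking a fresh look at institutional policy .',
             'This is a genuine flaw in European democracy .']

result = lemmatize_tokens(lemma_ids, anchors, sentences)
print(result)
```

If the real sentences are not pre-tokenized (for example `policy.` with punctuation attached), `split()` will not isolate the word, and a regular expression with word boundaries would be needed instead. If the same surface form could need different lemma ids within one sentence, the dictionary is the wrong tool; walking the anchors in sentence order and consuming one per matching token would preserve that distinction.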