I am trying to extract causal arguments at the sentence level. So far, my code runs, but it returns the wrong arguments.

For example, take the SRL demo output for the sentence 'Our results may be materially adversely affected by the outcomes of litigation, legal proceedings and other legal or regulatory matters.'

The causing argument is "the outcomes of litigation, legal proceedings and other legal or regulatory matters", and this corresponds to A1 (aka Arg1).

# Requirements:
from allennlp_models import pretrained  # provides load_predictor
from transformers import AutoTokenizer

predictor = pretrained.load_predictor(model_id="structured-prediction-srl-bert")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

My code to obtain Arg1:

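def extract_arg1(sentence):
  """Return a list of (verb, ARG1 text) pairs, or -1 if prediction fails."""
  result = []
  try:
    try:
      output = predictor.predict(sentence)
    except Exception as e:
      print(e)
      # Fallback: truncate the sentence and predict on the token list
      tokenized_sentence = tokenizer(sentence, max_length=500,
                                    truncation=True,
                                    padding=False,
                                    add_special_tokens=False)
      tokens = tokenized_sentence.tokens()
      # predict_tokenized expects the list of tokens, not the raw string
      output = predictor.predict_tokenized(tokens)
    for verb in output['verbs']:
      desc = verb['description']
      arg1_start = desc.find('ARG1: ')
      if arg1_start > -1:
        arg1_end = arg1_start + len('ARG1: ')
        # Search for the closing bracket after the ARG1 label,
        # not the first ']' anywhere in the description
        arg1 = desc[arg1_end: desc.find(']', arg1_end)]
        result.append((verb['verb'], arg1))
    return result
  except Exception as e:
    print(e)
    return -1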


# Apply to every sentence, with a progress bar
from tqdm.notebook import tqdm
tqdm.pandas()

df['Arg1'] = df.sentence.progress_apply(extract_arg1)

However, this process returns [(affected, Our results)], but I need [(affected, the outcomes of litigation, legal proceedings and other legal or regulatory matters)].
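To see which label the model actually attached to each span, it may help to pull every bracketed role out of the description instead of only ARG1. The following is a minimal sketch, not the library's own API: `extract_roles` and `ROLE_PATTERN` are hypothetical helper names, and the description string is hand-written for illustration (it is an assumption about what the output might look like, not real model output). If the model follows PropBank-style labelling, the subject of a passive sentence would typically come back as ARG1 and the by-phrase as ARG0, which would explain the result above.

```python
import re

# Matches one bracketed role such as "[ARG1: Our results]".
ROLE_PATTERN = re.compile(r"\[([^\]:]+): ([^\]]*)\]")

def extract_roles(description):
    """Return every (label, text) pair found in an SRL description string."""
    return [(label, text.strip()) for label, text in ROLE_PATTERN.findall(description)]

# Hand-written example string (an assumption, not real model output).
example_description = (
    "[ARG1: Our results] [ARGM-MOD: may] be [ARGM-MNR: materially adversely] "
    "[V: affected] [ARG0: by the outcomes of litigation , legal proceedings "
    "and other legal or regulatory matters] ."
)

roles = extract_roles(example_description)
for label, text in roles:
    print(label, "->", text)
```

Printing all roles for a few sentences should make it clear whether the span you need sits under ARG0 rather than ARG1 for this verb.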

hilo
  • In general, it's unavoidable that the model is going to make errors sometimes. In this case, it might be that the system only predicts the constituent heads (not sure). If so, the full proposition could be extracted by following the dependencies. – Erwan Jul 19 '22 at 16:49
  • Hi Erwan, thank you for this nice comment. I appreciate it. I am trying to find a way to save multiple Arg1s instead of one. It is very odd: the code returns almost all arg0, verb pairs but fails sometimes. I was wondering if there is any way to improve this... – hilo Jul 19 '22 at 16:50
  • I'm not up to date with SRL, but it's a difficult task, and when I was working on it 12 years ago it was considered normal not to obtain very good results in general. I don't know if it has improved since then, but afaik SRL in general is still "experimental". – Erwan Jul 20 '22 at 08:43
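
On saving multiple Arg1s: one option, sketched here under assumptions, is to read the per-token BIO tags instead of slicing the description string. This assumes the predictor output has a top-level `words` list and a `tags` list per verb frame; `spans_for_label` and `extract_label` are hypothetical helper names, and the sample `output` dict is hand-written for illustration, not real model output.

```python
def spans_for_label(words, tags, label):
    """Collect every span tagged B-label / I-label from parallel word and tag lists."""
    spans, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B-" + label:
            # A new span starts; flush any span still open.
            if current:
                spans.append(" ".join(current))
            current = [word]
        elif tag == "I-" + label and current:
            current.append(word)
        else:
            # Any other tag closes the open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

def extract_label(output, label="ARG1"):
    """Return one (verb, span) pair per matching span in every verb frame."""
    pairs = []
    for frame in output["verbs"]:
        for span in spans_for_label(output["words"], frame["tags"], label):
            pairs.append((frame["verb"], span))
    return pairs

# Hand-written structure for illustration only (assumed shape, not real output).
output = {
    "words": ["Legal", "costs", "rose", "and", "margins", "fell"],
    "verbs": [
        {"verb": "rose", "tags": ["B-ARG1", "I-ARG1", "B-V", "O", "O", "O"]},
        {"verb": "fell", "tags": ["O", "O", "O", "O", "B-ARG1", "B-V"]},
    ],
}

pairs = extract_label(output)
print(pairs)
```

Passing `label="ARG0"` instead would collect the other argument in the same way, so both can be stored in separate columns.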
