I have two different lists, which contain names and locations. I need to identify the position of both the names and the locations in the text.

Input

Name:['Mughal'] Location: ['Panipat','Agra']

text=['The battle of Panipat laid the foundation of the Mughal dynasty in Agra.']

Output:

Start position:15;end position:21;Word:Panipat;type:Location; Start position:50;end position:55;Word:Mughal;type:Name

code:

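kwmatch = []
doctuple = []
index_ = 0
for t in text:
    for n in name_:
        # Collect every occurrence of the name n in the text t
        while index_ < len(t):
            index_ = t.find(n, index_)
            if index_ == -1:
                break
            else:
                kwmatch.append((index_, index_ + len(n), "Name"))
                # Resume the search just past the match that was found
                index_ += len(n)
        index_ = 0
    a = (t, {'entities': kwmatch})
    doctuple.append(a)
    kwmatch = []
    a = None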
Muralidhar A

1 Answer

To begin with, it will be much easier to store your Name and Location data if you use a dictionary (https://docs.python.org/3/tutorial/datastructures.html#dictionaries), e.g.

dct = {
    'Name'  : ['Mughal'],
    'Location':  ['Panipat','Agra']
}

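After that, you can iterate over each text in your list, find the starting index of each word with str.find, and derive the ending index from the word's length. The word and type come from the word you are searching for and its dictionary key. The positions are 1-based, with an inclusive end, to match your expected output.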

text=['The battle of Panipat laid the foundation of the Mughal dynasty in Agra.']

for t in text:
    for key, value in dct.items():
        for v in value:
            # str.find returns the 0-based index of the first match, or -1 if absent
            idx = t.find(v)
            if idx == -1:
                continue
            # Convert to a 1-based start; the end position is inclusive
            start_pos = idx + 1
            end_pos = start_pos + len(v) - 1
            # Word and type are the word we are looking for, and the key of the dictionary
            print('Start position: {}; end position: {}; Word: {}; type: {}'.format(start_pos, end_pos, v, key))

The output is then:

Start position: 50; end position: 55; Word: Mughal; type: Name
Start position: 15; end position: 21; Word: Panipat; type: Location
Start position: 68; end position: 71; Word: Agra; type: Location
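
Note that str.find only reports the first occurrence of a word. If a word can appear more than once in a text, one possible extension is sketched below. The helper name find_entities is hypothetical (it is not part of the question or of any library), and it keeps the same 1-based, inclusive positions used above. It matches plain substrings, so a word embedded in a longer word would also be reported.

```python
def find_entities(text, dct):
    """Return (start, end, word, type) for every occurrence of every word.

    Positions are 1-based and the end is inclusive (an assumption taken
    from the expected output in the question).
    """
    matches = []
    for key, words in dct.items():
        for word in words:
            if not word:
                continue  # an empty search string would never advance
            idx = text.find(word)
            while idx != -1:
                matches.append((idx + 1, idx + len(word), word, key))
                # Continue searching just past the current match
                idx = text.find(word, idx + len(word))
    # Sort by start position so results follow the order of the text
    return sorted(matches)


dct = {'Name': ['Mughal'], 'Location': ['Panipat', 'Agra']}
t = 'The battle of Panipat laid the foundation of the Mughal dynasty in Agra.'

for start, end, word, key in find_entities(t, dct):
    print('Start position: {}; end position: {}; Word: {}; type: {}'.format(start, end, word, key))
```

Because the results are sorted by start position, Panipat is listed before Mughal here, which is the order shown in the question's expected output.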
Devesh Kumar Singh