I am pulling data from the OpenCalais API; here are the details:
Input: a paragraph (a string), e.g. "Barack Obama is the President of United States." What gets returned is a set of entity instances, each with an offset and a length, but not necessarily in order of occurrence.
Output (what I want): the same string, but with each identified entity instance wrapped in a hyperlink (the result is also a string), i.e.
output="<a href="https://en.wikipedia.org/Barack_Obama"> Barack Obama </a> is the President of <a href="https://en.wikipedia.org/United_States"> United States </a>."
This is really a Python question, though.
This is what I have so far:
#API CALLS ABOVE WHICH IS NOT RELEVANT.
output=input
for x in range(0,result.print_entities()):
    print len(result.entities[x]["instances"])
    previdx=0
    idx=0
    for y in range(0,len(result.entities[x]["instances"])):
        try:
            url="https://permid.org/1-"+result.entities[x]["resolutions"][0]["permid"]
        except:
            url="https://en.wikipedia.org/wiki/"+result.entities[x]["name"].replace(" ", "_")
            print "Generating wiki page link"
        print url+"\n"
        #THE PROBLEM STARTS HERE
        offsetstr=result.entities[x]["instances"][y]["offset"]
        lenstr=result.entities[x]["instances"][y]["length"]
        output=output[:offsetstr]+"<a href=" + url + ">" + output[offsetstr:offsetstr+lenstr] + "</a>" + output[offsetstr+lenstr:]
print output
The issue is that after the first iteration the output string has changed: the inserted tags make it longer, so in subsequent iterations the offsets, which refer to positions in the original input, no longer point at the right characters. As a result, I cannot make the expected change.
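To make the problem concrete, here is a minimal sketch that reproduces it without the API. The `instances` list, the `wrap_naive` helper and the placeholder `URL` are made up for illustration; they only stand in for what the real result object would provide.

```python
# Hypothetical stand-in for the API result: (offset, length) pairs
# that index into the ORIGINAL text.
text = "Barack Obama is the President of United States"
instances = [(0, 12), (33, 13)]  # "Barack Obama", "United States"


def wrap_naive(text, instances, url="URL"):
    """Wrap each instance left to right, reusing the original offsets."""
    output = text
    for offset, length in instances:
        end = offset + length
        # After the first pass `output` is longer than `text`, so `offset`
        # no longer points at the start of the entity.
        output = (output[:offset] + "<a href=" + url + ">"
                  + output[offset:end] + "</a>" + output[end:])
    return output


print(wrap_naive(text, instances))
```

The first link comes out fine, but the second one wraps the wrong characters because the string grew by the length of the first pair of tags.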
Basically, I am trying to get:
input = "Barack Obama is the President of United States"
output= "<a href="https://en.wikipedia.org/Barack_Obama"> Barack Obama </a> is the President of <a href="https://en.wikipedia.org/United_States"> United States </a>"
How can this be done? I tried slicing and dicing the string, but it just gets garbled.
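One direction I have been considering, but have not verified against the real API output: collect every (offset, length, url) triple first, then splice from the highest offset down, so that each edit lands to the right of the spans still waiting to be processed. This is only a sketch. The `link_entities` function and the `spans` list are hypothetical names, and it assumes the instances never overlap.

```python
def link_entities(text, spans):
    """Wrap each (offset, length, url) span of `text` in an <a> tag.

    Assumes the spans do not overlap. Working from the highest offset
    down keeps the remaining (smaller) offsets valid, because every
    insertion happens to their right.
    """
    output = text
    for offset, length, url in sorted(spans, reverse=True):
        end = offset + length
        output = (output[:offset] + '<a href="' + url + '">'
                  + output[offset:end] + "</a>" + output[end:])
    return output


# Hypothetical spans, deliberately given out of order of occurrence.
text = "Barack Obama is the President of United States"
spans = [
    (33, 13, "https://en.wikipedia.org/wiki/United_States"),
    (0, 12, "https://en.wikipedia.org/wiki/Barack_Obama"),
]
print(link_entities(text, spans))
```

With the real result object, the triples would presumably be gathered in the two nested loops above instead of being hard-coded.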