A nesting issue
How I was tagged: Instead of extract text and tagging I am adding mcid's to the existing content stream (both open and closing ex: /p<< MCID 0 >> BDC .. .. .. EMC)
You're doing this incorrectly. See for example the start of the page content stream in your document:
BT
0 i
/C0_0 18 Tf
41.91 740.175 Td
/H2 <</MCID 0 >> BDC
( \) F M M P 8 P S M E) Tj
ET
/TouchUp_TextEdit MP
BT
/C0_1 14 Tf
EMC
Focusing on the beginning and end of text objects and marked content, we see that you have BT ... BDC ... ET ... BT ... EMC
According to the specification, though:
When the marked-content operators BMC, BDC, and EMC are combined with the text object operators BT and ET (see 9.4, “Text Objects”), each pair of matching operators (BMC…EMC, BDC…EMC, or BT…ET) shall be properly (separately) nested. Therefore, the sequences
BMC BT … ET EMC
and
BT BMC … EMC ET
are valid, but
BMC BT … EMC ET
and
BT BMC … ET EMC
are not valid.
(ISO 32000-1 section 14.6 "Marked Content")
This issue was fixed in the second shared PDF, res1.pdf.
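The nesting rule quoted above can be checked mechanically. The following is only a minimal sketch: it assumes the content stream has already been tokenized into a plain list of operator names (operands dropped), and the names `check_nesting`, `broken`, and `fixed` are made up for this illustration.

```python
# Operators that open a text object or a marked-content sequence,
# mapped to the operator that has to close them.
OPENERS = {"BT": "ET", "BMC": "EMC", "BDC": "EMC"}
CLOSERS = {"ET", "EMC"}


def check_nesting(operators):
    """Return a list of (index, message) nesting problems in an operator list."""
    problems = []
    stack = []  # expected closing operators, innermost last
    for index, op in enumerate(operators):
        if op in OPENERS:
            stack.append(OPENERS[op])
        elif op in CLOSERS:
            if not stack:
                problems.append((index, f"{op} without matching opener"))
            elif stack[-1] != op:
                problems.append((index, f"{op} found where {stack[-1]} was expected"))
            else:
                stack.pop()
    # Anything still on the stack was never closed.
    for expected in stack:
        problems.append((len(operators), f"missing {expected}"))
    return problems


# Operator order from the page content stream quoted above.
broken = ["BT", "BDC", "ET", "BT", "EMC"]
# One possible repair: the marked content lies entirely inside one text object.
fixed = ["BT", "BDC", "EMC", "ET"]

print(check_nesting(broken))
print(check_nesting(fixed))
```

For the `broken` list the first reported problem is the ET at index 2, which arrives while the BDC sequence is still open.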
Missing ParentTree and StructParents
The problem your question focuses on is
There is an option called "Find Tag from Selection" . Is not working.
Finding a tag from selection essentially means that you have the MCID of some content stream instruction and you search the structure element in the structure tree referencing that marked content ID.
How PDF processors are expected to do this, is described in section 14.7.4.4 "Finding Structure Elements from Content Items" of the PDF specification ISO 32000-1 (or section 14.7.5.4 in ISO 32000-2):
Because a stream cannot contain object references, there is no way for content items that are marked-content sequences to refer directly back to their parent structure elements (the ones to which they belong as content items). Instead, a different mechanism, the structural parent tree, shall be provided for this purpose. For consistency, content items that are entire PDF objects, such as XObjects, shall also use the parent tree to refer to their parent structure elements.
The parent tree is a number tree, accessed from the ParentTree entry in a document’s structure tree root. The tree shall contain an entry for each object that is a content item of at least one structure element and for each content stream containing at least one marked-content sequence that is a content item.
Your PDF does not have a ParentTree at all, and your page does not contain a StructParents entry to look up in a parent tree. Thus, the prescribed way to get from marked content to the structure tree is not available.
A ParentTree was added in the third shared PDF, new.pdf.
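To make the lookup path concrete, here is a rough sketch of what "Find Tag from Selection" has to do. It is not how any particular viewer implements it. It assumes the parent tree has been flattened into a Python dict from integer key to a list of structure elements, and that a page is a dict; `find_struct_elem` and the sample objects are hypothetical.

```python
def find_struct_elem(parent_tree, page, mcid):
    """Return the structure element for a marked-content ID on a page, or None."""
    key = page.get("StructParents")
    if key is None:
        return None  # the page has no key to look up in the parent tree
    entry = parent_tree.get(key)
    if entry is None or not 0 <= mcid < len(entry):
        return None  # no array for this page, or no slot for this MCID
    # The MCID is used as a zero-based index into the page's array.
    return entry[mcid]


# Made-up sample data: one page whose MCID 0 is a heading and MCID 1 a paragraph.
heading = {"S": "H2"}
paragraph = {"S": "P"}
parent_tree = {0: [heading, paragraph]}
page = {"StructParents": 0}
page_without_entry = {}

print(find_struct_elem(parent_tree, page, 1))
print(find_struct_elem(parent_tree, page_without_entry, 0))
```

Without a ParentTree or without a StructParents entry, the function has nothing to go on and returns None, which corresponds to the "selection not found" situation.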
Incorrect ParentTree entries
While in new.pdf you have a ParentTree, its contents are clearly incorrect:

The ParentTree is a number tree, i.e. integers are mapped to something here, so there obviously must not be multiple entries for the same integer key.
Furthermore, looking inside one of those values:

one sees that you claim that the following StructElem is the value for all marked content IDs:

Inspecting this StructElem further, one sees that it represents the final paragraph on the final page.
Thus, your observation
Now instead of "selection not found " it is highlighting the last <P> tag in parent tree. Irrespective of what what we selected.
is what one can expect. If one expects any reasonable behavior at all, that is, with a ParentTree structure broken so badly.
Actually there was not only this new.pdf but also res.pdf and tagged without altext.pdf with ParentTrees, but all these ParentTrees were broken like the tree of new.pdf.
You might want to start inspecting the structures you create when analyzing an unwanted behavior.
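One inspection that is easy to automate is the duplicate-key check on the number tree. The sketch below assumes the tree consists of a single node whose Nums array has been read into a flat Python list of alternating keys and values; the function name and the sample values are invented for illustration and are not taken from new.pdf.

```python
from collections import Counter


def duplicate_keys(nums):
    """Return the keys that occur more than once in a flat Nums array."""
    keys = nums[0::2]  # even positions hold keys, odd positions hold values
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


# Invented sample: three entries that all use the key 0.
suspicious = [0, "ref A", 0, "ref B", 0, "ref C"]
# Invented sample: one entry per key.
plausible = [0, "ref A", 1, "ref B", 2, "ref C"]

print(duplicate_keys(suspicious))
print(duplicate_keys(plausible))
```

A tree with Kids nodes would need the same check applied across all leaf nodes together.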
Another issue with parent tree entries
The previously described issue in parent trees has meanwhile been resolved: different pages now have different struct parents, and the parent tree arrays now reference the struct elements for distinct MCIDs.
For some documents a different error occurs now, though, e.g. "res29_08_19.pdf". Here the parent tree starts like this:

In particular the first entry in the array is for MCID 3, the second for MCID 4, ...
This is invalid according to the specification:
The array element corresponding to each sequence shall be found by using the sequence’s marked-content identifier as a zero-based index into the array.
(ISO 32000-1 section 14.7.4.4 "Finding Structure Elements from Content Items")
Thus, the first entry must be for MCID 0, the second for MCID 1, ...
You objected in a comment
No I used 0 and 1 Mcid's for Artifacts.
But as a corollary of the above: Do not give MCIDs to marked content sequences you don't have a structure element for! MCIDs are for going back and forth between the structure hierarchy and the content streams. If you mark a piece of content without having a structure element for it, don't give it an MCID.
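The indexing rule can be turned into a small builder. This is a sketch under assumptions: the input is a hypothetical dict from MCID to structure element for one page, and the builder follows the advice above by insisting that the MCIDs run 0, 1, 2, ... without gaps, which is a policy of this sketch rather than a quotation from the specification.

```python
def parent_tree_array(elems_by_mcid):
    """Build a page's parent tree array so that array[mcid] is the element."""
    expected = list(range(len(elems_by_mcid)))
    if sorted(elems_by_mcid) != expected:
        raise ValueError(
            "MCIDs with structure elements should run 0, 1, 2, ... without gaps"
        )
    # Position in the array equals the MCID.
    return [elems_by_mcid[mcid] for mcid in expected]


# MCIDs 0 and 1 are used for tagged content: fine.
print(parent_tree_array({1: "P element", 0: "H2 element"}))

# MCIDs start at 3 because 0 to 2 were spent on untagged content: rejected.
try:
    parent_tree_array({3: "H2 element", 4: "P element"})
except ValueError as error:
    print("rejected:", error)
```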
Yet another issue with parent tree entries
You again report problems with your newest file mathpdf.pdf. And indeed, there are issues; Adobe Acrobat Preflight reports a five-page list of inconsistent parent tree mappings like this:

In contrast to the previous issues, the cause does not become clear by looking at the parent tree alone; one also has to look at the structure hierarchy.
Doing so, though, one peculiarity immediately catches the eye: in your parent tree you do not reference the actual parent structure element of the MCID. Instead you reference a new structure tree node which claims to have the actual parent node from the structure hierarchy as its own parent (without actually being one of its kids) and which also claims to have the MCID in question as a kid.
For example let's look at the MCID 0 on the first page. In the structure hierarchy you have:

In the parent tree you have:

You should have simply referenced object 238 (the structure hierarchy parent of MCID 0) directly from the parent tree array for page one instead of that in-between object 62 which claims to have that object 238 as parent and MCID 0 as kid.
The reported inconsistency may arise because the node referenced from the parent tree (in object 62) claims to be a P paragraph with a parent node (in object 238) which is a Span. That is not allowed: a paragraph may contain a span, but it cannot be contained in one.
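The kind of consistency a checker presumably looks for can be sketched as follows. This is a simplified model, not Preflight's actual logic: structure elements are Python dicts, K is always a list holding integer MCIDs or child dicts, and page references are ignored. The function name and the sample objects are assumptions; the sample only mirrors the Span (object 238) and the in-between P node (object 62) described above.

```python
def is_consistent(root, entry, mcid):
    """True if `entry` sits in the hierarchy under `root` and lists `mcid` as a kid."""

    def reachable(node):
        # Walk down the K arrays only; P entries are claims, not proof.
        if node is entry:
            return True
        kids = node.get("K", [])
        return any(reachable(kid) for kid in kids if isinstance(kid, dict))

    return reachable(root) and mcid in entry.get("K", [])


# The element that really owns MCID 0 in the structure hierarchy.
span = {"S": "Span", "K": [0]}
root = {"S": "Document", "K": [span]}
span["P"] = root

# An in-between node that names the span as parent but is not one of its kids.
stray = {"S": "P", "P": span, "K": [0]}

print(is_consistent(root, span, 0))
print(is_consistent(root, stray, 0))
```

Referencing `span` from the parent tree passes this check, while referencing `stray` fails it because walking the K arrays from the root never reaches that node.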
[…] tag in parent tree. Irrespective of what what we selected. (try with select ' hello world' and press select tag from selection in first added pdf)
– fascinating coder Aug 24 '19 at 10:55
[…] tag. Now the read single page option is not working? I don't know why. Can you please help. Parent tree is fine I think. https://drive.google.com/file/d/1aD1HGQsEXOovpfWdf7JRNwJhP7tX6pmy/view?usp=sharing
– fascinating coder Sep 07 '19 at 14:48
[…] tag is not reading and directly going to H3. But if you read top to bottom it is reading fine.
– fascinating coder Sep 09 '19 at 05:57