Looking for a unique ID pattern that search engines index easily

Question

Something like Microsoft's "KB2756872", the National Vulnerability Database's "CVE-2010-1428", Red Hat's "RHSA-2010:0376", an OID such as "1.3.6.1.4.1.311", or a UUID/GUID such as "550e8400-e29b-41d4-a716-446655440000".

I want UIDs to do several jobs, described below.

I am developing blog software and have the idea of putting a unique ID in the body of each post, so I can easily tell which copy in local storage corresponds to which remote published copy.

I also want to post to many different blogging services, so that if one is down the articles are still accessible from another. A link can go dead, but if I add a UID, anyone can search the web for it to find the post on another service.

This would also let me gather statistics on how articles spread. Many sites just replicate content (via copy-and-rewrite bots and people) to game search engines. With a UID I could easily identify such sites.

So my question is: in what form should I make UIDs so that they are easily indexed by search engines, both web search engines (like Google and Yahoo) and enterprise ones (like Lucene, Solr, Sphinx, Xapian, etc.)?

I know about some limitations of search engines, for example:

each search term must be at least 3 characters long
random-looking junk like gfh6wytrh6wu56he5gahj763 was not indexed

so this task is not easy.

Any advice (books, blog articles, etc.) is appreciated.

http://stackoverflow.com/questions/3198901/how-to-implement-an-enterprise-search — gavenkoa, Dec 16 '12 at 20:48

score 5 · Answer 1 · edited Oct 07 '21 at 05:57

You could use Tag URIs, as defined by RFC 4151.

They are globally unique, and anyone who has owned a domain name or an email address for at least a day can mint them.

Note that these URIs only identify; they don’t locate. A Tag URI says nothing about where something is published.

Let’s say your site’s domain is "example.com". If you create a blog post, you could create the following Tag URI:

tag:example.com,2012-12:cute-cat

Note that the date in this URI is not a publication date! It must be a (past) date on which you owned the domain (or email address). If you registered your domain in 2003, you could always use Tag URIs starting with tag:example.com,2004: (not "2003", because "2003" would mean "2003-01-01", which might be a time when you didn’t own the domain yet), followed by a (unique) string under your control. Of course, you could also use the publication date if you like, but don’t use future dates.
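
A minimal Python sketch of minting and spotting Tag URIs, under stated assumptions: the function names mint_tag_uri and find_tag_uris are hypothetical, and the regex accepts only a simplified subset of what RFC 4151 allows in the specific part (letters, digits, "-", "_", "~" and inner dots), chosen so that trailing punctuation in page text is not swallowed. find_tag_uris illustrates the question's use case of scanning a copied page for the ID.

```python
import re
from datetime import date

# Simplified tag URI pattern after RFC 4151: "tag:" authority "," date ":" specific.
# Assumptions: the authority is a DNS name or an email address, and the
# specific part uses only letters, digits, "-", "_", "~" and inner dots
# (the RFC allows more characters; this subset is easy to spot in page text).
TAG_URI_RE = re.compile(
    r"\btag:"
    r"(?P<authority>(?:[A-Za-z0-9._+-]+@)?[A-Za-z0-9.-]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?):"
    r"(?P<specific>[A-Za-z0-9_~-]+(?:\.[A-Za-z0-9_~-]+)*)"
)


def mint_tag_uri(authority: str, minted_on: str, specific: str) -> str:
    """Build a tag URI; `minted_on` must be a date on which you owned `authority`."""
    if not re.fullmatch(r"\d{4}(?:-\d{2}(?:-\d{2})?)?", minted_on):
        raise ValueError("date must be YYYY, YYYY-MM or YYYY-MM-DD")
    # A partial date denotes its first day ("2012" = 2012-01-01, "2012-12" = 2012-12-01).
    year, month, day = ([int(p) for p in minted_on.split("-")] + [1, 1])[:3]
    if date(year, month, day) > date.today():
        raise ValueError("tag dates must not be in the future")
    uri = f"tag:{authority},{minted_on}:{specific}"
    if not TAG_URI_RE.fullmatch(uri):
        raise ValueError(f"not a tag URI under this sketch's rules: {uri}")
    return uri


def find_tag_uris(text: str) -> list[str]:
    """Return every tag URI embedded in a page's text, e.g. in a copied post."""
    return [m.group(0) for m in TAG_URI_RE.finditer(text)]


uri = mint_tag_uri("example.com", "2012-12", "cute-cat")
print(uri)  # tag:example.com,2012-12:cute-cat
print(find_tag_uris(f"Mirrored post. Original ID: {uri}."))
```

The leading \b keeps the pattern from matching inside words such as "hashtag:". How a given search engine tokenizes the colons, comma and hyphens is a separate question that this sketch does not settle.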

Thanks for the suggestion! I will look at http://en.wikipedia.org/wiki/URI_scheme to see which other URI schemes might suit my purpose. — gavenkoa, Dec 19 '12 at 09:12

score 1 · Answer 2 · answered Dec 16 '12 at 19:45

You can use a year-and-number-based article identifier, just like CVE identifiers. Since you need revisions as well, you can append a dot and a version number to the identifier. For example, for an AWesome Blog Service, AWBS-2012-1.0 would refer to the original document, AWBS-2012-1.1 to the first revision, and so on.
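
The PREFIX-YEAR-NUMBER.REVISION format described above can be sketched in Python as follows. This is an illustration only: ArticleId and parse_article_id are hypothetical names, and the sketch assumes the prefix is uppercase letters and that revision 0 denotes the original document.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical CVE-style scheme: PREFIX-YEAR-NUMBER.REVISION,
# where revision 0 is the original post (AWBS-2012-1.0) and 1 the first edit.
ARTICLE_ID_RE = re.compile(
    r"(?P<prefix>[A-Z]+)-(?P<year>\d{4})-(?P<number>\d+)\.(?P<revision>\d+)"
)


@dataclass(frozen=True)
class ArticleId:
    prefix: str
    year: int
    number: int
    revision: int = 0

    def __str__(self) -> str:
        return f"{self.prefix}-{self.year}-{self.number}.{self.revision}"

    def next_revision(self) -> "ArticleId":
        # Same article, revision counter bumped by one.
        return replace(self, revision=self.revision + 1)


def parse_article_id(text: str) -> ArticleId:
    m = ARTICLE_ID_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not an article ID: {text!r}")
    return ArticleId(m["prefix"], int(m["year"]), int(m["number"]), int(m["revision"]))


original = ArticleId("AWBS", 2012, 1)
print(original)                  # AWBS-2012-1.0
print(original.next_revision())  # AWBS-2012-1.1
print(parse_article_id("AWBS-2012-1.1") == original.next_revision())  # True
```

A search engine may split such an ID at the hyphens and the dot and index the pieces separately; this sketch only covers formatting and parsing, not how any particular engine tokenizes the result.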

However, you need to make sure that AWBS identifiers are unique before you use them. CVE identifiers are assigned manually from a pool, so you would probably need some kind of service that assigns AWBS identifiers from a pool. It could be as simple as a database query.
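
One way the "simple database query" might look, sketched with Python's built-in sqlite3 module. The table name id_counters and the functions open_registry and allocate_number are hypothetical. The sketch keeps one counter per (prefix, year) and increments it inside a write-locked transaction, so two writers sharing the same database file should not receive the same number.

```python
import sqlite3


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT below control the transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_counters ("
        " prefix TEXT NOT NULL,"
        " year INTEGER NOT NULL,"
        " last_number INTEGER NOT NULL,"
        " PRIMARY KEY (prefix, year))"
    )
    return conn


def allocate_number(conn: sqlite3.Connection, prefix: str, year: int) -> int:
    """Hand out the next unused number for (prefix, year), starting at 1."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so two writers on
    # the same database file cannot both read and reuse the same last_number.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT OR IGNORE INTO id_counters (prefix, year, last_number) VALUES (?, ?, 0)",
            (prefix, year),
        )
        conn.execute(
            "UPDATE id_counters SET last_number = last_number + 1"
            " WHERE prefix = ? AND year = ?",
            (prefix, year),
        )
        (number,) = conn.execute(
            "SELECT last_number FROM id_counters WHERE prefix = ? AND year = ?",
            (prefix, year),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return number


conn = open_registry()
for _ in range(3):
    print(f"AWBS-2012-{allocate_number(conn, 'AWBS', 2012)}.0")
```

Each new year starts its own counter at 1, matching the CVE-style year/number split. A multi-server setup would need a shared database or a dedicated ID service instead of a local file.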

Another approach is to use randomly generated UUIDs, so I don't need to keep track of every ID already issued, combined with a prefix like the one you suggest. — gavenkoa, Dec 16 '12 at 19:58
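
The random-UUID-plus-prefix idea from the comment above can be sketched with Python's standard uuid module. The prefix "AWBS" and the names new_post_id and parse_post_id are hypothetical. The random part looks much like the "gfh6wytrh6wu56he5gahj763" example in the question, so it may run into the same indexing problem; this sketch does not address that.

```python
import re
import uuid

# Hypothetical prefix, borrowed from the AWBS example in the answer.
PREFIX = "AWBS"

POST_ID_RE = re.compile(
    r"(?P<prefix>[A-Z]+)-"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def new_post_id(prefix: str = PREFIX) -> str:
    # uuid4() is built from 122 random bits, so no list of issued IDs is
    # needed; a collision is possible in principle but extremely unlikely.
    return f"{prefix}-{uuid.uuid4()}"


def parse_post_id(text: str) -> uuid.UUID:
    m = POST_ID_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a post ID: {text!r}")
    return uuid.UUID(m["uuid"])


post_id = new_post_id()
print(post_id)                         # random on every run
print(parse_post_id(post_id).version)  # 4
```

Compared with the pool-based numbering, this trades short, readable IDs for not needing any coordination between the services that mint them.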
