I would like to understand how the indexing for the artifact repositories like Nexus and Artifactory works. What benefit does it provide? I mean -- how does it help and what is the logic that's used when resolving artifacts?

My understanding is that the Lucene indexes contain information about which artifacts are present in a given proxied repository or group. Once these indexes have been downloaded, you can easily check whether a remote repository contains the artifact you're looking for, and then try to resolve it from the repositories that have it. Is this the only use? Is the index also queried for local resolutions (because each repository does have an index)...? How does this actually work?

carlspring

3 Answers

Artifactory doesn't use indexes for searching. We believe that indexes are a thing of the past, from when machines were slow and couldn't handle large searches on the server side. Here is a partial list of why search indexes are bad:

  • Clients need to download huge files before searching
  • The indexes are updated too rarely to reflect frequent changes
  • A system with search indexes requires a special client to perform the search against
  • The client is tightly coupled with the index format.

Nowadays, when servers like Artifactory can provide real-time searching, exposed via a UI for humans and an API for tools like IDEs, the indexes are obsolete and are supported in Artifactory only for compatibility with tools like m2eclipse.
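
To make the server-side alternative concrete, here is a minimal sketch of how a tool might build a request for Artifactory's GAVC search (`api/search/gavc`) instead of downloading an index. The base URL and the helper name `gavc_search_url` are assumptions made for illustration, and the sketch only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with your own Artifactory instance.
BASE_URL = "https://artifactory.example.com/artifactory"


def gavc_search_url(group_id=None, artifact_id=None, version=None,
                    classifier=None, repos=None):
    """Build a URL for a GAVC search; every parameter is optional."""
    params = {"g": group_id, "a": artifact_id, "v": version, "c": classifier}
    # Keep only the coordinates the caller actually supplied.
    query = {key: value for key, value in params.items() if value is not None}
    if repos:
        # Several repositories are passed as one comma-separated value.
        query["repos"] = ",".join(repos)
    return f"{BASE_URL}/api/search/gavc?{urlencode(query)}"


url = gavc_search_url(group_id="junit", artifact_id="junit")
print(url)
```

The search runs on the server against its current state, so the client needs nothing beyond an HTTP library.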

JBaruch
  • Maven is not the only repository technology to use indexes. And in fairness, Lucene supports partial index updates, which reduce the size of index downloads. The key advantage of using indexes is that they enable distributed searching. As for client coupling, Lucene is an open format. I'm not aware of difficulties in supporting its format, although I'd imagine that if there are formatting issues, only vendors of alternative Maven repositories (like Artifactory) would be impacted, as most users would not access the index directly... I would fault Sonatype for not really documenting the Maven repo format – Mark O'Connor Jun 10 '13 at 11:39
  • @MarkO'Connor I am not sure what you mean by "distributed searching"? Just a couple of days ago we were trying to find some benefits of a downloadable index versus a server-side API for searching and found none. Will be glad to hear (truly). – JBaruch Jun 10 '13 at 15:02
  • Understood. Maven repos are not the only technology with downloadable indexes. For example, yum repositories (RPM packages) use the same approach, with a sqlite database storing the repository index (downloaded by the client). I'm guessing the main advantage over a REST API is scaling: all that's needed to host the repo is an HTTP server and rsync. – Mark O'Connor Jun 10 '13 at 21:43
  • Well, RPM has an excuse: it's an old technology, from when computers were slow and couldn't scale with server-side search. But nowadays? My smartphone can run an ElasticSearch server! – JBaruch Jun 11 '13 at 06:31
  • +1 Fair point. Now you guys have to convince Sonatype :-) The Maven client is pretty dumb though. Doesn't use REST APIs and does lots of undocumented stuff (like metadata processing client-side, really dumb). I suspect you're on a loser ...... – Mark O'Connor Jun 11 '13 at 10:10
  • We don't need to convince them. Eventually the best technology wins. Maven is not the only build tool in town, as Nexus is not the only binary repository :) – JBaruch Jun 11 '13 at 13:43
  • Baruch, thanks for your points! I agree, that to a large extent (if there is no web interface to use for searching, for example), there is perhaps not so much sense to be using indexes. Thanks for the examples concerning the IDE-s' point of view. I appreciate your experience from your work on Artifactory and value your opinion. I am accepting Tamas' answer, as in all fairness, it provides the kind of code examples I was looking for. – carlspring Jun 11 '13 at 14:59

Repository indexing is all about searching. The Maven Eclipse plugin documentation describes the functionality:

Maintaining a server-side index makes Maven client operation more efficient. Server-side repository managers can use indexes to enable search interfaces and REST APIs for retrieving artifacts (Sonatype Nexus doesn't need a database).

Mark O'Connor
  • Could you please provide some more information -- for example code, libraries that deal with this, more specifically -- some classes...? – carlspring Jun 10 '13 at 09:38
  • @carlspring I have never needed to access the indexes directly. I'm an indirect user via the Nexus UI and Eclipse plugins. I don't think they're very magical. I'd recommend reading the Apache Lucene documentation for examples of client libraries for reading indexes directly. – Mark O'Connor Jun 10 '13 at 11:35
  • Thanks, I'll have a look. I'm the same kind of user, but would like to learn more. – carlspring Jun 10 '13 at 13:44
  • Once again, thanks, Mark, for confirming my understanding was right! I am accepting Tamas' answer, as it contains the kind of code examples I was looking for. – carlspring Jun 11 '13 at 14:54

As Mark already said, Maven Index is all about searching. That can happen server side, where search is exposed over a UI or REST, or client side, as M2E does. A typical client-side example is code completion in the POM editor, where context hints use the index to offer you Gs, As and Vs (groupIds, artifactIds and versions) while you add dependencies.
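
As a rough illustration of the completion use case, here is a toy in-memory index over Gs, As and Vs. It is only a sketch: the class `ToyGavIndex` and its methods are made up for this example, and it uses neither Lucene nor the Maven Indexer API.

```python
from collections import defaultdict


class ToyGavIndex:
    """Toy stand-in for a repository index: groupId -> artifactId -> versions."""

    def __init__(self):
        self._artifacts = defaultdict(lambda: defaultdict(set))

    def add(self, group_id, artifact_id, version):
        self._artifacts[group_id][artifact_id].add(version)

    def complete_group(self, prefix):
        """GroupIds starting with the text typed so far."""
        return sorted(g for g in self._artifacts if g.startswith(prefix))

    def complete_artifact(self, group_id, prefix=""):
        """ArtifactIds within one group that start with the given prefix."""
        known = self._artifacts.get(group_id, {})
        return sorted(a for a in known if a.startswith(prefix))

    def versions(self, group_id, artifact_id):
        """Known versions, sorted as plain strings (not Maven version order)."""
        return sorted(self._artifacts.get(group_id, {}).get(artifact_id, set()))


index = ToyGavIndex()
index.add("junit", "junit", "4.11")
index.add("junit", "junit", "4.12")
index.add("org.apache.maven.indexer", "indexer-core", "5.1.1")

groups = index.complete_group("org.apache")
artifacts = index.complete_artifact("org.apache.maven.indexer", "ind")
junit_versions = index.versions("junit", "junit")
print(groups, artifacts, junit_versions)
```

A real index holds far more (classifiers, packaging, class names, checksums), but the principle is the same: the editor answers "what starts with this prefix?" locally, without asking the repository for every keystroke.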

Nexus does NOT use the index to fulfil its main functionality, which is serving up artifacts and/or proxying them, although it DOES maintain the index on the fly. Again, indexes are not used in "resolution" or in any other way, except for the Search UI and for downstream publishing (for clients such as M2E).
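
One way to see why resolution needs no index: in the standard Maven 2 repository layout, the path of an artifact follows directly from its coordinates, so a client or proxy can simply request that path. The sketch below shows the mapping for release artifacts. The function name `artifact_path` is made up for illustration, and snapshot versions (which use timestamped file names) are deliberately left out.

```python
def artifact_path(group_id, artifact_id, version, extension="jar", classifier=None):
    """Map release coordinates to a path in the standard Maven 2 layout."""
    # Dots in the groupId become directory separators.
    group_path = group_id.replace(".", "/")
    # An optional classifier (e.g. "sources") is appended after the version.
    suffix = f"-{classifier}" if classifier else ""
    filename = f"{artifact_id}-{version}{suffix}.{extension}"
    return f"{group_path}/{artifact_id}/{version}/{filename}"


jar_path = artifact_path("junit", "junit", "4.12")
pom_path = artifact_path("junit", "junit", "4.12", extension="pom")
sources_path = artifact_path("junit", "junit", "4.12", classifier="sources")
print(jar_path)
print(pom_path)
print(sources_path)
```

The repository manager either has a file at that path (or can fetch it from the proxied remote) or it does not; no lookup in a search index is involved.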

For an example of "client-side" usage of Maven Indexer, you can look at the examples here.

HTH,
~t~

Tamas Cservenak