1

I am running Alfresco 4.2 on a Red Hat 7 server, so I have to deal with Lucene 2.4. The issue I am dealing with is that the Lucene indexes are being corrupted more and more often. Every time that happens, the repository goes down, and only a full reindex brings the server back up.

I need help understanding what is causing the index corruption and how to deal with it (reindexing takes a lot of time).

tekamed
  • 1
    Ring Alfresco support and ask for their help? That's what they're there for! – Gagravarr Jan 29 '18 at 10:51
  • What do you mean by "corrupted index"? Do you have facts or logs that prove it? – Akah Jan 29 '18 at 11:58
  • @Gagravarr we will do so. thanks – tekamed Jan 29 '18 at 16:23
  • @Akah the log messages are quite different each time. Here you can find the log from the last time it happened to us: https://pastebin.com/tqEK21NF What makes me think it's index corruption is the fact that a full reindex fixes things for a period of time. – tekamed Jan 29 '18 at 16:24
  • 1
    Do you delete the index before redoing a full reindex ? – Akah Jan 29 '18 at 16:27
  • Yes, actually I rename the folders "lucene-indexes" and "lucene-indexes-backup" to "lucene-indexes-old" and "lucene-indexes-backup-old" respectively before starting Alfresco in full indexing mode. – tekamed Jan 29 '18 at 16:52
  • 2
    Are the Lucene indices on local disk? Could some other process be accessing the Lucene index files? – Jeff Potts Jan 29 '18 at 23:19
  • Was the "index corruption" correlated with other events on the server? Restart, ingestion of high volumes of documents.. etc? – Younes Regaieg Jan 30 '18 at 06:30
  • @JeffPotts Yes the Lucene indices are on local disk and there's no other process accessing the Lucene index files. – tekamed Jan 30 '18 at 08:02
  • @YounesRegaieg Unfortunately, it looks like it occurs in a completely random manner. – tekamed Jan 30 '18 at 08:03
  • Seems unlikely that it would happen randomly. I've never heard of that happening. Can you switch to Solr? – Jeff Potts Jan 31 '18 at 18:41
  • @JeffPotts Actually, we are testing the migration in the preproduction environment. Do you think that it will help with this issue? – tekamed Feb 01 '18 at 07:55

2 Answers

0

We are also using Lucene, although it is not with Alfresco. From what we have seen, we have an issue with the unique ID given by Lucene to each document actually changing sometimes when adding or deleting a document to the index... We have not yet been able to go any further, but maybe this can help put you on the right track.

Katz
0

Let me mention before I start in earnest: Alfresco implements Solr which uses Lucene for indexing, thus I wouldn't manage the Lucene indexes directly on Alfresco. Instead, manage your indexes via the Solr tooling Alfresco provides.

I, too, have found that the Lucene/Solr index tends to "drift" in this version of Alfresco (4.2.0). Having engaged Alfresco support on this many times, we've found no solid root cause; they say it may be attributed to "certain customizations" we've made, but they haven't been more specific than that.

So while we've not found a solution, there are proactive steps we take to mitigate the issue.

  1. There is a Solr report we check daily (https://your-alfresco-server.com:8443/solr/report/). On this report, there is a value labeled, "Count of transactions in the index but not the DB" (which is a very misleading label, in my experience). The higher this value, the more out-of-sync our index seems to be, so as it climbs we'll schedule a re-index during a time when no one will be impacted.

  2. There are services the Alfresco server exposes to fix and reindex Solr. (Full disclosure: I have not found them to be very effective, but they come recommended by Alfresco Support).

Solr re-index service: http://your-alfresco-server.com:8080/solr/admin/cores?action=REINDEX&txid=

Solr "Fix" service: http://your-alfresco-server.com:8080/solr/admin/cores?action=FIX

  3. Purging stale content can reduce the time to re-index (this includes transfer reports, etc., that Alfresco generates and that tend to accumulate, but which, in my case at least, aren't important).
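The first two steps above could be scripted along these lines. This is only a rough sketch, not something Alfresco ships: the function names, the threshold, and the XML shape in `SAMPLE_REPORT` are my assumptions (check what your own report actually returns), and fetching the report is left out because it depends on how your Solr endpoint is secured. The `txid` value is whatever transaction ID you need to reindex; none is filled in here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Label quoted from the Solr report described above.
DRIFT_LABEL = "Count of transactions in the index but not the DB"

# ASSUMPTION: a made-up report fragment for illustration only.
# The real report layout on your server may differ.
SAMPLE_REPORT = (
    '<response><lst name="report"><lst name="alfresco">'
    '<long name="Count of transactions in the index but not the DB">42</long>'
    "</lst></lst></response>"
)


def admin_url(base, action, **params):
    """Build a /solr/admin/cores URL such as the REINDEX and FIX ones above."""
    query = urlencode({"action": action, **params})
    return f"{base.rstrip('/')}/solr/admin/cores?{query}"


def drift_count(report_xml, label=DRIFT_LABEL):
    """Return the integer stored under `label`, or None if it is absent."""
    root = ET.fromstring(report_xml)
    for elem in root.iter():
        if elem.get("name") == label and elem.text:
            return int(elem.text.strip())
    return None


def needs_reindex(report_xml, threshold):
    """True when the drift value exceeds a threshold you choose yourself."""
    count = drift_count(report_xml)
    return count is not None and count > threshold


if __name__ == "__main__":
    base = "http://your-alfresco-server.com:8080"
    if needs_reindex(SAMPLE_REPORT, threshold=10):
        # Only prints the URL; actually calling it is left to you.
        print(admin_url(base, "FIX"))
```

Run daily from cron, something like this would at least tell you when the drift value crosses the point where you normally schedule a re-index.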

Unfortunately, the true solution often comes down to re-indexing on a scheduled, rotating basis to minimize downtime.

rotarydial