3

I am going to run a process that may take minutes or even hours. To track the history of such runs, I create a node of a custom type per run, with the relevant process metadata stored within. Additionally, I want to store the log file under that node. This seems a more consistent and convenient approach than storing the log file on disk, separately from the process metadata.

Now, the nt:file node type itself has a jcr:content subnode with a jcr:data property that lets me store binary content. This is fine for a one-time or infrequent change of the file's content.
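For reference, writing a log file that way with the plain API looks roughly like this (a sketch, assuming an open javax.jcr.Session; the writeLog method and runNodePath parameter are illustrative, not part of any API):

```java
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;
import javax.jcr.ValueFactory;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class LogFileWriter {

    /**
     * Stores logText as an nt:file named "log.txt" under the given run node,
     * creating the nt:file/jcr:content pair on first call.
     */
    public static void writeLog(Session session, String runNodePath, String logText)
            throws Exception {
        Node runNode = session.getNode(runNodePath);
        Node file = runNode.hasNode("log.txt")
                ? runNode.getNode("log.txt")
                : runNode.addNode("log.txt", "nt:file");
        Node content = file.hasNode("jcr:content")
                ? file.getNode("jcr:content")
                : file.addNode("jcr:content", "nt:resource");

        ValueFactory vf = session.getValueFactory();
        Binary binary = vf.createBinary(
                new ByteArrayInputStream(logText.getBytes(StandardCharsets.UTF_8)));
        try {
            // jcr:data holds the whole binary value; there is no partial update.
            content.setProperty("jcr:data", binary);
        } finally {
            binary.dispose();
        }
        session.save();
    }
}
```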

However, I am going to continuously append new content to that file and, in addition to that, poll its content from separate threads (to track progress).

The JCR API, in the form of javax.jcr.ValueFactory and javax.jcr.Binary, doesn't really seem to support such an approach; I'd instead be forced to overwrite that file (or, more accurately, the binary property) over and over again every time I add a single line of log. I am concerned about performance.
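To illustrate the concern: with the plain API, "appending" a line means streaming the whole existing binary out, concatenating, and writing the whole value back. A sketch (again assuming an open Session; appendLine and filePath are illustrative names):

```java
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.Session;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

public class NaiveAppend {

    /** Appends one line to an nt:file's jcr:data by rewriting the whole binary. */
    public static void appendLine(Session session, String filePath, String line)
            throws Exception {
        Node content = session.getNode(filePath + "/jcr:content");
        Property data = content.getProperty("jcr:data");

        InputStream existing = data.getBinary().getStream();
        InputStream addition = new ByteArrayInputStream(
                (line + "\n").getBytes(StandardCharsets.UTF_8));

        // The entire value is re-created on every append:
        // cost grows with file size for every single log line.
        Binary rewritten = session.getValueFactory()
                .createBinary(new SequenceInputStream(existing, addition));
        try {
            content.setProperty("jcr:data", rewritten);
        } finally {
            rewritten.dispose();
        }
        session.save();
    }
}
```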

I've searched the documentation for tools that would let me open an output stream for that file and periodically flush changes from that stream to the JCR, but there seems to be nothing like that available.

So is there anything more clever than going with plain javax.jcr.ValueFactory and javax.jcr.Binary?

Dmitry Kankalovich
  • You can do that with ValueFactory and Binary (as you have mentioned), but JCR is not optimised for such an implementation. It's a tree-structured database, so you either need to store your data in a tree format or as a list under a node. For a good example, check out how auditing is implemented under the /var/audit node: structured nodes stored as a list under namespaces, with metadata stored in JCR properties. Storing large files and updating them on a regular basis is going to cause lots of issues in your datastore. – Imran Saeed May 11 '17 at 19:18
  • Adding to the remarks of @i.net, I would also advise not to use a binary and append to it. Depending on the number of log entries you expect, you might be better off just adding nodes under a root "logging" node: every event is a new node. This has the advantage that you can create a custom `ResourceChangeListener` that will be called every time you add a node, so you can do further processing if needed. See: https://docs.adobe.com/docs/en/aem/6-2/develop/ref/javadoc/org/apache/sling/api/resource/observation/ResourceChangeListener.html – Jens May 11 '17 at 21:07
  • Yep, I was considering that approach as well: add a custom node type for such log message entries and index it. I'm slightly concerned about performance there too, as I cannot really predict the number of log entries at this point, and I do not want to store too large a number of nodes in a flat structure. – Dmitry Kankalovich May 12 '17 at 08:28
  • On the other hand: if we were talking about a conventional relational DB, there wouldn't even be a discussion about the approach: just store log messages in a table, period. Given that, storing logs as one node per entry seems at least more natural to the underlying storage system than files do. – Dmitry Kankalovich May 12 '17 at 09:50
  • What are these log entries? Are they needed for future use? Should they just be sent to a logger? Do you just need the last few? Can you filter out unnecessary log entries? – TedTrippin May 15 '17 at 15:29
  • @TedTrippin basically those are just info and error messages describing how some content migration process went. I'd say we need to keep them for a few days, and I'd like to keep a separate log per migration run. I am now considering creating log files on the FS and exposing access to their content through a custom API. – Dmitry Kankalovich May 16 '17 at 16:12

1 Answer


After considering all the options for a while, here is what I have:

  1. Keep logs in memory, save them to CRX every time the user invokes info/warn/error. Pros: logs are stored in the same place as the migration task metadata, easy to locate and access. Cons: potentially the slowest and least resource-effective of all the approaches in case of a large number of log entries.

  2. Keep logs in memory, save them to the JCR only at the end of the migration. Pros: simple to refactor the current solution to, less pressure on CRX during the migration process. Cons: cannot track progress in real time, potential loss of logs on an unexpected error or instance shutdown.

  3. Create a node of a custom type for each log entry instead of a log.txt. Aggregate the logs into a text file via a special log servlet, e.g. /var/migration/uuid/log.txt or /var/migration/uuid/log.json. Pros: a more JCR-like way to store such content; with a custom node type and indexes it should be fast enough to consider as an option; flexible enough to support both text and JSON log formats. Cons: unclear how performance compares to the current approach; potential issues due to the large number of nodes located at the same level; the user has to be aware of the log servlet's existence, otherwise the logs cannot be seen in a convenient format; the log servlet's performance is unclear for a large number of log entries.

  4. Create log files on the file system (say, at crx-quickstart/logs/migration/<uuid>.log) and display their content (if needed) through an API, with the ability to truncate the log API response to the last 100-1000 lines. Pros: the classic and well-known approach of storing log files on a file system; Sling provides a preconfigured slf4j binding to LogBack, with all the needed LogBack dependencies exported for use in your custom bundles. Cons: separation of the logs from the task metadata; the user has to be aware of the log files' location on disk.
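For a sense of what option 3 would look like, here is a sketch of storing one node per entry (the node type name `myapp:logEntry` and the property names are illustrative; a real implementation would register the node type and an index for it):

```java
import javax.jcr.Node;
import javax.jcr.Session;

import java.util.Calendar;
import java.util.UUID;

public class NodePerEntryLog {

    /** Adds one log entry as a child node under /var/migration/<uuid>/log. */
    public static void addEntry(Session session, String taskUuid,
                                String level, String message) throws Exception {
        Node logRoot = session.getNode("/var/migration/" + taskUuid + "/log");
        // One node per entry; a random suffix avoids same-name-sibling issues.
        Node entry = logRoot.addNode("entry-" + UUID.randomUUID(), "myapp:logEntry");
        entry.setProperty("level", level);
        entry.setProperty("message", message);
        entry.setProperty("timestamp", Calendar.getInstance());
        session.save();
    }
}
```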

Starting with option 1, I then realized that the number of log entries can potentially scale to hundreds of thousands - a rare, but possible scenario. So I finally decided to go with option 4.

In case anyone faces a similar task, I am posting here the implementation details for option 4, since it is not as trivial as it might seem at first.

I am using AEM 6.2 (Felix-Jackrabbit-Sling under the hood) and I want every migration task run - which is essentially just a separate thread - to create its own log file with a special name: the unique identifier of that migration process.

Now, Sling itself allows you to define multiple log configurations through the org.apache.sling.commons.log.LogManager.factory.config OSGi configuration. However, these log configs are too simple for this case - you cannot create with them what LogBack calls a SiftingAppender: a special kind of appender that instantiates a nested appender per discriminator value (here, per task/thread), rather than once, application-wide. In other words, you cannot instruct LogBack to create a file per thread using OSGi configuration.
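For contrast, what such an OSGi config can express is a single static log file shared by all matching loggers, along the lines of (property names as described in the Sling Commons Log documentation; verify against your AEM version, and note `com.example.migration` is a placeholder logger name):

```
org.apache.sling.commons.log.level="info"
org.apache.sling.commons.log.file="crx-quickstart/logs/migration.log"
org.apache.sling.commons.log.names=["com.example.migration"]
```

There is no property here that could vary the file name per thread or per MDC value - hence the need to drop down to LogBack's own API.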

So, logically, you would want to programmatically take Sling's LogBack configuration at runtime (for example, at the moment you've uploaded and activated your custom bundle) and use it to configure such an appender for specific loggers. Unfortunately, although there is a lot of documentation on how to configure LogBack through logback.xml, there are very few docs that describe how to do it programmatically through LogBack's Java objects like ch.qos.logback.classic.LoggerContext, and essentially none that explain how to configure a SiftingAppender that way.

So, after reading the LogBack sources and tests, I ended up with this helper class:

import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import ch.qos.logback.classic.sift.MDCBasedDiscriminator;
import ch.qos.logback.classic.sift.SiftingAppender;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.Appender;
import ch.qos.logback.core.Context;
import ch.qos.logback.core.FileAppender;
import ch.qos.logback.core.joran.spi.JoranException;
import ch.qos.logback.core.sift.AppenderFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

import java.util.Objects;

/**
 * This class dynamically adds configuration to AEM's LogBack logging implementation behind slf4j.
 * The point is to provide loggers bound to a specific task ID and therefore a specific log file, so
 * each migration task run will be written to its own standalone log file.
 * */
public class LogUtil {

    static {
        LoggerContext rootContext = (LoggerContext) LoggerFactory.getILoggerFactory();

        ch.qos.logback.classic.Logger logger = rootContext.getLogger("migration-logger");

        //since the appender lives until the AEM instance is restarted,
        //we check whether the appender has been registered previously
        //to ensure we won't register it more than once
        if(logger.getAppender("MIGRATION-TASK-SIFT") == null) {
            MDCBasedDiscriminator mdcBasedDiscriminator = new MDCBasedDiscriminator();
            mdcBasedDiscriminator.setContext(rootContext);
            mdcBasedDiscriminator.setKey("taskId");
            mdcBasedDiscriminator.setDefaultValue("no-task-id");
            mdcBasedDiscriminator.start();

            SiftingAppender siftingAppender = new SiftingAppender();
            siftingAppender.setContext(rootContext);
            siftingAppender.setName("MIGRATION-TASK-SIFT");
            siftingAppender.setDiscriminator(mdcBasedDiscriminator);
            siftingAppender.setAppenderFactory(new FileAppenderFactory());
            siftingAppender.start();

            logger.setAdditive(false);
            logger.setLevel(ch.qos.logback.classic.Level.ALL);
            logger.addAppender(siftingAppender);
        }
    }

    public static class FileAppenderFactory implements AppenderFactory<ILoggingEvent> {

        @Override
        public Appender<ILoggingEvent> buildAppender(Context context, String taskId) throws JoranException {
            PatternLayoutEncoder logEncoder = new PatternLayoutEncoder();
            logEncoder.setContext(context);
            // note: yyyy (calendar year), not YYYY (week-based year, wrong around New Year)
            logEncoder.setPattern("%-12date{yyyy-MM-dd HH:mm:ss.SSS} %-5level - %msg%n");
            logEncoder.start();

            FileAppender<ILoggingEvent> appender = new FileAppender<>();
            appender.setContext(context);
            appender.setName("migration-log-file");
            appender.setFile("crx-quickstart/logs/migration/task-" + taskId + ".log");
            appender.setEncoder(logEncoder);
            appender.setAppend(true);
            appender.start();

            // TODO: add cleanup configuration for old logs?

            return appender;
        }
    }

    private LogUtil(){
    }

    public static Logger getTaskLogger(String taskId) {
        Objects.requireNonNull(taskId);
        MDC.put("taskId", taskId);
        return LoggerFactory.getLogger("migration-logger");
    }

    public static void releaseTaskLogger() {
        MDC.remove("taskId");
    }
}

The part to pay attention to is that SiftingAppender requires you to implement the AppenderFactory interface, which produces a configured appender per discriminator value (here, per task), so each logger gets its own file to work with.

Now you can obtain logger via:

LogUtil.getTaskLogger("some-task-uuid")

And use it to create log files like crx-quickstart/logs/migration/task-<taskId>.log for the provided taskId.

According to the docs, you are also required to release such a logger once you're done with it:

LogUtil.releaseTaskLogger()
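Putting it together, a migration thread might use the helper like this (a hypothetical usage sketch; MigrationTask is an illustrative class name, and the try/finally ensures the MDC key is cleared even if the task fails):

```java
import org.slf4j.Logger;

public class MigrationTask implements Runnable {

    private final String taskId;

    public MigrationTask(String taskId) {
        this.taskId = taskId;
    }

    @Override
    public void run() {
        // Binds the "taskId" MDC key for this thread and returns the sifted logger.
        Logger log = LogUtil.getTaskLogger(taskId);
        try {
            log.info("Migration {} started", taskId);
            // ... actual migration work, logging progress as it goes ...
            log.info("Migration {} finished", taskId);
        } finally {
            // Always clear the MDC key, even on an unexpected error.
            LogUtil.releaseTaskLogger();
        }
    }
}
```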

And pretty much that's it.

Dmitry Kankalovich