79

I have a Hadoop job that creates counters with very long names.

For example: stats.counters.server-name.job.job-name.mapper.site.site-name.qualifier.qualifier-name.super-long-string-which-is-not-within-standard-limits. This counter name is truncated in the web interface and in the result of getName(). I found that Hadoop limits the maximum counter name length and that the setting mapreduce.job.counters.counter.name.max configures this limit. I increased it to 500, and the web interface now shows the full counter name, but getName() on the counter still returns the truncated name.

Could somebody please explain this or point out my mistakes? Thank you.

EDIT 1

My Hadoop setup consists of a single server running HDFS, YARN, and MapReduce. Counters are incremented during the MapReduce job, and after the job completes I fetch them in the ToolRunner using org.apache.hadoop.mapreduce.Job#getCounters.

EDIT 2

Hadoop version is the following:

Hadoop 2.6.0-cdh5.8.0
Subversion http://github.com/cloudera/hadoop -r 042da8b868a212c843bcbf3594519dd26e816e79 
Compiled by jenkins on 2016-07-12T22:55Z
Compiled with protoc 2.5.0
From source with checksum 2b6c319ecc19f118d6e1c823175717b5
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.8.0.jar

I investigated further, and it seems that MAPREDUCE-5875 (https://issues.apache.org/jira/browse/MAPREDUCE-5875) describes a situation similar to mine. It is confusing, though, because I am able to increase the number of counters but not the length of a counter's name...

EDIT 3

Today I spent quite a lot of time debugging Hadoop internals. Some interesting findings:

  1. The org.apache.hadoop.mapred.ClientServiceDelegate#getJobCounters method returns a set of counters from YARN with TRUNCATED names and FULL display names.
  2. I was unable to debug the mappers and reducers themselves, but logging suggests that org.apache.hadoop.mapreduce.Counter#getName works correctly during reducer execution.
R1w
mr.nothing
  • 2
    Can you please provide more details on the `getName()` call that still returns the truncated name? Is this iterating over the counters returned from `Job#getCounters()` in the submitting client after waiting for job completion, or is it a separate application querying counters from the job history server, or is it something else entirely? I would expect your configuration to be sufficient. The web UI uses the same `getName()` call. (It would not retroactively fix truncated counter names from jobs submitted before the configuration change though.) – Chris Nauroth Jan 20 '17 at 18:46
  • @ChrisNauroth, the configuration is pretty simple: I have one server with Hadoop and all its additional software installed on it. The flow of the counters in my map-reduce: 1. Increment counters in reducers (fetched from context) 2. Fetch from Job#getCounters(). Thanks for your interest and sorry for the delayed answer. – mr.nothing Jan 21 '17 at 14:33
  • 1
    @ChrisNauroth, I made some additional investigation and it seems I found something... uh, interesting. We have hadoop 2.6.0 installed and it seems that this issue https://issues.apache.org/jira/browse/MAPREDUCE-5875 describes situation similar to mine. But it's pretty confusing cause I'm able to increase number of counters but not the length of counter's name... Do you think this can be an issue? – mr.nothing Jan 24 '17 at 10:53
  • Could you please tell me the exact (truncated) name that you get when you call getName() for the counter `stats.counters.server-name.job.job-name.mapper.site.site-name.qualifier.qualifier-name.super-long-string-which-is-not-within-standard-limits`? – maxmithun Aug 30 '17 at 08:26
  • @DennisJaheruddin unfortunately I left that job and I had no choice but apply some temporary solutions to workaround this issue since no feedback was provided in hadoop jira. That issue still was not resolved by the day I left that job. – mr.nothing Jan 31 '18 at 09:29
  • @maxmithun as I stated before I left that job and have no access to that data anymore. If interested, I created jira ticket for hadoop https://issues.apache.org/jira/browse/MAPREDUCE-6832. Maybe it will be resolved someday. – mr.nothing Jan 31 '18 at 09:31
  • Probably not the reason, and I might have understood the Java memory model incorrectly, but couldn't problems occur when, in the [`Limits`](https://github.com/apache/hadoop/blob/5bca062d0e8a00dcb97c36b442dd6fcbf4dc72fa/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/Limits.java#L77) class, the `isInited` value leaks through to a thread (it is not volatile!), which therefore does not call the `init` method, so no memory synchronization happens and outdated values are seen? Though in this case that value would be 0 – Marcono1234 Apr 29 '19 at 20:27
  • Did you check getDisplayName() ? https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Counter.html#getDisplayName() http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/Counter.html#getDisplayName() – Jeet Jul 15 '19 at 18:47
  • @Jeet Not sure, it was pretty long ago, but since this method has no documentation we can only guess its purpose. Based on its name, it looks to me like it is reliable to use this method only for visualization purposes. – mr.nothing Jul 29 '19 at 14:23
  • @mr.nothing, can you check my answer and verify if it is working? I would start to explore other methods if that one does not do any good. – Akash G Aug 18 '19 at 10:45
  • @AkashG Thanks for response! I addressed your answer, but it seems you misunderstood something. Moreover I have no access to environment to test your solution but it's pretty obvious to me it can't help here. – mr.nothing Aug 26 '19 at 15:53

2 Answers

2

There's nothing in the Hadoop code that truncates a counter's name after the counter has been initialized. So, as you've already pointed out, mapreduce.job.counters.counter.name.max controls the maximum length of a counter name (the default is 64 characters).

This limit is applied during calls to AbstractCounterGroup.addCounter/findCounter. The relevant source code is:

@Override
public synchronized T addCounter(String counterName, String displayName,
                                 long value) {
  String saveName = Limits.filterCounterName(counterName);
  ...

and, in turn, in Limits:

public static String filterName(String name, int maxLen) {
  return name.length() > maxLen ? name.substring(0, maxLen - 1) : name;
}

public static String filterCounterName(String name) {
  return filterName(name, getCounterNameMax());
}
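To make the effect of that excerpt concrete, here is a minimal standalone sketch that copies the filterName logic quoted above and applies it to the counter name from the question. It does not use Hadoop itself; the class name CounterNameTruncationDemo and the constant DEFAULT_COUNTER_NAME_MAX are made up for illustration, and the value 64 is the default mentioned above.

```java
// Standalone illustration; CounterNameTruncationDemo is a hypothetical name, not a Hadoop class.
class CounterNameTruncationDemo {
    // Assumed default of mapreduce.job.counters.counter.name.max, as stated above.
    static final int DEFAULT_COUNTER_NAME_MAX = 64;

    // Same logic as the filterName excerpt quoted above: names longer than
    // maxLen are cut to maxLen - 1 characters, shorter names pass through.
    static String filterName(String name, int maxLen) {
        return name.length() > maxLen ? name.substring(0, maxLen - 1) : name;
    }

    public static void main(String[] args) {
        String name = "stats.counters.server-name.job.job-name.mapper.site.site-name"
                + ".qualifier.qualifier-name"
                + ".super-long-string-which-is-not-within-standard-limits";

        String withDefault = filterName(name, DEFAULT_COUNTER_NAME_MAX);
        String withRaised = filterName(name, 500);

        // With the default limit the name is cut to 63 characters.
        System.out.println(withDefault.length() + ": " + withDefault);
        // With the limit raised to 500 the name is left untouched.
        System.out.println(withRaised.equals(name)); // prints "true"
    }
}
```

Note that the truncation happens at the moment the counter is added or looked up, so whatever limit is in effect in that JVM at that time decides which name gets stored.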

As you can see, the counter name is saved truncated according to mapreduce.job.counters.counter.name.max. In turn, there is only a single place in the Hadoop code where Limits.init(Configuration conf) is called (called from the LocalContainerLauncher class):

class YarnChild {

  private static final Logger LOG = LoggerFactory.getLogger(YarnChild.class);

  static volatile TaskAttemptID taskid = null;

  public static void main(String[] args) throws Throwable {
    Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler());
    LOG.debug("Child starting");

    final JobConf job = new JobConf(MRJobConfig.JOB_CONF_FILE);
    // Initing with our JobConf allows us to avoid loading confs twice
    Limits.init(job);
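One way to picture why the location of that Limits.init call matters is the simplified model below. It is not Hadoop's code: LimitsInitSketch, LimitsModel and resetForDemo are hypothetical names, and the "first initialization wins, otherwise fall back to defaults" behavior is an assumption (consistent with the `isInited` flag mentioned in the comments, but not verified here against 2.6.0-cdh5.8.0). Under that assumption, a JVM that reads the limit before it has seen the raised value keeps using the default.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model only; these classes are hypothetical and are NOT Hadoop's Limits.
class LimitsInitSketch {
    static final String NAME_MAX_KEY = "mapreduce.job.counters.counter.name.max";

    static class LimitsModel {
        private static boolean inited = false;
        private static int counterNameMax;

        // Assumption: only the first call takes effect; later calls are ignored.
        static synchronized void init(Map<String, Integer> conf) {
            if (!inited) {
                counterNameMax = conf.getOrDefault(NAME_MAX_KEY, 64);
                inited = true;
            }
        }

        // Assumption: a read before any init falls back to default settings.
        static synchronized int getCounterNameMax() {
            if (!inited) {
                init(new HashMap<>());
            }
            return counterNameMax;
        }

        // Demo helper that simulates starting a fresh JVM.
        static synchronized void resetForDemo() {
            inited = false;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> jobConf = new HashMap<>();
        jobConf.put(NAME_MAX_KEY, 500);

        // A JVM that reads the limit before seeing the job configuration.
        System.out.println(LimitsModel.getCounterNameMax()); // prints 64
        LimitsModel.init(jobConf);                           // too late, ignored
        System.out.println(LimitsModel.getCounterNameMax()); // prints 64

        // A "fresh JVM" that is initialized with the job configuration first.
        LimitsModel.resetForDemo();
        LimitsModel.init(jobConf);
        System.out.println(LimitsModel.getCounterNameMax()); // prints 500
    }
}
```

If the real class behaves like this model, the configuration has to be visible to every process that materializes counters before that process first touches the limits.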

I believe you need to perform the following steps to fix the counter name issue you observe:

  1. Adjust mapreduce.job.counters.counter.name.max config value
  2. Restart YARN/MapReduce service
  3. Re-run your job

I think you will still see truncated counter names for old jobs.

morsik
  • Though I'm unable to check this it should be very helpful and explanatory for those who face this issue (according to upvotes there are a lot of such people) – mr.nothing Jul 16 '20 at 10:13
1

getName() seems to be deprecated.

Alternatively, getUri(), which has a default maximum length of 255, can be used.

Documentation link: getUri()

I have not tried it personally, but it seems to be a possible fix for this problem.

Akash G
  • Not sure you get the issue correctly. You are talking about `org.apache.hadoop.fs.FileSystem#getName` but this topic is about `org.apache.hadoop.mapreduce.Counter#getName` and it's behavior. – mr.nothing Aug 26 '19 at 15:32