
Summary: I need something like yarn logs -applicationId myID | parse.

I am developing code with some degree of parallelism, so I need feedback on the vcores and RAM it uses. I have the application ID of each test, so after a run I can use

 yarn logs -applicationId application_1581298836342_95477 > myYarnLog.txt

but it is a big, complex log, and I only need to check the vcores and memory used.

Is there a parser for myYarnLog.txt that filters or calculates the performance indicators?


PS: "RAM" can be "Aggregate Resource Allocation", vcores can be some virtual-CPU allocation statistic, etc.

Peter Krauss

2 Answers


In your yarn-site.xml you need a setting similar to this one, like here

 yarn.resourcemanager.scheduler.monitor.enable

You need something similar in the file capacity-scheduler.xml, as in the response here

<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
Soleil
  • Hi @Soleil, thanks for the clues about the configuration. My need (the scope of the question) is *"how to read the log file"* of an executed script, **not** the configuration. I need the facts, not the plan: the log file says how many vcores, how many vcore-hours and how much RAM the process really used. – Peter Krauss Mar 03 '20 at 20:09
  • Sorry, maybe you need to install a tool like http://lnav.org/features – Soleil Mar 03 '20 at 23:04

As @TinNguyen suggested, we can use grep to check some information, such as the "vcores" lines. Perhaps other readers can suggest other grep strategies, so this answer is a Wiki to consolidate all suggestions.


All parsing suggestions operate on the myYarnLog.txt file of the question,

 yarn logs -applicationId application_1581298836342_95477 > myYarnLog.txt

Command and plugin suggestions

  • ag. Key filter. Example: ag vcores myYarnLog.txt.
  • grep. Key filter. Example: grep -i vcores myYarnLog.txt.
  • awk. Turing-complete filter and formatter.
    Example: awk 'tolower($0) ~ /vcores/' myYarnLog.txt
  • lnav, the "Log-file Navigator", http://lnav.org/features (git).
    Accepts regex filtering, among other features.
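
The key-filter idea above can be sketched as a small helper that counts matching lines per keyword. This is only a sketch, assuming a POSIX shell with grep available; the temporary file and the function name count_key are hypothetical, and the sample lines are stand-ins assembled from messages quoted in this answer (the "INFO Tag: message" layout is an assumption), not real output.

```shell
# Hypothetical stand-in for myYarnLog.txt, built from messages quoted in this answer
log=$(mktemp)
cat > "$log" <<'EOF'
LogAggregationType: AGGREGATED
INFO CodeGenerator: Code generated in 381.632282 ms
INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)
INFO TorrentBroadcast: Reading broadcast variable 13 took 91 ms
EOF

# count_key KEY FILE: print the number of lines in FILE containing KEY, ignoring case
count_key() {
    grep -i -c -- "$1" "$2"
}

# Print one "keyword<TAB>count" row per keyword of interest
for key in memorystore codegenerator vcores; do
    printf '%s\t%s\n' "$key" "$(count_key "$key" "$log")"
done
```

On the real file, replace "$log" with myYarnLog.txt. A count of 0 for a keyword (as for vcores in this sample) suggests that the term is simply not reported in that log.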

Key suggestions

Keywords to filter relevant information for performance analysis.

Standard log terms:

  • LogAggregationType. Log-file standard attribute. Example: AGGREGATED.

  • INFO CodeGenerator. Example: "Code generated in 381.632282 ms"

  • INFO MemoryStore. Example: "Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)"

  • INFO TorrentBroadcast. Example: "Reading broadcast variable 13 took 91 ms"

  • ...
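
Lines such as "Code generated in 381.632282 ms" or "took 91 ms" can be totalled with awk. The following is a minimal sketch, assuming the duration is always the field immediately before a standalone "ms" token; the function name sum_ms is hypothetical, and the second CodeGenerator line in the usage example is an invented sample, not real output.

```shell
# sum_ms PATTERN: read log lines on stdin and, on lines matching PATTERN,
# add up every number that directly precedes a standalone "ms" field
sum_ms() {
    LC_ALL=C awk -v pat="$1" '
        $0 ~ pat {
            for (i = 2; i <= NF; i++)
                if ($i == "ms") { total += $(i - 1); n++ }
        }
        END { printf "%d values, %.3f ms\n", n, total }
    '
}

# Usage with sample lines (the 18.367718 ms line is invented for illustration)
printf '%s\n' \
    'INFO CodeGenerator: Code generated in 381.632282 ms' \
    'INFO TorrentBroadcast: Reading broadcast variable 13 took 91 ms' \
    'INFO CodeGenerator: Code generated in 18.367718 ms' |
    sum_ms 'CodeGenerator'
```

On the real file this would be sum_ms 'CodeGenerator' < myYarnLog.txt. The pattern is passed as an awk variable so the same function can serve other tags, for example TorrentBroadcast.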

Generic terms, used in some logs:

  • vcore. A term (virtual core) that can be used as a unit. Examples: "4 vcores" or "5 seconds per vcore".

  • stored as bytes in memory (or "stored as values in memory"). Example: a line with no tag, say "Block broadcast_13 stored as values in memory (estimated size 26.3 KB, free 37.2 GB)"

  • bytes result sent to driver. Is it relevant?

  • ...

Spark-specific keywords:

  • ShuffleBlockFetcherIterator. Lines with started/getting times and blocks, useful for awk summarizations.

  • ...

Filtering rules

... using columns, composite filters, calculated totals, etc.

Example of awk rule: /LogAggregationType/ {print "log type: " $2}.
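
Rules like this one can be combined in a single awk program that also calculates totals. The sketch below assumes the "(estimated size N UNIT, free ...)" wording quoted earlier in this answer and the unit names B, KB, MB and GB; the function name mem_summary is hypothetical, and an unrecognized unit would be counted as a block but contribute 0 to the total.

```shell
# mem_summary: read log lines on stdin, report the log type and the sum of
# the "estimated size" values, converted to KB
mem_summary() {
    LC_ALL=C awk '
        BEGIN { f["B"] = 1 / 1024; f["KB"] = 1; f["MB"] = 1024; f["GB"] = 1024 * 1024 }
        /LogAggregationType/ { print "log type: " $2 }
        /estimated size/ {
            for (i = 1; i <= NF - 3; i++)
                if ($i == "(estimated" && $(i + 1) == "size") {
                    unit = $(i + 3)
                    sub(/,$/, "", unit)            # "KB," becomes "KB"
                    total += $(i + 2) * f[unit]    # unknown unit adds 0
                    blocks++
                }
        }
        END { printf "blocks: %d, estimated total: %.1f KB\n", blocks, total }
    '
}

# Usage with lines quoted in this answer
printf '%s\n' \
    'LogAggregationType: AGGREGATED' \
    'INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)' \
    'Block broadcast_13 stored as values in memory (estimated size 26.3 KB, free 37.2 GB)' |
    mem_summary
```

On the real file this would be mem_summary < myYarnLog.txt. Note that the total is the sum of block sizes as they were logged, which is not necessarily the peak memory held at any one time.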


Evidence-based tuning of config

In any evidence-based practice we need data to analyse and act on. In this case, the data in the log file guide good changes to the config file.

[Figure: evidence-based cycle]

See how to change config files on Yarn, Spark, etc.

Peter Krauss