How to parse Yarn logs to obtain performance indicators?

Question

Summary: I need something like yarn logs -applicationId myID | parse.

I am developing code with some level of parallelism, so I need feedback about the vcores and RAM used. I have the application ID of each test, so after a run I can use

 yarn logs -applicationId application_1581298836342_95477 > myYarnLog.txt

but it is a big and complex log, and I only need to check the vcores and memory used.

Is there a parser for myYarnLog.txt that filters or calculates the performance indicators?

PS: "RAM memory" can be "Aggregate Resource Allocation", vcores can be some virtual-CPU allocation statistics, etc.

You can filter your text file: https://www.tecmint.com/linux-file-operations-commands/ — Vincenzo Ninni, Mar 03 '20 at 13:46
Before writing it to file you can pipe into grep `| grep vcores` to get only the lines containing `vcores`. — Tin Nguyen, Mar 03 '20 at 15:19
Hi @TinNguyen and VincenzoNinni, see [my Wiki-answer](https://stackoverflow.com/a/60514844/287948), you can edit there. — Peter Krauss, Mar 03 '20 at 20:46

score 1 · Answer 1 · answered Mar 03 '20 at 16:33

In your yarn-site.xml you need a setting similar to this one (like here):

 yarn.resourcemanager.scheduler.monitor.enable

You need something similar in the file capacity-scheduler.xml, as in the response here:

<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

Soleil

Hi @Soleil, thanks for the clues about the configuration. My need (the scope of the question) is about *"how to read the log-file"* of an executed script, **not** about the configuration. I need the fact, not the plan: the log-file says how many vcores, vcore-hours and RAM the process really used. – Peter Krauss Mar 03 '20 at 20:09

Sorry, maybe you need to install a tool like http://lnav.org/features – Soleil Mar 03 '20 at 23:04

Peter Krauss · Answer 2 · 2020-03-04T00:08:32.970

As @TinNguyen suggested, we can use grep to check some information, like the "vcores" lines. Perhaps other readers can suggest other grep strategies, so this answer is a Wiki to consolidate all suggestions.

All parsing suggestions operate on the myYarnLog.txt file of the question,

 yarn logs -applicationId application_1581298836342_95477 > myYarnLog.txt

Command and plugins suggestions

ag. Key filter. Example: ag vcores myYarnLog.txt.
grep. Key filter. Example: grep -i vcores myYarnLog.txt.
awk. Turing-complete filter and formatter.
Example: awk 'tolower($0) ~ /vcores/ {print $0}' myYarnLog.txt
lnav, the "Log-file Navigator", http://lnav.org/features (git).
Accepts regex filtering, among other features.
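
As a rough sketch of how the filters above can be combined, the script below counts how many lines mention each key. The sample file only stands in for myYarnLog.txt: its lines are illustrative (the "vCores" line in particular is made up), and the helper name count_key is hypothetical.

```shell
#!/bin/sh
# Illustrative stand-in for myYarnLog.txt; these lines are NOT real output.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
LogAggregationType: AGGREGATED
20/03/03 10:00:01 INFO CodeGenerator: Code generated in 381.632282 ms
20/03/03 10:00:02 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)
20/03/03 10:00:03 INFO TorrentBroadcast: Reading broadcast variable 13 took 91 ms
Requested 4 vCores for the executor
EOF

# count_key KEY FILE: print the number of lines containing KEY, ignoring case.
count_key() {
    grep -i -c -- "$1" "$2"
}

# One line per key: the key, a tab, and its line count.
for key in vcores MemoryStore TorrentBroadcast; do
    printf '%s\t%s\n' "$key" "$(count_key "$key" "$log")"
done
```

With a real log, replace "$log" by myYarnLog.txt and extend the key list with the terms from the next section.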

Key suggestions

Key-words to filter relevant information for performance analysis.

Standard log terms:

LogAggregationType. Log-file standard attribute. Example: AGGREGATED.
INFO CodeGenerator. Example: "Code generated in 381.632282 ms"
INFO MemoryStore. Example: "Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)"
INFO TorrentBroadcast. Example: "Reading broadcast variable 13 took 91 ms"
...

Generic terms, used in some logs:

vcore. A term, virtual-cores, that can be used as unit. Examples: "4 vcores" or "5 seconds per vcore".
stored as bytes in memory. Example: a line with no tag, say "Block broadcast_13 stored as values in memory (estimated size 26.3 KB, free 37.2 GB)"
bytes result sent to driver. Is it relevant?
...

Spark-specific key-words:

ShuffleBlockFetcherIterator. Lines with started/getting times and blocks, useful for awk summarizations.
...
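
A possible awk summarization for the timing key-words above ("Code generated in ... ms", "... took 91 ms"): the sketch assumes the duration is the field right before a final "ms", as in the quoted examples. The timestamps and the second line of each kind are invented for illustration, and sum_ms is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative sample; only the message texts follow the examples quoted above.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
20/03/03 10:00:01 INFO CodeGenerator: Code generated in 381.632282 ms
20/03/03 10:00:02 INFO CodeGenerator: Code generated in 18.367718 ms
20/03/03 10:00:03 INFO TorrentBroadcast: Reading broadcast variable 13 took 91 ms
20/03/03 10:00:04 INFO TorrentBroadcast: Reading broadcast variable 14 took 9 ms
EOF

# sum_ms TAG FILE: for lines that contain TAG and end in "ms",
# add up the number in the field just before "ms".
sum_ms() {
    LC_ALL=C awk -v tag="$1" '
        index($0, tag) && $NF == "ms" { total += $(NF - 1); n++ }
        END { printf "%s lines=%d total_ms=%.3f\n", tag, n, total }
    ' "$2"
}

sum_ms CodeGenerator "$log"
sum_ms TorrentBroadcast "$log"
```

Lines whose duration is not at the end of the line would need a different rule, for example one based on match().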

Filtering rules

... use of columns, composite filters, calculate totals, etc.

Example of awk rule: /LogAggregationType/ {print "log type: " $2}.
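
A slightly larger rule, as a sketch of a composite filter with a total: it keeps only the "stored as ... in memory" lines that also carry an "estimated size", extracts the number and unit, and sums the sizes in KB. The unit handling (B, KB, MB, GB with a factor of 1024) is an assumption about the log format, and mem_total_kb is a hypothetical helper name.

```shell
#!/bin/sh
# Sample built from the two MemoryStore examples quoted above, plus one unrelated line.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
20/03/03 10:00:02 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)
Block broadcast_13 stored as values in memory (estimated size 26.3 KB, free 37.2 GB)
20/03/03 10:00:03 INFO TorrentBroadcast: Reading broadcast variable 13 took 91 ms
EOF

# mem_total_kb FILE: sum the "estimated size" of stored blocks, in KB.
mem_total_kb() {
    LC_ALL=C awk '
        # Composite filter: both patterns must match the same line.
        /stored as (bytes|values) in memory/ && /estimated size/ {
            if (match($0, /estimated size [0-9.]+ [KMG]?B/)) {
                # Matched text is "estimated size <number> <unit>".
                split(substr($0, RSTART, RLENGTH), f, " ")
                size = f[3]
                if (f[4] == "B")  size /= 1024
                if (f[4] == "MB") size *= 1024
                if (f[4] == "GB") size *= 1024 * 1024
                total += size; n++
            }
        }
        END { printf "blocks=%d estimated_kb=%.1f\n", n, total }
    ' "$1"
}

mem_total_kb "$log"
```

The same pattern (filter, extract with match(), accumulate, report in END) can be reused for other numeric fields in the log.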

Evidence-based tuning of config

In any evidence-based practice we need data to analyse and act on. In this case, the data in the log-file is what lets us make good changes to the config-file.

See how to change config files on Yarn, Spark, etc.
