As @TinNguyen suggested, we can use `grep` to check some information, like the "vcores" lines... Perhaps other readers can suggest other `grep` strategies, so this answer is a wiki to consolidate all suggestions.
All parsing suggestions below parse the `myYarnLog.txt` file of the question, produced by:

```
yarn logs -applicationId application_1581298836342_95477 > myYarnLog.txt
```
## Command and plugin suggestions

- `ag`. Key filter. Example: `ag vcores myYarnLog.txt`.
- `grep`. Key filter. Example: `grep -i vcores myYarnLog.txt`.
- `awk`. Turing-complete filter and formatter. Example: `awk 'tolower($0) ~ /vcores/' myYarnLog.txt` (awk has no `/regex/i` case-insensitivity modifier, so `tolower` is used instead; single quotes keep the shell from expanding `$0`).
- `lnav`, the "Log-file Navigator", http://lnav.org/features (git). Accepts regex filtering and more.
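The tools above can also be combined in shell pipelines. A minimal sketch that counts log lines per level (the level names `INFO`/`WARN`/`ERROR` are an assumption about the log format, not taken from the question's file):

```shell
# Count how many log lines carry each level, most frequent first.
# Assumes myYarnLog.txt was produced by the `yarn logs` command above.
grep -oE '(INFO|WARN|ERROR)' myYarnLog.txt | sort | uniq -c | sort -rn
```

This gives a quick overview of how noisy the log is before drilling into specific keywords.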
## Key suggestions

Keywords to filter relevant information for performance analysis.

Standard log terms:

- `LogAggregationType`. Log-file standard attribute. Example: `AGGREGATED`.
- `INFO CodeGenerator`. Example: "Code generated in 381.632282 ms".
- `INFO MemoryStore`. Example: "Block broadcast_13_piece0 stored as bytes in memory (estimated size 11.5 KB, free 37.2 GB)".
- `INFO TorrentBroadcast`. Example: "Reading broadcast variable 13 took 91 ms".
- ...
Generic terms, used in some logs:

- `vcore`. Short for virtual cores, which can also be used as a unit. Examples: "4 vcores" or "5 seconds per vcore".
- `stored as bytes in memory`. Example: a line with no tag, say "Block broadcast_13 stored as values in memory (estimated size 26.3 KB, free 37.2 GB)".
- `bytes result sent to driver`. Is it relevant?
- ...
Spark-specific keywords:
## Filtering rules

... use of columns, composite filters, calculating totals, etc.

Example of an `awk` rule: `/LogAggregationType/ {print "log type: " $2}`.
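The "calculate totals" idea can be sketched with `awk`. Assuming the `INFO CodeGenerator` lines have the format shown above ("Code generated in X ms"), this sums all code-generation times in the log:

```shell
# Sum every "Code generated in X ms" value; $(NF-1) is the next-to-last
# field, i.e. the number just before "ms" in that message format.
awk '/Code generated in/ {total += $(NF-1)}
     END {printf "total codegen: %.1f ms\n", total}' myYarnLog.txt
```

The same pattern (a regex guard, an accumulator, an `END` block) works for any of the keywords listed above.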
## Evidence-based tuning of config

In any evidence-based practice we need data to analyse and act on... In this case the data is in the log file, and it guides good changes to the config files.

See how to change config files on Yarn, Spark, etc.
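For illustration only, a hypothetical `spark-defaults.conf` fragment of the kind such tuning might produce. The property names are real Spark settings, but the values are placeholders, not recommendations:

```
# Illustrative values, to be chosen from evidence in the log:
spark.executor.cores     4
spark.executor.memory    8g
spark.driver.memory      4g
```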