I'm running Hive + Tez on EMR and I'd like some clarity on how Tez interacts with YARN.

I read in this article:

Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb (the YARN minimum container size)

Set hive.tez.container.size to be the same as or a small multiple (1 or 2 times that) of YARN container size yarn.scheduler.minimum-allocation-mb but NEVER more than yarn.scheduler.maximum-allocation-mb. You want to have headroom for multiple containers to be spun up.

This makes it sound like the Tez containers are configured separately from YARN containers. Is that true? From the general documentation, it seems like Tez is a replacement for YARN containers, which would mean that you set the Tez container size and can ignore the original YARN container size.

In short: Do Tez containers run inside of YARN containers, or do Tez containers run instead of YARN containers?

S.S.
  • Tez works inside the YARN cluster. Tez calculates the resources it requires and "asks" YARN for them. – User9123 Mar 09 '20 at 16:58
  • @User9123 So it runs inside the YARN cluster, but what about the YARN container? Does the Tez container replace the YARN container? – S.S. Mar 09 '20 at 17:24
  • No, Tez containers run inside YARN containers. The multiple uses of the word "container" can be confusing. A YARN container is a portion of resources (e.g. RAM); a Tez container is part of a map or reduce task. If Tez wants to run a task, it requests resources from YARN. – User9123 Mar 09 '20 at 17:37
  • Ok, so in that case why don't you post that as the answer? – S.S. Mar 09 '20 at 18:26
  • @mazaneicha and User9123 It sounds like you are both saying different things. Is the Tez container simply a YARN container, or is it running inside the YARN container? (And honestly, if you both believe you have the right answer, please post it as an answer instead of just commenting!) – S.S. Mar 10 '20 at 04:21

1 Answer

Yes, tez-site.xml is a separate configuration file, because without it Tez wouldn't know how to run.

Tez is more of a replacement for MapReduce, not for YARN.

If you run a Tez job, it'll show up in the YARN UI.

Tez containers (each running part of the tasks) are therefore allocated within a Tez job on YARN, which runs as a collection of YARN containers that host the Tez containers.
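
To make the relationship concrete, here is a minimal Python sketch of how a memory size requested through tez.am.resource.memory.mb or hive.tez.container.size might be turned into a YARN allocation. It assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb and rejects anything above yarn.scheduler.maximum-allocation-mb; the exact rounding depends on which YARN scheduler is configured. The function name and the sample values are hypothetical and are not taken from any Tez or YARN API.

```python
import math


def normalize_request(requested_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Return the memory (in MB) YARN would grant for a request.

    Assumption: requests are rounded up to a multiple of
    yarn.scheduler.minimum-allocation-mb and may not exceed
    yarn.scheduler.maximum-allocation-mb.
    """
    if requested_mb > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    # Round up to the next whole multiple of the minimum allocation.
    multiples = max(1, math.ceil(requested_mb / min_alloc_mb))
    # Cap at the maximum in case it is not an exact multiple of the minimum.
    return min(multiples * min_alloc_mb, max_alloc_mb)


# Hypothetical cluster limits, for illustration only.
YARN_MIN_MB = 1024
YARN_MAX_MB = 8192

# hive.tez.container.size = 2048 is already a multiple of the minimum.
task_container_mb = normalize_request(2048, YARN_MIN_MB, YARN_MAX_MB)  # 2048

# A size of 1500 would be rounded up to the next multiple.
rounded_mb = normalize_request(1500, YARN_MIN_MB, YARN_MAX_MB)  # 2048
```

Under this assumption, a container size that is not a multiple of the YARN minimum still occupies the rounded-up amount, which may be one reason the quoted article recommends using the minimum or a small multiple of it.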

OneCricketeer