
I would like to create a simple YAML workflow format, as in the example below, whose files double as metadata in a YAML-based environment. Users will create and submit these files, mostly to organize a modest number of tasks (such as specifying a chain of anomaly detectors). Imports will be resolved with importlib. I was planning to use newglobals=None, populate newlocals from the imports and arguments, and then call eval(expr, newglobals, newlocals). The workflow YAML would orchestrate the work and record metadata in YAML, which suits our needs, and it would be easy to extend to non-Python shell scripts.

My question concerns the use of eval. It isn't hard to find examples online of how malicious arbitrary code could be expressed in YAML and run, e.g. with module=shutil, names='rmtree', expr='rmtree(path)' and args path='/'.

However, the text is arguably not arbitrary if users use this workflow tool to organize their own work and store the YAML in trusted repos. Is there an incremental danger to the YAML/eval approach compared with plain Python if the Python and the YAML/eval are both managed under the same kind of security? After all, I expect members of our organization not to execute a file that says to run shutil.rmtree('/'). Are there additional dangers?

imports:
    - module: mymod
      names:
          - func1
steps:
    - expr: 'func1(foo=foo) + 2'
      args:
          foo: 2
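
A minimal sketch of the interpreter described above, not a finished design. To stay self-contained it makes two assumptions: the dict `workflow` stands in for what a YAML loader would return for a file like the one above, and the standard-library `math.sqrt` replaces the hypothetical `mymod.func1`. The function name `run_workflow` is also made up for illustration.

```python
import importlib

# Stand-in for the parsed YAML document (assumption: a YAML loader would
# produce this structure). `math`/`sqrt` replace the hypothetical mymod/func1.
workflow = {
    "imports": [{"module": "math", "names": ["sqrt"]}],
    "steps": [{"expr": "sqrt(foo) + 2", "args": {"foo": 16}}],
}

def run_workflow(spec):
    """Evaluate each step's expression with the imported names and its args."""
    imported = {}
    for entry in spec.get("imports", []):
        module = importlib.import_module(entry["module"])
        for name in entry["names"]:
            imported[name] = getattr(module, name)

    results = []
    for step in spec["steps"]:
        # Fresh namespace per step: imported callables plus this step's args.
        namespace = {**imported, **step.get("args", {})}
        # An explicit empty globals dict is passed here; eval still inserts
        # __builtins__ into it, so built-in functions remain reachable.
        results.append(eval(step["expr"], {}, namespace))
    return results

print(run_workflow(workflow))  # [6.0]
```

Note that nothing in this sketch restricts which module or expression a file may name, which is exactly the exposure the question is asking about.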
      
Eli S
  • 1
    https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice – TheTridentGuy supports Ukraine Jul 11 '23 at 14:24
  • 1
    I'm not sure what your question is. You understand the risks. – JonSG Jul 11 '23 at 15:11
  • @TheTridentGuy In that link, the ordering of the tasks is known and the use of eval is completely avoidable with regular python. The responses do not address the "additional dangers" part of my question. I know that `eval` is hazardous, particularly with arbitrary text. I'm not looking for a mantra, but rather the details. – Eli S Jul 12 '23 at 15:25
  • @JonSG. The key is in the 'additional dangers' component of my question. There is plenty on SO that speaks of the dangers of `eval` on arbitrarily delivered text and I hope my question does not get hijacked by that. But my users could execute arbitrary *.py files that delete or hack their system. The fact that they do not has to do with using trusted text in the form of py files. Unvetted pull requests could cause us all to lose everything. My question is whether a managed yaml + eval solution introduces something beyond managed python (the hacking examples I've seen are amazing and hard to anticipate). – Eli S Jul 12 '23 at 15:32
  • 3
    It’s fairly simple: do you trust your users enough to let them execute arbitrary code, and are you sure that an attacker cannot impersonate a legitimate user and give you “bad” code? – deceze Jul 12 '23 at 15:41
  • @deceze If it is that simple, then this is a point not well made on SO. You can find 'arbitrary code is unsafe' but not the other way around. I have re-written the question to establish an assumption that makes the question concrete: whatever security exists around the yaml will be the same as around the py code that interprets the yaml. – Eli S Jul 12 '23 at 15:44
  • 1
    Executing arbitrary code is unsafe. … Unless you trust it. But you need to be sure in that trust. And that there’s no other way some code may be passed to your `eval` call without going through that trust chain. Which is usually where it becomes hairy. – deceze Jul 12 '23 at 15:47
  • 1
    Yes it is more dangerous in my opinion. At some level py files tend to be viewed as executables and are less trusted, while YAML files are data and implicitly garner more trust. If someone tells a typical user to run this trusted program on some random file, they probably will. This is a clear attack vector in the same style as Excel macros in random Excel data, in my opinion. – JonSG Jul 12 '23 at 18:54
  • 1
    @JonSG I plus-oned that point. Not quite programming, but psychology is definitely part of the issue. – Eli S Jul 20 '23 at 19:21

1 Answer


Let's put it this way:

If you have absolutely no eval or eval-like call in your program, it should be entirely predictable in terms of what it can do and what it won't do. If there's no call to any function that deletes files in your codebase, then you can be reasonably sure that your program won't ever be deleting any files, for instance.

I say "eval-like", because this also includes things like incorrectly concatenated SQL queries (SQL-injection), HTML-injection, shell command injection, even calling functions by name where the name value isn't being reasonably whitelisted. All these are things where the behaviour of your program is determined by runtime information, and thus becomes less predictable or unpredictable.

There are of course ways to use these things reasonably safely: validating or limiting the values that strings can take, escaping them correctly, and passing them safely when using them in SQL, HTML, shell calls, function selection, etc.
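
As an illustration of the "function selection" case, here is a minimal sketch of calling a function by name through an explicit allowlist instead of resolving the name dynamically. The names `ALLOWED_FUNCTIONS` and `call_step` are hypothetical, and the two `math` functions are only placeholders for whatever a real workflow would permit.

```python
import math

# Hypothetical allowlist: the only callables a workflow step may name.
ALLOWED_FUNCTIONS = {
    "sqrt": math.sqrt,
    "floor": math.floor,
}

def call_step(name, *args):
    """Look the function up in the allowlist; never resolve it dynamically."""
    try:
        func = ALLOWED_FUNCTIONS[name]
    except KeyError:
        raise ValueError(f"function {name!r} is not allowed") from None
    return func(*args)

print(call_step("sqrt", 16))  # 4.0

try:
    call_step("rmtree", "/")
except ValueError as exc:
    print(exc)  # function 'rmtree' is not allowed
```

With this shape, the set of things the program can do is again readable from the codebase, because runtime data only selects among entries that were written down in advance.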

A pure eval call can hardly be locked down in any reasonable way, because it purposely allows a wide variety of arbitrary expressions. If you use eval, you're very explicitly allowing any and all possible code to be executed. Light limiting of globals and the like has time and again been shown to be ineffective. So it all comes down to whether you trust the code that you're evaling, and whether you can trust the process by which that trust is established. Can you be sure that there's no way a malicious user can trigger the code path that leads to the execution of eval with some malicious payload? How can you be sure of this?
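
A small, harmless demonstration of why limiting globals is weak protection. The sketch below empties `__builtins__`, which does block a bare name such as `open`, yet an expression built only from a tuple literal and attribute access still reaches every class loaded in the interpreter. The variable names are illustrative.

```python
# Globals with built-ins stripped out, a commonly suggested "restriction".
restricted_globals = {"__builtins__": {}}

# A bare built-in name is indeed blocked...
try:
    eval("open", restricted_globals, {})
    open_reachable = True
except NameError:
    open_reachable = False

# ...but attribute access needs no names at all: walk from a tuple literal
# up to `object`, then list every subclass currently loaded.
classes = eval("().__class__.__base__.__subclasses__()", restricted_globals, {})

print(open_reachable)    # False
print(len(classes) > 0)  # True
```

From that list of classes, published escapes go on to find objects that expose file or process access, which is why this kind of filtering is generally treated as insufficient on its own.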

It's not impossible to architect a solution to this. But that's a lot more complex than not having eval in your code at all. And you'll have to maintain that trust solution over time. Systems tend to "soften" over time, spawning more and more features and allowing more and more access for various reasons. That's where it gets hairy when including an eval in your program.
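
To give a feel for the extra machinery such a solution involves, here is one possible sketch: parse the expression with the standard `ast` module and reject any syntax or name outside an allowlist before evaluating it. This is an assumption about how one might approach it, not a security guarantee; `ALLOWED_NODES`, `check_expression` and `safe_eval` are hypothetical names, and the allowlist would need ongoing review as features are added.

```python
import ast

# Hypothetical allowlist of syntax: arithmetic, constants, names, plain calls.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Add, ast.Sub, ast.Mult,
    ast.Div, ast.USub, ast.Constant, ast.Name, ast.Load, ast.Call, ast.keyword,
)

def check_expression(expr, allowed_names):
    """Parse expr and raise ValueError on any syntax or name not allowed."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in allowed_names:
            raise ValueError(f"unknown name: {node.id!r}")
        if isinstance(node, ast.Call) and not isinstance(node.func, ast.Name):
            raise ValueError("only calls to plain names are allowed")
    return tree

def safe_eval(expr, namespace):
    """Evaluate expr only after it passes the allowlist check."""
    tree = check_expression(expr, allowed_names=set(namespace))
    code = compile(tree, "<workflow>", "eval")
    return eval(code, {"__builtins__": {}}, namespace)

def func1(foo):
    # Stand-in for the question's hypothetical mymod.func1.
    return foo * 10

print(safe_eval("func1(foo=foo) + 2", {"func1": func1, "foo": 2}))  # 22
```

Attribute access is absent from the allowlist, so an expression such as `().__class__` is rejected before it runs. Even so, the callables placed in the namespace can still do anything they are written to do, so the trust question moves to them rather than going away.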

deceze
  • I am going to accept this answer. I think these are all good points, but in the end I feel a backslide to the arbitrary code discussion, and that subject has been covered and I agree. My own use case would be one where the eval'd code would be in a repo. In principle it is the same as the py files, representing slightly different workflows. @JonSG had a good point about mindset, and I will definitely do what I can to restrict locals and globals. – Eli S Jul 20 '23 at 19:26