Extracting shell scripts from an RPM spec file for static analysis

Question

I want to run ShellCheck on scripts embedded within a spec file that will run where the RPM is deployed. I have .spec snippets like,

%setup -q
cat > ./example.sh << EOF
#!/bin/sh
echo "example"
EOF

As well as hooks,

%post
#!/bin/sh
echo "Hello"

Is there some way to programmatically extract these shell snippets to run a script analysis tool like ShellCheck? Like maybe rpmbuild --save-temps or some concept like this? Or does every script need to be bound by known text so I can use a stream tool (grep, awk, sed, etc.)?

I have a large number of spec files which I would prefer not to modify. The goal is, for example, to check scripts for security-related issues without having to parse each spec file myself. Searching for bison + spec turns up unrelated results, and I suspect a real parser would have to handle RPM macros and a lot of other machinery; or maybe the grammar is simpler than I think?

score 1 · Answer 1 · answered Apr 21 '15 at 03:53

I've been thinking about doing this for some of my RPMs recently as well.

You can get the %prep, %build, %install, etc. sections from the spec file itself with Python.

CentOS 5 code:

import rpm

ts = rpm.ts()

spec = ts.parseSpec("package.spec")

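for section in ['build', 'clean', 'install', 'prep']:
    try:
        # In this binding each section is a method on the parsed spec object.
        print('%s' % (getattr(spec, section)(),))
    except Exception:
        # Skip sections this spec (or this binding) does not provide.
        pass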

CentOS 6 code:

import rpm

spec = rpm.spec('package.spec')

for section in ['build', 'clean', 'install', 'prep']:
    # Here each section is a plain attribute holding the script text.
    if hasattr(spec, section):
        print('%s' % (getattr(spec, section),))

There doesn't seem to be a way (in CentOS 5 or 6) to get the contents of the pre/post/etc. scriptlets via python though.

So you probably need to just get them out of the built RPM with rpm -qp --scripts and then split that output up into temp files and run shellcheck on them.
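A minimal sketch of that last step, assuming Python 3 and assuming each section of the `rpm -qp --scripts` output starts with a header line such as `postinstall scriptlet (using /bin/sh):` (check this against your rpm version). `split_scriptlets` and `run_shellcheck` are hypothetical helper names, not part of rpm or ShellCheck:

```python
import os
import re
import subprocess
import tempfile

# Assumed header formats; verify against the output of your rpm version.
HEADER = re.compile(r'^(\w+) scriptlet \(using ([^)]+)\):\s*$')
PROGRAM = re.compile(r'^\w+ program: ')  # e.g. "postinstall program: /sbin/ldconfig"


def split_scriptlets(text):
    """Map scriptlet name -> (interpreter, body) from `rpm -qp --scripts` text."""
    scripts = {}
    name = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            scripts[name] = (match.group(2), [])
        elif PROGRAM.match(line):
            name = None  # a bare program has no script body to collect
        elif name is not None:
            scripts[name][1].append(line)
    return {key: (interp, "\n".join(body).strip() + "\n")
            for key, (interp, body) in scripts.items()}


def run_shellcheck(body):
    """Write one scriptlet to a temp file and return shellcheck's exit status."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
        handle.write(body)
        path = handle.name
    try:
        return subprocess.call(["shellcheck", "--shell=sh", path])
    finally:
        os.unlink(path)
```

Capture the output of `rpm -qp --scripts package.rpm` as text, pass it to `split_scriptlets`, and call `run_shellcheck` on each body whose interpreter is a shell.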

answered Apr 21 '15 at 03:53

Etan Reisner


The sections in the spec file are scriptlets, not scripts. The difference is that there are pre/post sections appended (with macro expansion) before feeding to a shell. The easiest way to save the actual scripts would be to wrap the /bin/sh invocation using a macro. – Jeff Johnson Apr 21 '15 at 05:39
However, shellcheck isn't going to help much in improving scriptlets, imho. Most build/install scriptlets are too simple to get wrong. A syntax check (which is implemented in rpm5) helps prevent some packaging errors. But while shellcheck is good at finding issues with (say) spaces in file paths, it cannot identify serious flaws like "rm -rf /" in %post, because the command is syntactically well formed. – Jeff Johnson Apr 21 '15 at 05:54
@JeffJohnson True, the macro definitions being missing is potentially a problem here. Is there a way to dump the resulting script in a reliable fashion? Running the build without `--clean` and with a custom `_tmppath` maybe? And I also agree that it isn't going to catch much but anything it does catch is some help at least. I should probably mention `rpmlint` in the answer also though. Hm... actually I wonder if I can get `rpmlint` to run the scriptlets through shellcheck. – Etan Reisner Apr 21 '15 at 11:33
This is a good suggestion. Also, I don't intend to use *shellcheck* for security. ShellCheck was a concrete example; the main question is 'how to extract the scripts'. You could teach shellcheck more security features or write your own tool. It is a good point that pre/post should be simple, and most RPMs don't have them, so maybe a visual audit is easiest. Also, the `%{rpm_var}` variables could be treated like an unknown `${shell_var}` by any analysis tool or by some external function. The syntax is different, so a special case is needed; that is a good point. – artless noise Apr 21 '15 at 14:01
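One rough way to do the `%{rpm_var}` to `${shell_var}` substitution, as a sketch only (Python 3). `macros_to_shell_vars` is a hypothetical helper; it handles only plain `%{name}` references and deliberately leaves conditionals such as `%{?dist}`, parametrised macros, and escaped `%%{...}` untouched:

```python
import re

# Plain %{name} references only; a preceding '%' means the macro is escaped.
MACRO = re.compile(r'(?<!%)%\{([A-Za-z_][A-Za-z0-9_]*)\}')


def macros_to_shell_vars(text):
    """Rewrite unexpanded %{name} macros as ${name} so a shell linter can parse them."""
    return MACRO.sub(r'${\1}', text)
```

The result is not what rpm would actually run; it only makes an unexpanded scriptlet parseable by a shell-oriented tool.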
@JeffJohnson I see my question was not clear. I trust the building part (or at least that is a separate issue for me). I wanted to inspect the RPM for scripts to be run on the final deployment machines. I see you think this could be to check the code that rpmbuild, etc will execute on the build host. I didn't mean that and I hope I clarified the questions. – artless noise Apr 21 '15 at 14:48
I think `rpm -qp --scripts` (stripping 'postscript', etc) and `rpm2cpio | cpio -idv --no-absolute-filenames` with a `find cpio_dir -type f -exec sh -c "file {} | grep -q shell" \; -print` is my answer (at least functional for me so far). It also works for those of us stuck with older RPM versions (even though we would like to use newer ones). – artless noise Apr 21 '15 at 14:55
Don't expand `{}` into a script directly (it isn't safe). Pass it as an argument to the script and use positional parameters. – Etan Reisner Apr 21 '15 at 15:17
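With `find`, the positional-parameter form would be something like `find cpio_dir -type f -exec sh -c 'file "$1" | grep -q shell' sh {} \; -print`. The same principle (a file name is data, never command text) can be sketched in Python 3. Note that this swaps the `file | grep shell` test for a shebang check, which is a different heuristic from the one in the comment above; `SHELLS`, `is_shell_script` and `find_shell_scripts` are hypothetical names:

```python
import os

SHELLS = {"sh", "bash", "dash", "ksh"}  # assumed set of interpreters of interest


def is_shell_script(path):
    """True if the file's first line is a shebang naming one of SHELLS."""
    try:
        with open(path, "rb") as handle:
            first = handle.readline(256)
    except OSError:
        return False
    if not first.startswith(b"#!"):
        return False
    words = first[2:].decode("ascii", "replace").split()
    if not words:
        return False
    interp = os.path.basename(words[0])
    if interp == "env" and len(words) > 1:
        interp = os.path.basename(words[1])  # handle "#!/usr/bin/env bash"
    return interp in SHELLS


def find_shell_scripts(root):
    """Walk an extracted cpio tree and list regular files that look like shell scripts."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path) and is_shell_script(path):
                found.append(path)
    return sorted(found)
```

Because the path is only ever passed to `open`, file names containing spaces, quotes, or `$(...)` cannot be interpreted as shell code.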
Everything that "rpm -q --scripts" displays is available through Python, but from the binary packages, not the source package. rpmlint (or Python) can do macro expansions to convert scriptlets -> scripts. While this is tempting, you are better off vetting the template once rather than every package expansion. You can visually examine every script at any time. – Jeff Johnson Apr 21 '15 at 15:41
The trickiest part of an audit is understanding the context in which a script runs. Each of %pre/%post/%preun/%postun runs in a different context, and --triggers scripts run in an even more complicated one. Package scripts are supposed to be idempotent and often aren't, particularly with 3rd party packaging. – Jeff Johnson Apr 21 '15 at 15:45
@JeffJohnson We have ranged quite far afield from the original answer but I can't help but ask is there a reason the `%pre`/etc. scriptlets aren't available from the spec file? Clearly they wouldn't be under the toplevel object but they could be under the `packages` entries. And yes, clearly this idea isn't going to replace manual auditing but it will prevent casual errors from slipping through (especially from, shall we say, less careful developers). – Etan Reisner Apr 21 '15 at 15:49
@etan: teaching rpmlint (or better, rpm itself) to run scripts with /bin/sh -n catches syntax errors in %preun/%postun that stop upgrades. But examine the diversity of tests in %preun testing $1 to disambiguate upgrade from erase: as long as packagers lack discipline/experience, then an audit will document the diversity without fixing much at all. – Jeff Johnson Apr 21 '15 at 15:50
@JeffJohnson `sh -n` is good (and yes, should be automatic), but it can't catch things like missing spaces around `[` in tests, which, while they may not prevent the upgrade, are still errors worth catching. I'm all for teaching as many tools as possible to do as much as possible to make this stuff harder to screw up. – Etan Reisner Apr 21 '15 at 15:54
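To make the `sh -n` point concrete, here is a small sketch (Python 3, assuming a POSIX shell at `/bin/sh`); `syntax_ok` is a hypothetical helper name. It only parses the script without running it:

```python
import os
import subprocess
import tempfile


def syntax_ok(script_text, shell="/bin/sh"):
    """Return True if `shell -n` parses the script without a syntax error."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
        handle.write(script_text)
        path = handle.name
    try:
        # -n: read and parse commands but do not execute them.
        result = subprocess.run([shell, "-n", path],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return result.returncode == 0
    finally:
        os.unlink(path)
```

A scriptlet with an unterminated `if` fails this check, whereas `if [$1 -eq 0]; then echo erase; fi` passes it because `[$1` is a syntactically valid word; that second case is the kind of error a linter such as shellcheck is meant to flag.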
Scriptlets actually are available by querying the spec file. The problem is that there may be multiple subpackages, each with its own scriptlets. In Python, you need to query against the pkgs list, not the spec file. But rpm was designed as an installer, not a text processor. And the ultimate reason is that the additions to the Python bindings were designed to get someone a job @redhat, and were never very well designed or thoughtful or useful. – Jeff Johnson Apr 21 '15 at 15:57
@JeffJohnson Huh, so they are. I would have sworn I didn't see them in there. They certainly aren't as easy to get as the spec scripts but you can do it. Thanks for pointing that out. Though, hm, I don't see the trigger scriptlets actually. (This is all CentOS 5 with backported rpm patches to get rpm.spec in the first place so maybe I just missed that patch.) – Etan Reisner Apr 21 '15 at 16:07
Yes these are recent (like 3+y ago) additions to rpm-python. Triggers are even more complicated because all triggers are stored in a single tag set and you need to retrieve the flags to identify which trigger is which. But the triggers are there too, just not the pythonic programmer experience. – Jeff Johnson Apr 21 '15 at 16:34
@JeffJohnson I'll have to try again with CentOS 6 and see if they are there (also patched but less heavily) and then go see what patches I'm missing. Thanks for the TODO list item. =) And yeah, composing the lists is annoying but not difficult (and I don't even like python). – Etan Reisner Apr 21 '15 at 16:38
Hint: the easiest extraction is the equivalent of "rpm -q --xml", which extracts everything present in the source/binary headers, true WYSIWYG. Then design whatever API you want in Python using dicts or other convenient data structures. You want to extract using the bindings for headerSprintf (which is all that is being done with --xml when the popt alias is expanded). – Jeff Johnson Apr 21 '15 at 16:39
