
I want to get the licenses of all installed packages on my Ubuntu server. I can dump them all using this (from a 2013 post):

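# Installed packages only; strip any ":arch" suffix (e.g. libc6:amd64),
# since the /usr/share/doc/ directories are named without it
packages=$( dpkg --get-selections | awk '$2 == "install" { sub(/:.*/, "", $1); print $1 }' )
for package in $packages; do
  echo "$package: "
  cat "/usr/share/doc/$package/copyright" 2>/dev/null
  echo; echo
done > /tmp/licenses.txt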
less /tmp/licenses.txt

But the output is a huge useless file with all the copyright data for each package. I need something like:

package: package_name        licence: licence_name

Is there a parser or some other tool to get data like this?

sondra.kinsey
  • This is a duplicate of the linked https://askubuntu.com/q/247757 whose answers include the very helpful https://github.com/daald/dpkg-licenses – sondra.kinsey Jul 08 '19 at 14:11

1 Answer


What you are trying to do is poorly supported at the moment, though there is an effort under way to provide machine-readable information in the /usr/share/doc/*/copyright files. See, for example, this excerpt from the copyright file of the at package:

Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: at
Source: git://anonscm.debian.org/collab-maint/at.git
Comment: This package was debianized by its author Thomas Koenig
 <ig25@rz.uni-karlsruhe.de>, taken over and re-packaged first by Martin
 Schulze <joey@debian.org> and then by Siggy Brentrup <bsb@winnegan.de>,
 and then taken over by Ryan Murray <rmurray@debian.org>.
 .
 In August 2009 the upstream development and Debian packaging were taken over
 by Ansgar Burchardt <ansgar@debian.org> and Cyril Brulebois <kibi@debian.org>.
 .
 This may be considered the experimental upstream source, and since there
 doesn't seem to be any other upstream source, the only upstream source.

Files: *
Copyright: 1993-1997,  Thomas Koenig <ig25@rz.uni-karlsruhe.de>
           1993,       David Parsons
           2002, 2005, Ryan Murray <rmurray@debian.org>
License: GPL-2+

Files: getloadavg.c
Copyright: 1985-1995, Free Software Foundation Inc
License: GPL-2+

Files: posixtm.*
Copyright: 1989-2007, Free Software Foundation Inc
License: GPL-3+

Files: parsetime.pl
Copyright: 2009, Ansgar Burchardt <ansgar@debian.org>
License: ISC 

License: GPL-2+
 This program is free software; you can redistribute it
 and/or modify it under the terms of the GNU General Public
 License as published by the Free Software Foundation; either
 version 2 of the License, or (at your option) any later
 version.

See the specification at http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ (the URL in the Format: field above) for details.
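For packages whose copyright file already uses this format, a rough version of the output asked for in the question can be produced by collecting the short names from the License: fields. The following is a minimal sketch under stated assumptions: it assumes the /usr/share/doc/PACKAGE/copyright layout, treats any file containing a Format: line as machine-readable, and lists packages with dpkg-query; the function names dep5_licenses and report_licenses are hypothetical. A combined expression in a single License: field (for example two licenses joined with "or") is reported as one entry.

```shell
# Print the distinct short license names found in the "License:" fields
# of one machine-readable (DEP-5) copyright file, comma-separated.
# Continuation lines (the license text) start with a space and are ignored.
dep5_licenses() {
  awk '
    /^License:/ {
      sub(/^License:[ \t]*/, "")   # drop the field name
      sub(/[ \t]+$/, "")           # drop trailing whitespace
      if ($0 != "" && !seen[$0]++) list = list (list != "" ? ", " : "") $0
    }
    END { print list }
  ' "$1"
}

# Print "package: NAME  license: LIST" for each installed package whose
# copyright file has a "Format:" line; free-form files are skipped.
report_licenses() {
  local pkg file
  # ${Package} is the bare package name, without any ":arch" suffix
  for pkg in $(dpkg-query -W -f='${Package}\n' | sort -u); do
    file="/usr/share/doc/$pkg/copyright"
    [ -f "$file" ] && grep -q '^Format:' "$file" || continue
    printf 'package: %-30s license: %s\n' "$pkg" "$(dep5_licenses "$file")"
  done
}
```

Running report_licenses > /tmp/licenses.txt would then give one line per converted package. Packages that still ship a free-form copyright file are silently skipped, so the report is incomplete by design.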

As you can see, the basic assumption that a package necessarily has a single license is false: different files in the same package can be under different licenses (here GPL-2+, GPL-3+ and ISC), and a single License: field can even combine several licenses. Depending on which problem you are trying to solve, it may of course be possible to ignore many of them. For example, if you want to find out whether you have anything under the Apache license, that should be easy to check for packages which have transitioned to this new format.
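A check like the Apache one might look like the following minimal sketch. It assumes the /usr/share/doc/PACKAGE/copyright layout and that License: fields use short names following the specification (such as Apache-2.0); the function name find_apache_packages and its optional doc-root argument are hypothetical, the argument being there only so the function can be pointed at a test directory.

```shell
# Print the names of packages whose machine-readable copyright file
# mentions Apache in any "License:" field. Files without a "Format:"
# line (free-form, pre-DEP-5) are skipped because they cannot be
# checked this reliably.
find_apache_packages() {
  local root="${1:-/usr/share/doc}" file
  for file in "$root"/*/copyright; do
    [ -f "$file" ] || continue            # also handles an unmatched glob
    grep -q '^Format:' "$file" || continue
    # Case-insensitive match on the License: field, e.g. "Apache-2.0"
    if grep -qiE '^License:.*apache' "$file"; then
      basename "$(dirname "$file")"
    fi
  done
}
```

Called without arguments it scans /usr/share/doc; anything it does not print may still be Apache-licensed if its copyright file has not been converted.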

This is new with Debian Jessie, released in 2015; older versions of Debian do not have anything like this. If you need to audit a system with older packages, the best you can do is probably to grep the copyright files for fragments which look like GPL, BSD, MIT, etc. and hope you're not missing too much; but hope on top of some flimsy grepping seems anathema to any proper legal work, which I assume is the reason you are attempting this. A better approach might be to find the current copyright files for the packages you are auditing, with their roughly machine-readable information, and hope (there's that word again) that they are adequate for the older versions you have installed, too.
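The grep fallback for older, free-form files could be sketched like this. The phrase list is an assumption chosen for illustration (typical opening words of the GPL, LGPL, Apache, BSD and MIT license texts), and guess_licenses is a hypothetical helper name. It is exactly the kind of flimsy grepping warned about above, so treat its output as a list of leads for manual review.

```shell
# Heuristic only: guess license families in a free-form (pre-DEP-5)
# copyright file by grepping for telltale phrases. grep works line by
# line, so a phrase wrapped across two lines is missed; hits are leads
# for manual review, not legal conclusions.
guess_licenses() {
  local file="$1" found="" label pattern
  # Each line below is "LABEL:extended regex"; the list is illustrative.
  while IFS=':' read -r label pattern; do
    if grep -qiE "$pattern" "$file"; then
      found="$found${found:+, }$label"
    fi
  done <<'EOF'
GPL:GNU General Public License
LGPL:GNU (Lesser|Library) General Public License
Apache:Apache License
BSD:Redistribution and use in source and binary forms
MIT:Permission is hereby granted, free of charge
EOF
  echo "${found:-unknown}"
}
```

For example, guess_licenses /usr/share/doc/at/copyright prints the matching labels, or "unknown" when none of the phrases occur.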

(For comparison, older versions, too, are available at http://metadata.ftp-master.debian.org/changelogs/main/a/at/ for you to examine.)

I don't follow Ubuntu very closely any longer, but I assume they have been picking up this change for a few releases now. Indeed, http://packages.ubuntu.com/xenial/at seems to have the same copyright file.

tripleee
  • By informal review, a Jessie system I have access to returns 248 for `grep -l '^Format:' /usr/share/doc/*/copyright | wc -l` and 569 for `grep -L`. So fewer than a third of the packages we have installed have transitioned to the new format. If you have many legacy packages, the proportion is likely to be lower. – tripleee Jan 28 '16 at 04:04
  • Well, the requested data is for legal work, and it is very disappointing that after 3 years it is still an issue. Thank you for the comprehensive answer. –  Jan 28 '16 at 06:06