20

I read that a tar entry type of 'L' (76) is used by GNU tar and GNU-compliant tar utilities to indicate that the next entry in the archive has a "long" name. In this case the header block with the entry type of 'L' usually encodes the name ././@LongLink.

My question is: where is the format of the next block described?

The format of a tar archive is very simple: it is just a series of 512-byte blocks. In the normal case, each file in a tar archive is represented as a series of blocks. The first block is a header block, containing the file name, entry type, modified time, and other metadata. Then the raw file data follows, using as many 512-byte blocks as required. Then the next entry.
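
To make the block layout concrete, here is a minimal sketch of decoding one header block. The field offsets (name at 0-99, size at 124-135, mtime at 136-147, type flag at 156) are the classic tar header layout; the function names are my own, and the sketch assumes plain octal fields, so it does not handle the base-256 encoding GNU tar uses for very large sizes.

```python
BLOCK = 512  # every header and data block is this many bytes


def parse_octal(field: bytes) -> int:
    """Decode a NUL/space-padded octal field; an empty field counts as 0."""
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def parse_header(block: bytes) -> dict:
    """Pull the fields discussed above out of one 512-byte header block."""
    if len(block) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    return {
        "name": block[0:100].split(b"\0", 1)[0].decode("utf-8", "replace"),
        "size": parse_octal(block[124:136]),
        "mtime": parse_octal(block[136:148]),
        "typeflag": block[156:157],
    }


def data_blocks(size: int) -> int:
    """Number of 512-byte blocks that follow a header for `size` bytes of data."""
    return (size + BLOCK - 1) // BLOCK
```

The next header starts `data_blocks(size) * 512` bytes after the end of the current header, which is all a reader needs in order to walk the archive.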

If the filename is longer than will fit in the space allocated in the header block, gnu tar apparently uses what's known as "the ././@LongLink trick". I can't find a precise description for it.

When the entry type is 'L', how do I know how long the "long" filename is? Is the long name limited to 512 bytes, in other words, whatever fits in one block?

Most importantly: where is this documented?

Cheeso
  • 8
    Someone voted to close this as not programming related. Actually, it is programming related, because I'm trying to build a tar in .NET that supports GNU's LongName trick. If I just needed to extract an archive, I could of course just use gnu's tar, and I wouldn't need the answer to this question. – Cheeso Jan 16 '10 at 22:32

2 Answers

15

Just by observation of a single archive here's what I surmised about the 'L' entry type in tar archives, and the "././@LongLink" name:

The 'L' entry is present in a header for a series of 1 or more 512-byte blocks that hold just the filename for a file or directory with a name over 100 chars. For example, if the filename is 1200 chars long, there will be 3 additional blocks with filename data, and the last block is only partially filled. The size in the header block is the length of that name data; GNU tar counts the terminating NUL, so it writes 1201 rather than 1200.

Following that series is another header block, in the traditional form - a header with type '0' (regular file) or '5' (directory), followed by the appropriate number of data blocks with the entry data. In the header for this series, the name will be truncated to the first 100 characters of the actual name.
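
A rough sketch of a reader that follows this observed layout is below. It is not the code from my implementation: `iter_entries` and `make_sample` are names made up for this example, the reader holds each entry's data in memory, and it ignores the ustar prefix field, 'K' entries and pax headers. The sample archive is produced with Python's standard `tarfile` module in GNU format, only so there is something to read.

```python
import io
import tarfile

BLOCK = 512


def _octal(field: bytes) -> int:
    """Decode a NUL/space-padded octal header field."""
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def iter_entries(stream):
    """Yield (name, typeflag, data) per entry, resolving GNU 'L' long names."""
    pending_name = None
    while True:
        header = stream.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return  # end-of-archive marker (or a truncated stream)
        size = _octal(header[124:136])
        typeflag = header[156:157]
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        data = stream.read(padded)[:size]
        if typeflag == b"L":
            # The data is the full name of the NEXT entry; drop the NUL if present.
            pending_name = data.rstrip(b"\0").decode("utf-8", "surrogateescape")
            continue
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "surrogateescape")
        if pending_name is not None:
            name, pending_name = pending_name, None  # override the truncated name
        yield name, typeflag, data


def make_sample(name: str, payload: bytes) -> bytes:
    """Build a one-file GNU-format archive in memory using the stdlib."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

Stripping trailing NULs from the name data means the reader does not care whether the writer counted the terminator in the size field.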

EDIT
See my implementation here: http://cheesoexamples.codeplex.com/SourceControl/changeset/view/99885#1868643

Cheeso
  • 1
    I also found this page that talks more about "LongLink": http://www.delorie.com/gnu/docs/tar/tar_114.html You may also run into "@@MaNgLeD.*" headers. These are similar to LongLink, but the data block(s) contain a script to rename a file from "@MaNgLeD.___" to its real path name. They can be handled much like LongLink when you are processing a TAR yourself. – Daryl Hanson Apr 18 '12 at 16:23
  • 1
    From what I can see, the size is strlen() + 1, so in your example the size would be 1201 in the tar file. Although it may not matter if you do not save the NUL terminator except if your filename is an exact multiple of 512 bytes. – Alexis Wilke Jun 20 '13 at 01:06
  • @AlexisWilke - it's been several years, but as I recall the NUL char was not encoded. You may be right, though. – Cheeso Jun 20 '13 at 01:40
  • See [this useful man page](http://manpages.ubuntu.com/manpages/intrepid/man5/star.5.html): 'L' A long file name. Star is able to read and write this type of header. With the xustar and exustar formats, star prefers to store long file names using the POSIX.1-2001 method. The size is always non zero and denotes the length of the long file name including the trailing null byte. The file name is in the data that follows the header. – xroche Aug 28 '14 at 13:52
4

Note that the information about all of that can be found in the libtar project:

http://www.feep.net/libtar/

The relevant header is libtar.h (as opposed to the POSIX tar.h), which explicitly defines entry types for a long filename and a long symbolic link.

You get the "fake" headers plus data for the long filename and long link first, and then the "real" header (which carries everything except the actual filename and symbolic link target) after that.

HEADER type 'L'
BLOCKS of data with the real long filename
HEADER type 'K'
BLOCKS of data with the real symbolic link
HEADER type '0' (or '5' for directory, etc.)
BLOCKS of data with the actual file contents
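
For the writing side, here is a minimal sketch of emitting the 'L' header, the name blocks and then the real header, in the order listed above. It is not taken from libtar: `long_name_entry` and the other names are hypothetical, the mode/uid/gid/mtime values are placeholders, the size field counts the trailing NUL as discussed in the comments on the other answer, and 'K' (long link) entries are left out. `read_back` uses Python's standard `tarfile` module purely as an independent reader to check the result.

```python
import io
import tarfile

BLOCK = 512


def _header(name: bytes, size: int, typeflag: bytes) -> bytes:
    """Build a minimal GNU-style header block (metadata values are placeholders)."""
    block = bytearray(BLOCK)
    block[0:len(name)] = name               # caller guarantees len(name) <= 100
    block[100:108] = b"0000644\0"           # mode
    block[108:116] = b"0000000\0"           # uid
    block[116:124] = b"0000000\0"           # gid
    block[124:136] = b"%011o\0" % size      # size in octal
    block[136:148] = b"%011o\0" % 0         # mtime
    block[148:156] = b" " * 8               # checksum field counts as spaces
    block[156:157] = typeflag
    block[257:265] = b"ustar  \0"           # magic as written by GNU tar
    block[148:156] = b"%06o\0 " % sum(block)
    return bytes(block)


def _pad(data: bytes) -> bytes:
    """Pad data with NULs up to a multiple of 512 bytes."""
    return data + b"\0" * (-len(data) % BLOCK)


def long_name_entry(name: str, payload: bytes) -> bytes:
    """Return the blocks for one regular file, adding an 'L' entry if needed."""
    raw = name.encode("utf-8")
    out = b""
    if len(raw) > 100:
        stored = raw + b"\0"                # size includes the trailing NUL
        out += _header(b"././@LongLink", len(stored), b"L") + _pad(stored)
    # The real header keeps only the first 100 bytes of the name
    # (this may split a multi-byte character in a non-ASCII name).
    out += _header(raw[:100], len(payload), b"0") + _pad(payload)
    return out


def read_back(archive: bytes):
    """List (name, data) pairs as seen by the stdlib reader."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return [(m.name, tar.extractfile(m).read()) for m in tar.getmembers()]
```

A complete archive is the concatenation of such entries followed by two all-zero 512-byte blocks, for example `long_name_entry(name, data) + b"\0" * 1024`.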

Of course, under MS-Windows, you probably won't handle symbolic links, although symbolic links are said to work under MS-Windows as of Win7 (and this is now official in Win10!)

Pertinent definition from libtar.h:

/* GNU extensions for typeflag */
#define GNU_LONGNAME_TYPE   'L'
#define GNU_LONGLINK_TYPE   'K'
Alexis Wilke