I have thousands of uncompressed tar files, which add up to almost 1 TB of data. Now I want to modify one specific string within some of the tar files.

Can I do this directly without extracting the tar file, e.g. with sed? Of course I do not want to get corrupted tar files.


Details:

The string I want to modify is 4 characters long. Can I replace it with another 4-character string? The tar format seems to store a checksum only for the header of each file contained in the tar file, plus the length of each file, so replacing 4 characters within a contained file with 4 other characters should be fine, right?
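A minimal sketch of such a same-length replacement, assuming Python's standard `tarfile` module and an uncompressed archive. The function name `replace_in_place` is hypothetical. It only rewrites bytes inside the data regions of regular members, so a file name that happens to contain the string is left alone and the header checksums stay valid. Each member is read fully into memory, which may not suit very large members.

```python
import io
import tarfile


def replace_in_place(fileobj, old: bytes, new: bytes) -> int:
    """Replace `old` with `new` inside member data of an uncompressed tar.

    `fileobj` must be a seekable binary file opened for reading and writing.
    Returns the number of replacements made.
    """
    if len(old) != len(new):
        raise ValueError("in-place replacement needs strings of equal length")

    # Pass 1: let tarfile parse the headers and record where each regular
    # member's data starts and how long it is. Sparse members are skipped
    # because their recorded size is not the size stored in the archive.
    fileobj.seek(0)
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        regions = [
            (m.offset_data, m.size)
            for m in tar
            if m.isreg() and not m.issparse()
        ]

    # Pass 2: patch only those data regions; headers and padding are untouched.
    count = 0
    for start, size in regions:
        fileobj.seek(start)
        data = fileobj.read(size)
        hits = data.count(old)
        if hits:
            fileobj.seek(start)
            fileobj.write(data.replace(old, new))
            count += hits
    return count
```

A real archive would be opened with `open(path, "r+b")` and passed in as `fileobj`; trying it on a copy first seems prudent.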

Can I even replace the 4 character string with a 6 character string? There seems to be some padding within tar files, so what is the probability of getting a corrupt tar file when adding 2 characters? How well does extraction still work for such a corrupted tar file?

I do not have nested tar files, so this negative answer does not apply to my situation.

DaveFar
  • Extract the desired file from tar, modify it with sed, delete file from tar file and add modified file to tar file? – Cyrus Jul 04 '15 at 11:38
  • Yes, valid comment, Cyrus. But for 1TB of data, I think it's also valid to think about an optimized approach - hence the question. – DaveFar Jul 04 '15 at 13:34

2 Answers

That is several questions:

  • can you modify the content of files in a tar file (probably, since the file content has no checksum)
  • which tool is useful (the sed documentation is unclear about binary input, but other answers, such as "binary sed replacement", say "no" and suggest alternatives)
  • can you replace a 4-character string with a 6-character string (probably not, since that changes the file length, which requires adjusting the header and its checksum).
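To make the last point concrete, here is a small sketch of the bookkeeping involved. It assumes the classic layout with 512-byte blocks and a ustar-style header whose checksum field sits at bytes 148 to 155 and is counted as eight spaces when the sum is computed. The helper names are hypothetical.

```python
import tarfile

BLOCK = 512  # tar stores headers and data in 512-byte blocks


def padded_size(size: int) -> int:
    """Bytes a member's data occupies in the archive, including padding."""
    return -(-size // BLOCK) * BLOCK


def header_checksum(header: bytes) -> int:
    """Sum of all header bytes, with the checksum field counted as spaces."""
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:BLOCK])


def stored_checksum(header: bytes) -> int:
    """Checksum recorded in the header (octal digits, NUL/space padded)."""
    return int(header[148:156].strip(b" \0").decode("ascii"), 8)


if __name__ == "__main__":
    info = tarfile.TarInfo("example.txt")  # hypothetical member name
    for size in (4, 6):
        info.size = size
        header = info.tobuf(format=tarfile.USTAR_FORMAT)
        # The size field is part of the header, so the checksum changes too.
        print(size, padded_size(size), stored_checksum(header),
              header_checksum(header) == stored_checksum(header))
```

Growing a member by 2 bytes usually still fits in the padding of its last block, but the size recorded in the header no longer matches unless the header and its checksum are rewritten as well. If the member's length was already a multiple of 512, every later block shifts too.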
Thomas Dickey
  • You are right, I formulated several questions. Your first bullet is by far the most relevant - a "probably no corruption" sounds kind of risky since I have a huge amount of tar files with important data. – DaveFar Jul 04 '15 at 13:30
  • Your second bullet was not intended as question: I know I can use sed for the replacements, I just mentioned it as an example of direct modification - as opposed to untar then modify then tar. – DaveFar Jul 04 '15 at 13:32
  • For what it's worth, a *quick check* seems to show that simple replacement by sed will work. But it would be nice to have its behavior on binary files better-documented. – Thomas Dickey Jul 04 '15 at 13:55

I had to work with raw tar files some years ago, and it's not something I would recommend. There are too many "tar" formats to be sure that your substitution does what you want it to do, and only what you want it to do.

In my case I had no choice and I had to use emacs to edit the tar file. It was someone's backup.

If I had your problem, I'd take the time to write a small script/program to extract and sed(1) things to avoid a possible (maybe unlikely) corrupt archive.
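One way such a small program might look, sketched with Python's standard `tarfile` module rather than an extract-and-sed shell script. The function name `rewrite_tar` is hypothetical. It streams members from the old archive into a new one and replaces the string in regular file contents, so the replacement may have a different length. Metadata is copied from the archive headers and nothing is written to the file system. Assumptions: each member fits in memory, sparse members are rejected instead of handled, and the output uses `tarfile`'s default format, which may differ from the input's.

```python
import io
import tarfile


def rewrite_tar(src, dst, old: bytes, new: bytes) -> int:
    """Copy the uncompressed tar in `src` to `dst`, replacing `old` with `new`.

    `src` and `dst` are binary file objects. Returns the replacement count.
    """
    count = 0
    with tarfile.open(fileobj=src, mode="r:") as tin, \
            tarfile.open(fileobj=dst, mode="w") as tout:
        for member in tin:
            if member.issparse():
                raise NotImplementedError(f"sparse member: {member.name}")
            if member.isreg():
                data = tin.extractfile(member).read()
                count += data.count(old)
                data = data.replace(old, new)
                # The header is regenerated from the TarInfo, so the new
                # size and checksum are written consistently.
                member.size = len(data)
                tout.addfile(member, io.BytesIO(data))
            else:
                # Directories, links and so on carry no data to patch.
                tout.addfile(member)
    return count
```

With real files this could be called as `rewrite_tar(open("old.tar", "rb"), open("new.tar", "wb"), b"ABCD", b"ABCDEF")`, where both paths are placeholders. It needs roughly the archive's size in free disk space for the copy.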

Also, you probably have to run as root to ensure correct permissions and timestamps.

Erik Bennett