
I tried the code below, but it is very slow. I am looking for an optimal way to calculate the uncompressed data size. I read somewhere that using the z_stream structure with the inflate API can speed this up. Could you please help me get it done?

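#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <libtar.h>
#include <zlib.h>

/* l_gzFile (gzFile) and gztype (tartype_t) are defined elsewhere in the project. */

/* Skip the data blocks of the current regular file. */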
int move_file_pointer(TAR *t)
{

    z_off_t k;   /* gzseek returns z_off_t; an int can overflow on large archives */
    size_t size;
    if (!TH_ISREG(t))
    {
        errno = EINVAL;
        return -1;
    }
    size = th_get_size(t);
    std::printf("\n Current File  Position - %ld", static_cast<long>(gztell(l_gzFile)));
    /* Round the file size up to a whole number of tar blocks. */
    long int luOffset = ((size + T_BLOCKSIZE - 1) / T_BLOCKSIZE) * T_BLOCKSIZE;
    std::printf("\n Targeted Seek offset value %ld", luOffset);

    /*
    ######################################################
    gzseek takes a long time to complete here. For a small archive
    such as 500 MB, it takes 2 minutes to reach the end of the file.
    I have a TAR *t handle and a gzFile l_gzFile file pointer.
    Using these two, how can I seek to the end in a fast / optimal way?
    ##################################################### */
    k = gzseek(l_gzFile, luOffset , SEEK_CUR); // l_gzFile is from gzFile data type.


    if (k == -1)
    {
        /* gzseek does not set errno itself; report a generic error to the caller */
        errno = EINVAL;
        return -1;
    }
    std::printf("\n After Read Block - %ld", static_cast<long>(gztell(l_gzFile)));
    return 0;
}


void getUnTarFileSize(std::string f_cSourePath)
{
    std::string dest = "/fs/usb0/untar_zlib/test/";
    TAR *l_pTarInfo = NULL;
    char *l_pcTarFileSourcePath = const_cast<char * >(f_cSourePath.c_str());
    char *l_pcTarFileDestPath =  const_cast<char * >(dest.c_str());
    //open tar archive
    if (0 != (tar_open(&l_pTarInfo, l_pcTarFileSourcePath, &gztype,  O_RDONLY, 0, TAR_GNU)))
    {
        std::printf("tar_open(): %s \n", std::strerror(errno));
    }
    else
    {
        int i = 0;
        unsigned long totalSize = 0;
        unsigned long current_size = 0;
        std::printf("\n Current File  Position - %ld \n", static_cast<long>(gztell(l_gzFile)));
        while ((i = th_read(l_pTarInfo)) == 0)
        {
            char *fName = th_get_pathname(l_pTarInfo);
            current_size = th_get_size(l_pTarInfo);

            printf("\n Size of fName %s = %lu", fName, current_size);
            totalSize += current_size;

            if (TH_ISREG(l_pTarInfo) && (move_file_pointer(l_pTarInfo) != 0)) {
                  fprintf(stderr, "tar_skip_regfile()\n");
                  printf("\n Value of read=%d, Error=%s\n",i,std::strerror(errno));
                  break;
            }
            fName = NULL;
            printf("\n\n");
        }
        if(-1 == i)
        {
            printf("\n Value of read=%d, Error=%s\n",i,std::strerror(errno));
        }
        else
        {
            printf("\n Total Size of given tar.gz file=%lu\n", totalSize);
        }
    }
}

int main()
{
    getUnTarFileSize("/fs/usb0/untar_zlib/a.tar.gz");
}
Vishnu

1 Answer


If you are asking how to know the uncompressed size of the contents of a gzip (.gz) file, then the only reliable way is to decompress it. See this answer here for more details.

Mark Adler
  • Thanks Mark. It looks like the given answer is in Java. I am looking for the same in C++. The z_stream structure and the inflate API are supposed to be involved in this. – Vishnu Aug 17 '15 at 14:04
  • Adding one more point: I am using the zlib library in my C++ project. – Vishnu Aug 17 '15 at 14:11
  • The linked answer is independent of the language, and there is no Java or any other code in that answer. To use zlib, read the documentation in [zlib.h](http://zlib.net/manual.html), and having done that, you can look at [the example of zlib use](http://zlib.net/zlib_how.html). – Mark Adler Aug 17 '15 at 14:43
  • Hi Mark, sorry for asking this basic question. Yes, I saw the zlib webpage before posting my question. I am not able to understand the example, especially the z_stream object. I have modified the question with detailed code. – Vishnu Aug 17 '15 at 16:28
  • /* If you look here, gzseek takes a long time to complete. For a small archive such as 500 MB, it takes 2 minutes to reach the end of the file. I have a TAR *t handle and a gzFile l_gzFile file pointer. Using these two, how can I seek to the end in a fast / optimal way? */ k = gzseek(l_gzFile, luOffset , SEEK_CUR); // l_gzFile is of type gzFile. – Vishnu Aug 17 '15 at 16:31
  • That is exactly what you need to do. It will take time to scan the entire file. I don't know what kind of machine or mass storage device you have. However two minutes seems unusually slow for a 500 MB file. On my 2 GHz i7 with a solid state drive, I get 66 MB/s on a large gzip file, so 500 MB would be 7.5 seconds. – Mark Adler Aug 17 '15 at 16:48
  • My mistake, I forgot to mention my system environment: QNX OS and 2032 MB RAM. – Vishnu Aug 17 '15 at 18:11
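The header walk in the question (read a 512-byte header, take its size, skip the data rounded up to a whole block) can also be sketched without libtar, over tar bytes that have already been decompressed. The names `paddedSize`, `headerSize` and `sumTarSizes` are hypothetical; the offsets (size field at byte 124, 12 octal digits; type flag at byte 156) come from the ustar header layout. This sketch assumes every entry's size field describes data that follows it, and it does not handle the GNU base-256 size encoding used for very large files.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kBlock = 512;   // tar block size (T_BLOCKSIZE in libtar)

// Size of an entry's data area, rounded up to a whole number of blocks.
std::uint64_t paddedSize(std::uint64_t size)
{
    return (size + kBlock - 1) / kBlock * kBlock;
}

// Parse the octal size field of a header block (bytes 124..135).
std::uint64_t headerSize(const unsigned char *hdr)
{
    std::uint64_t value = 0;
    bool seenDigit = false;
    for (std::size_t i = 124; i < 136; ++i) {
        unsigned char c = hdr[i];
        if (c >= '0' && c <= '7') {
            value = value * 8 + (c - '0');
            seenDigit = true;
        } else if (c == ' ' && !seenDigit) {
            continue;                 // leading padding
        } else {
            break;                    // NUL or space terminates the field
        }
    }
    return value;
}

// Sum the sizes of regular files in an uncompressed tar image held in memory.
std::uint64_t sumTarSizes(const std::string &tar)
{
    std::uint64_t total = 0;
    std::size_t pos = 0;
    while (pos + kBlock <= tar.size()) {
        const unsigned char *hdr =
            reinterpret_cast<const unsigned char *>(tar.data()) + pos;
        bool allZero = true;          // an all-zero block marks end of archive
        for (std::size_t i = 0; i < kBlock && allZero; ++i)
            allZero = (hdr[i] == 0);
        if (allZero)
            break;
        std::uint64_t size = headerSize(hdr);
        if (hdr[156] == '0' || hdr[156] == '\0')   // regular file
            total += size;
        pos += kBlock + static_cast<std::size_t>(paddedSize(size));
    }
    return total;
}
```

For example, `paddedSize(1000)` is 1024 and `paddedSize(512)` is 512, which matches the two branches of the offset calculation in the question. Because the tar headers sit inside the gzip stream, the archive still has to be inflated up to each header before this walk can read it.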