I have Perl XS code which calls a function from an external C library which returns char ** (array of strings).

The XS code will eventually return back to Perl an array ref with all the string results in there. Or undef on failure.

I have 2 problems:

  1. On program exit I get a core dump with messages about memory corruption, double free etc. (e.g. double free or corruption (fasttop)).
  2. How to return an undef value from XS sub denoting that something went wrong (not an empty array)?

Additionally, can anyone confirm that I am correctly handling the cases where strings passed from Perl into the C function are utf8-encoded (e.g. the input filename), and that the results coming back from the C function (which may contain utf8 strings) are sent back to Perl OK?

Here's my code (which is modelled after example #1 in https://stackoverflow.com/a/46719397/385390, if I got that correctly):

AV *
decode(infilename_SV)
    SV *infilename_SV
  PREINIT:
    char *infilename;
    STRLEN infilename_len;
    char **results;
    size_t results_sz;
    char *aresult;
    size_t I;
    SV **aresultPP;
    char *dummy;
    STRLEN dummy_len;
  CODE:
    infilename = SvPVbyte(infilename_SV, infilename_len);
    // call C function
    results = myfunc(infilename, &results_sz);
    if( results == NULL ){
      printf("error!");
      // HOW TO return undef (and not an empty array?)
    }
    // create a Perl array to be returned
    RETVAL = (AV*)sv_2mortal((SV*)newAV());
    for(I=0;I<results_sz;I++){
      results_sz = strlen(results[I]);
      // create a new Perl string and copy this result
      aresult = newSVpv(results[I], 0);
      av_push(RETVAL, aresult);
      // free results as returned by C call
      free(results[I]);
    }
    // free results as returned by C call
    free(results);
    // debug print results
    for(I=0;I<results_sz;I++){
      aresultPP = av_fetch((AV *)RETVAL, I, 0);
      dummy = SvPVbyte(*aresultPP, dummy_len);
      printf("result: %s\n", dummy);
    }
  OUTPUT:
     RETVAL
bliako
  • results_sz - I am assuming, based on usage, that your myfunc populates it with the size of the results. You then seem to modify it inside the loop where you are using it as the max for the loop. I assume you meant to pass it to newSVpv as the size of the string, but instead you pass 0 as the length, essentially trying to store a string in memory that is too small to hold it. You also seem to be freeing each of the items in results for some reason - I would likely wait to free it in the free(results) that you have after the loop. I may be way off, but there appear to be a number of issues there – Timothy Legge Mar 08 '22 at 23:03
  • Thanks for your comments. Just for the record, ```newSVpv()```'s 2nd arg can be left zero, in which case Perl will calculate the string length. – bliako Mar 09 '22 at 16:15

1 Answer

On program exit I get a core dump with messages about memory corruption, double free etc. (e.g. double free or corruption (fasttop)).

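This was probably because you overwrote results_sz, which is also the loop bound, inside the for loop (results_sz = strlen(results[I]);). After that, the loop can run past the end of results and call free() on invalid pointers, which is undefined behavior.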
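
To make the fix concrete, here is a minimal plain-C sketch of the loop with the string length kept in its own variable, so that the element count is never modified. Both make_results() and consume_results() are hypothetical helpers invented for this illustration; make_results() only stands in for the real library call, whose behaviour I am assuming from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library call: returns heap-allocated
 * strings and stores their number in *count (NULL on failure). */
char **make_results(size_t *count)
{
    static const char *src[] = { "alpha", "be", "gamma-long" };
    size_t n = sizeof src / sizeof src[0];
    char **out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]);
        out[i] = malloc(len + 1);
        if (out[i] == NULL) {
            /* undo the partial allocation before reporting failure */
            while (i > 0)
                free(out[--i]);
            free(out);
            return NULL;
        }
        memcpy(out[i], src[i], len + 1);
    }
    *count = n;
    return out;
}

/* Walks the array exactly results_sz times, frees every string and then
 * the array itself. Returns the summed length of all strings. */
size_t consume_results(char **results, size_t results_sz)
{
    size_t total = 0;
    for (size_t i = 0; i < results_sz; i++) {
        /* separate variable: results_sz (the loop bound) stays untouched */
        size_t len = strlen(results[i]);
        total += len;
        free(results[i]);
    }
    free(results);
    return total;
}
```

In the XS code the same idea applies: either use a second variable for the length or drop the strlen() line altogether, since newSVpv() with a length of 0 computes the length itself.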

How to return an undef value from XS sub denoting that something went wrong (not an empty array)?

You can return &PL_sv_undef to signal an undefined value, see perlxs for more information. For example like this:

SV *
decode(infilename_SV)
    SV *infilename_SV
  PREINIT:
    char *infilename;
    STRLEN infilename_len;
    char **results;
    size_t results_sz;
    SV *aresult;
    size_t I;
  CODE:
    infilename = SvPVbyte(infilename_SV, infilename_len);
    results = myfunc(infilename, &results_sz);
    if( results == NULL ){
      RETVAL = &PL_sv_undef;
    }
    else {
       AV *av = newAV();
       for(I=0; I < results_sz; I++){
          aresult = newSVpv(results[I], 0);
          av_push(av, aresult);
          free(results[I]);
       }
       free(results);
       /* xsubpp already mortalizes an SV * RETVAL, so no sv_2mortal() here */
       RETVAL = newRV_noinc((SV*)av);
    }
  OUTPUT:
     RETVAL
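
An alternative, which the asker reports in the comments as working, is to keep the AV * return type and leave the XSUB early with XSRETURN_UNDEF. The following is only a sketch under the question's assumptions (myfunc() returns NULL on failure and sets results_sz to the number of strings); I have not compiled it against the real library.

```c
AV *
decode(infilename_SV)
    SV *infilename_SV
  PREINIT:
    char *infilename;
    STRLEN infilename_len;
    char **results;
    size_t results_sz;
    size_t i;
  CODE:
    infilename = SvPVbyte(infilename_SV, infilename_len);
    results = myfunc(infilename, &results_sz);
    if (results == NULL)
        XSRETURN_UNDEF;  /* puts undef on the stack and returns immediately */
    /* the AV * typemap wraps RETVAL in a new reference,
       so the array itself is made mortal here */
    RETVAL = (AV *)sv_2mortal((SV *)newAV());
    for (i = 0; i < results_sz; i++) {
        av_push(RETVAL, newSVpv(results[i], 0));
        free(results[i]);
    }
    free(results);
  OUTPUT:
    RETVAL
```

On the Perl side the caller would then get either an array reference or undef, so a plain defined() check is enough to tell failure apart from an empty result.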

if anyone can confirm that I am handling correctly the cases where strings from Perl into the C function are utf8-encoded (e.g. the input filename)

To pass a Perl string to the C function as a UTF-8 encoded character string, you can use SvPVutf8() instead of SvPVbyte(); see perlguts for more information. Example:

infilename = SvPVutf8(infilename_SV, infilename_len);

or the results back from the C function (which may contain utf8 strings) are sent back to Perl

You can use newSVpvn_flags() instead of newSVpv() to create a Perl string from a UTF-8 encoded C string and flag it as UTF-8. For example:

aresult = newSVpvn_flags(results[I], strlen(results[I]), SVf_UTF8);
Håkon Hægland
  • `newSVpvn_flags(results[I], strlen(results[I]), SVf_UTF8)` can also be written as `newSVpvn_utf8(results[I], strlen(results[I]), 1)` – ikegami Mar 09 '22 at 00:33
  • The core dumps are gone now, thanks for pointing it out - silly me! I have solved returning undef by using ```XSRETURN_UNDEF;```. With ```RETVAL = (AV *)(&PL_sv_undef);``` it returned a scalar-ref with undef contents. – bliako Mar 09 '22 at 15:42
  • @ikegami thanks, I am now using what you suggested. So now all works OK. In my Perl code I have string constants in utf8. So I ```use utf8;```. But if some user of my code omits that then problems arise. So I added some extra code: 1) XS: for input: ```infilename = SvUTF8(infilename_SV) ? SvPVutf8(infilename_SV, infilename_len) : SvPVbyte(infilename_SV, infilename_len);``` and for output ```newSVpvn_utf8(results[I], strlen(results[I]), 1)```. And back in Perl I do: ```my @newres = map { utf8::is_utf8($_) ? Encode::encode_utf8($_) : $_ } @results;``` if ```use utf8``` was omitted by someone. – bliako Mar 09 '22 at 15:49
  • @bliako, No, don't do that. That's completely wrong. It will give the wrong result on occasion! This is literally what we call "The Unicode Bug". Don't introduce a bug in your code to attempt to fix a bug in the user's code. – ikegami Mar 29 '22 at 15:14