Hello community, I hope I can meet some byte-string experts here. I guess `SvPVbyte` comes into play, but how?

My problem: I already successfully parse a Perl array XYZ (within a hash of arrays), here with example index 6789, in Inline::C, called from Perl like this:

$testn=pnp($lengthofXYZ,\@{$XYZ{$_}});

Inline C:

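int pnp(int n, SV *vertx) {
    AV *arrayx = (AV *)SvRV(vertx);       /* the array behind the reference */
    SV **yi = av_fetch(arrayx, 6789, 0);  /* NULL if element 6789 doesn't exist */
    double val_of_interest;

    if (yi == NULL)
        return -1;                        /* hypothetical "not found" value */
    val_of_interest = SvNV(*yi);
    return calculation_with_val_of_interest;  /* placeholder from the question */
}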

This works perfectly. But let's say I have a very long byte string (about 10-50 MB) in Perl: $xyz="\x09\x07\x44\xaa......

Now I want to pass a reference to this SV and, in the C part, walk through this string in 9-byte steps (like `substr`) without copying the whole string into a separate C array.

The walking part: the first 4 bytes shall be checked against a 4-byte reference value ABC that is also passed in the function call. If necessary, I can `unpack "N"` the search value beforehand and call the function with an integer. If position 0 does not match, jump 9 bytes further; if it matches, return the position found.

Thank you so much.

#include <stdint.h>
#include <string.h>

void foo(SV* sv) {
    STRLEN len;
    const char *buf = SvPVbyte(sv, len);

    if (len < 4) {
        /* ... Error ... */
    }

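    /* Widen each byte to uint32_t before shifting: (unsigned char)buf[0] << 24
       would be computed in signed int and overflow for bytes >= 0x80. */
    uint32_t sig =
        ((uint32_t)(unsigned char)buf[0] << 24) |
        ((uint32_t)(unsigned char)buf[1] << 16) |
        ((uint32_t)(unsigned char)buf[2] <<  8) |
        ((uint32_t)(unsigned char)buf[3] <<  0);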

    buf += 4;
    len -= 4;
    if (sig != ...) {
        /* ... Error ... */
    }

    while (len >= 9) {
        char block[9];
        memcpy(block, buf, 9);
        buf += 9;
        len -= 9;

        /* ... Use block ... */
    }

    if (len > 0) {
        /* ... Error ... */
    }
}
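The loop above copies each 9-byte block into `block`. The question asks to walk the string without copying it, so here is a minimal plain-C sketch that reads the records in place. It follows the question rather than the answer and makes these assumptions: there is no 4-byte header, records start at offset 0, and the key is the first 4 bytes of each record in big-endian order (the layout `unpack "N"` reads). The names `find_record_linear`, `read_u32_be` and `RECORD_SIZE` are hypothetical. Inside Inline C you would pass it the pointer and length returned by `SvPVbyte(sv, len)`, then return the result to Perl with `newSViv`.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per record, as described in the question (assumption). */
#define RECORD_SIZE 9

/* Decode 4 bytes as an unsigned big-endian 32-bit value (like unpack "N").
   Bytes are read as unsigned char and widened to uint32_t before shifting,
   so values >= 0x80 are neither sign-extended nor shifted into an int's sign bit. */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}

/* Hypothetical helper: step through buf in RECORD_SIZE strides, in place,
   and return the byte offset of the first record whose leading 4 bytes
   equal key, or -1 if none does. A trailing partial record is ignored. */
long find_record_linear(const char *buf, size_t len, uint32_t key)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t off;

    for (off = 0; off + RECORD_SIZE <= len; off += RECORD_SIZE) {
        if (read_u32_be(p + off) == key)
            return (long)off;
    }
    return -1;
}
```

The comparison is on the decoded integer, so the Perl side can pass the search value as a plain number, as the question suggests.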

[This is an answer to the question in the comments]

  • NEVER use `use bytes;`. "Use of this module for anything other than debugging purposes is strongly discouraged." (And it's not actually useful for debugging purposes. Devel::Peek is more useful.)
  • Absolutely no reason to use `our` here.
  • An `int` could be too small for the return value.
  • It's not working because you're searching the stringification of a reference.
  • In fact, there's no need to create a reference.

use strict;
use warnings qw( all );

use Inline C => <<'__EOS__';

SV* find_first_pos_of_43h_in_byte_string(SV* sv) {
    STRLEN len;
    const char *p_start = SvPVbyte(sv, len);
    const char *p = p_start;
    const char *p_end = p_start + len;
    for (; p < p_end; ++p) {
        if (*p == 0x43)
            return newSVuv(p - p_start);
    }

    return newSViv(-1);
}

__EOS__

my $buf = "\x00\x00\x43\x01\x01\x01";
my $pos = find_first_pos_of_43h_in_byte_string($buf);
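The hand-written loop in `find_first_pos_of_43h_in_byte_string` does the same job as the C library's `memchr`. Below is a minimal plain-C sketch; the helper name `find_first_byte` is hypothetical. Inside the Inline C function you would call it on the buffer from `SvPVbyte` and wrap the result with `newSViv`. There is one difference worth knowing. `memchr` compares bytes as `unsigned char`, while `*p == 0x43` compares a plain `char`. That works for 0x43, but on platforms where `char` is signed, a test such as `*p == 0xB2` can never be true.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of `byte`
   in buf[0..len), or -1 if absent. memchr compares as unsigned char and
   does not stop at NUL bytes, so it is safe on binary data. */
long find_first_byte(const char *buf, size_t len, unsigned char byte)
{
    const char *hit = memchr(buf, byte, len);
    return hit ? (long)(hit - buf) : -1;
}
```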

Of course, you could simply use Perl's built-in `index`:

use strict;
use warnings qw( all );

my $buf = "\x00\x00\x43\x01\x01\x01";
my $pos = index($buf, chr(67));
  • Dear Ikegami, I am reading these lines for the first time right now. On Saturday my son (8) had a very serious accident at the playground, and we were in hospital. I see it works (yeah!!!) and will now adapt it for 9-byte steps (you already showed the "unpack"), with a check for the position of the passed 4-byte search value. Thank you so much for your patience. A warm thanks to you. Goetz – GoetzM Aug 08 '18 at 04:34
  • Ikegami, I will have to pass a reference within a string (within a hash of strings) and the 4-byte value to search for. But I will be able to do this alone... hopefully. – GoetzM Aug 08 '18 at 04:44
  • No idea what you mean by a "reference within a string", but there's absolutely no reason to pass a reference. A hash element (e.g. `$hash{foo}`) is just a scalar, so just pass that – ikegami Aug 08 '18 at 04:45
  • And again, just use `index`! – ikegami Aug 08 '18 at 04:47
  • I used "use bytes" because I thought the `length` and `substr` commands might sometimes fail because my binary strings are falsely interpreted as UTF-8. I can't use `index`, because I must search strictly in 9-byte steps for a 4-byte value, AND I will adapt your solution to a binary search because the strings are large. (This already runs perfectly with an array instead of `substr` on a string, also in C, but it needs too much memory.) – GoetzM Aug 08 '18 at 04:51
  • `use bytes;` is garbage, which is why its own documentation says not to use it. If you're lucky, it won't do anything at all. If you aren't, it will cause `length` to return something other than the length of your string, and it will cause `substr` to return strings that aren't in your string. – ikegami Aug 08 '18 at 04:52
  • A binary search would require the values to be sorted – ikegami Aug 08 '18 at 05:29
  • I know. Right now I use a binary search over sorted arrays (in a hash of arrays). It runs perfectly under Perl and under C (with a reference to one of the arrays), but memory usage forces me to stay with my binary string (which is already sorted) :) Now I have time to adapt it to pass a reference and the 4-byte check value, and to search with 9-byte jumps. I am happy you helped so much! – GoetzM Aug 08 '18 at 09:04
  • Yeah, passing the reference and the search value as a decimal integer works (saving the unpack). Next step: the comparison itself, 4 chars against 4 chars. Now I have to use your unpack solution, or shifts, to turn four chars into the binary search item (you already showed this with `block`). – GoetzM Aug 08 '18 at 09:18
  • Hello Ikegami, everything is working, but I have problems with the unpack function. Your suggestion looks good, but let's pack this way: `$val = pack "N", 3000000000`. I checked that p[3] contains the lowest byte and does not have to be shifted, but with `sig = (p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];` I always get -3121664. When analysing the bytes converted to chars, I see they are always signed, and so is sig, although I declare it as unsigned long or uint32_t. Maybe the cast from char forces signed bytes and that causes the error... Do you have an idea? – GoetzM Aug 09 '18 at 08:45
  • 1) Stop asking new questions in the comments. You aren't even asking Perl questions anymore! 2) You are mistaken. A `uint32_t` can't possibly have `-3121664` as its value. That said, there could be a bug that's fixed by casting the `char` values to `unsigned char` values – ikegami Aug 09 '18 at 08:50
  • Unsigned char, that was it. Thanks. Did not know that only Perl should discussed. Everything is perfect now! Thread can be closed. – GoetzM Aug 09 '18 at 10:21
  • Re "*Did not know that only Perl should discussed*", No new questions in comments. As I mentioned already, if you have a new question to ask, [ask a new Question](https://stackoverflow.com/questions/ask) – ikegami Aug 09 '18 at 10:23
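For the sorted case described in these comments, here is a minimal sketch of a binary search that works directly on the byte string's buffer, without building an array. It assumes the string is made only of whole 9-byte records, sorted ascending by a big-endian 4-byte key at the start of each record. For big-endian unsigned keys, numeric order equals byte-wise (`memcmp`) order, so a string sorted either way qualifies. `find_record_sorted` and `read_u32_be` are hypothetical names. The decode widens each byte through `unsigned char` to `uint32_t` before shifting, which is the `unsigned char` fix from the comments. Here is a worked check of the reported symptom. `pack "N", 3000000000` yields the bytes B2 D0 5E 00. With a signed `char`, `p[1]` is -48, and `-48 << 16` sets all of the upper bits. That shift is undefined behaviour in C but is two's complement on typical compilers. The OR therefore becomes 0xFFD05E00, which reads as -3121664 as a signed 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>

#define RECORD_SIZE 9   /* bytes per record (assumption from the question) */

/* Big-endian 4-byte decode, like Perl's unpack "N"; widening to uint32_t
   before shifting avoids the sign-extension bug discussed above. */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Hypothetical helper: binary search over len / RECORD_SIZE records sorted
   ascending by their leading 4-byte key. Returns the byte offset of a record
   whose key equals `key`, or -1. Nothing is copied. */
long find_record_sorted(const char *buf, size_t len, uint32_t key)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t lo = 0, hi = len / RECORD_SIZE;   /* search records [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uint32_t k = read_u32_be(p + mid * RECORD_SIZE);
        if (k == key)
            return (long)(mid * RECORD_SIZE);
        if (k < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

If keys can repeat, this returns some matching record, not necessarily the first one.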