
I have string content that is delivered to me over TCP. This matters only because it means I do not receive the string in the same pieces every time. I use <start> and <stop> separators to make sure that whenever I receive data over TCP, I output the full content.

My incoming content looks like so:

<start>Apple Bandana Cadillac<stop>

I want to get everything between <start> and <stop>, so just Apple Bandana Cadillac.

My script to do this looks like so:

servercsv.on("connection", function(socket){
    let d_basic = "";                                   // text accumulated across 'data' events
    socket.on('data', function(data){
        d_basic += data.toString();
        let d_csvindex = d_basic.indexOf('<stop>');
        while (d_csvindex > -1){
            try {
                let strang = d_basic.substring(0, d_csvindex);   // everything before <stop>
                let dyson = strang.replace(/<start>/g, '');      // drop the <start> tag
                let dson = papaparse.parse(dyson);
                myfunction(dson);
            }
            catch(e){ console.log(e); }
            // keeps the buffer from d_csvindex + 1 onward, i.e. one character into '<stop>'
            d_basic = d_basic.substring(d_csvindex+1);
            d_csvindex = d_basic.indexOf('<stop>');
        }
    });
});

In other words, I take everything before the <stop> string and output it. I also included the line let dyson = strang.replace(/<start>/g, ''); because I want to remove the <start> text.

However, because this is TCP, I am not guaranteed to get all parts of this string. As a result, I frequently get back stop>Apple Bandana Cadillac<stop> or some variation of it (such as start>Apple Bandana Cadillac<stop>). It is not consistent enough that I can just do strang.replace("start>", "").

Ideally, I would like to select the content between <start> and <stop>, not just everything before <stop>. However, I am unsure how to do so.
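A minimal sketch of that idea, assuming the tags never occur inside the payload itself; `extractFrames`, `START` and `STOP` are hypothetical names, not part of the post. It also suggests where the `stop>` prefix comes from: the loop above resumes with `d_basic.substring(d_csvindex+1)`, which skips only the `<` of `<stop>`, so `stop>` stays at the front of the buffer and is glued onto the next record. The sketch skips the whole tag instead and hands back any incomplete tail so it can be kept for the next `data` event.

```javascript
// Sketch, assuming "<start>" and "<stop>" never occur inside the payload.
// extractFrames, START and STOP are hypothetical names, not from the post.
const START = "<start>";
const STOP = "<stop>";

// Returns the complete frames found in `buffer` plus the unfinished tail,
// which the caller keeps and prepends to the next chunk.
function extractFrames(buffer) {
  const frames = [];
  let stopIdx = buffer.indexOf(STOP);
  while (stopIdx > -1) {
    // Nearest <start> before this <stop>; anything before it (such as a stray
    // fragment) is discarded. A <stop> with no <start> before it is skipped.
    const startIdx = buffer.lastIndexOf(START, stopIdx);
    if (startIdx > -1) {
      frames.push(buffer.substring(startIdx + START.length, stopIdx));
    }
    // Skip the whole "<stop>" tag, not just its first character.
    buffer = buffer.substring(stopIdx + STOP.length);
    stopIdx = buffer.indexOf(STOP);
  }
  return { frames, rest: buffer };
}

// Simulated chunks with arbitrary boundaries, standing in for 'data' events.
let pending = "";
const received = [];
for (const chunk of ["<start>Apple Ban", "dana Cadillac<stop><sta", "rt>Kebab<stop><start>Par"]) {
  pending += chunk;
  const { frames, rest } = extractFrames(pending);
  pending = rest;
  received.push(...frames);
}
console.log(received); // [ 'Apple Bandana Cadillac', 'Kebab' ]
console.log(pending);  // <start>Par
```

Inside the question's handler, the loop could then shrink to appending the chunk, calling `extractFrames(d_basic)`, storing `rest` back into `d_basic`, and passing each frame to papaparse as before.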

Alternatively, I could settle for a regex that matches every partial combination of the <start> and <stop> strings inside my while loop and just deletes them, i.e. checking for <, s, t, a, r, t individually and so forth. But I am unsure how to use a regex to delete portions of a whole string.
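For the "delete the pieces" alternative, one hedged approach is to build the regex from every suffix of the two tags, since a tag cut off at the front of a record looks like `stop>` or `art>`. This assumes fragments only ever appear at the start of a record, as in the examples above; `stripTagFragments` is a hypothetical helper name.

```javascript
// Sketch of the "delete the pieces" alternative. Assumption: a damaged tag only
// shows up at the start of a record, as a suffix of "<start>" or "<stop>".
// stripTagFragments is a hypothetical helper name.
const TAGS = ["<start>", "<stop>"];

// Every non-empty suffix of each tag ("<start>", "start>", "tart>", ..., ">"),
// de-duplicated and sorted longest first so complete tags are tried before fragments.
const FRAGMENTS = [...new Set(TAGS.flatMap(tag => [...tag].map((_, i) => tag.slice(i))))]
  .sort((a, b) => b.length - a.length);

// "<" and ">" are not special characters here, so the fragments can be
// joined into an alternation without escaping.
const LEADING_FRAGMENT = new RegExp("^(?:" + FRAGMENTS.join("|") + ")");
const WHOLE_TAG = /<start>|<stop>/g;

function stripTagFragments(record) {
  // Remove one leading fragment, then any complete tags left in the record.
  return record.replace(LEADING_FRAGMENT, "").replace(WHOLE_TAG, "");
}

console.log(stripTagFragments("stop>Apple Bandana Cadillac<stop>"));  // Apple Bandana Cadillac
console.log(stripTagFragments("start>Apple Bandana Cadillac<stop>")); // Apple Bandana Cadillac
console.log(stripTagFragments("op>Grapes Trampoline Ham<stop>"));     // Grapes Trampoline Ham
```

A record whose real content begins with something like `p>` or `>` would also lose those characters, so this only suits data where that cannot happen.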

Mohammad Javad Noori
Jason Chen
  • uhhh BECAUSE this is TCP, all pieces of data sent to the client arrive in the correct order and nothing should be missed. TCP has a mechanism to request missing data (unlike UDP). – Pierre Feb 08 '18 at 08:51
  • Option 1: use string.split(''). You will get an array of strings already split by "", and then you replace the start tag. Option 2: use a regular expression to get what you want. – jilykate Feb 08 '18 at 08:51
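To illustrate Pierre's point: TCP delivers the bytes in order, but a single `data` event can end anywhere, even inside a multi-byte UTF-8 character. Calling `data.toString()` on each chunk, as the question's handler does, can then garble that character, while Node's `string_decoder` module holds back an incomplete sequence until the rest arrives. The sample text `Café` is an arbitrary choice, used only because `é` takes two bytes.

```javascript
// Sketch: the sample text "Café" is arbitrary; "é" is used because UTF-8
// encodes it as two bytes (0xC3 0xA9), so a chunk boundary can fall between them.
const { StringDecoder } = require("string_decoder");

const bytes = Buffer.from("<start>Café<stop>", "utf8");
const cut = bytes.indexOf(0xc3) + 1; // split right after the first byte of "é"
const chunks = [bytes.subarray(0, cut), bytes.subarray(cut)];

// Decoding each chunk on its own (like data.toString() per event) garbles "é":
// the split character comes out as U+FFFD replacement characters.
const naive = chunks.map(chunk => chunk.toString("utf8")).join("");

// StringDecoder keeps an incomplete byte sequence until the next chunk completes it.
const decoder = new StringDecoder("utf8");
const safe = chunks.map(chunk => decoder.write(chunk)).join("") + decoder.end();

console.log(naive.includes("é")); // false
console.log(safe);                // <start>Café<stop>
```

In the question's handler, calling `socket.setEncoding('utf8')` once and then appending `data` directly gives the same protection, because the socket then emits strings that were decoded this way.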

2 Answers


Assuming you get the full response:

var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>(.*)<stop>"); 
testRE[1] //"Apple Bandana Cadillac"

If there can be newlines between <start> and <stop>, use [\S\s] in place of the dot:

var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>([\\S\\s]*)<stop>"); 
testRE[1] //"Apple Bandana Cadillac"

This uses a regular expression capturing group.
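One caveat worth checking if several frames can sit in the same string (for example when TCP chunks are concatenated into one buffer): the greedy `(.*)` runs from the first `<start>` to the last `<stop>`. A lazy `([\s\S]*?)` with the `g` flag and `matchAll` (available in Node 12 and later) returns each frame separately. The sample buffer below is made up for illustration.

```javascript
// Sketch with a made-up buffer holding two complete frames back to back.
const buffer = "<start>Apple Bandana Cadillac<stop><start>Kebab Matador Pencil<stop>";

// Greedy (.*) spans from the first <start> to the last <stop>.
const greedy = buffer.match(/<start>(.*)<stop>/);
console.log(greedy[1]); // Apple Bandana Cadillac<stop><start>Kebab Matador Pencil

// Lazy ([\s\S]*?) stops at the nearest <stop> and also matches newlines;
// matchAll requires the g flag.
const frames = [...buffer.matchAll(/<start>([\s\S]*?)<stop>/g)].map(m => m[1]);
console.log(frames); // [ 'Apple Bandana Cadillac', 'Kebab Matador Pencil' ]
```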

absin
  • Just a note in case: this will not work if there are newlines between `` and ``. To capture multiline content, you can use `[\s\S]*` instead of `.*`. – Kaddath Feb 08 '18 at 08:53

Try this regex with the replace() method:

/<st.*?>(.*?)(?!<st)/g

  • Literal: <st
  • Any char, zero or more times, lazily: .*?
  • Literal: >
  • Begin capture group: (
  • Any char, zero or more times, lazily: .*?
  • End capture group: )
  • Begin negative lookahead: (?!
  • Literal: <st
  • End negative lookahead: )

In Demo 1 below, notice that the test input spans multiple lines and contains variations of <start> and <stop> (basically anything beginning with <st).


Demo 1

var rgx = /<st.*?>(.*?)(?!<st)/g;

var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>`;

var res = str.replace(rgx, `$1`);

console.log(res);
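If the goal is simply to delete complete tags, an explicit alternation is a simpler option; it is a swapped-in technique for comparison, not the answerer's pattern. The two also behave differently on one input worth testing: when a `<stop>` is immediately followed by `<start>`, the negative lookahead rejects the empty capture, so the lazy group grows to `<`, that character is written back by `$1`, and `<start>` is left in the output. The input string is made up for illustration.

```javascript
// Comparison sketch: Demo 1's pattern vs a plain alternation of the two tags
// (the alternation is a swapped-in technique, not part of the answer).
const demo1 = /<st.*?>(.*?)(?!<st)/g;
const tags = /<start>|<stop>/g;

// Made-up input where a <stop> is immediately followed by <start>.
const input = "<start>A<stop>\n<stop><start>B<stop>";

// The lookahead fails on an empty capture before "<start>", so the capture grows
// to "<" and "$1" writes it back, leaving "<start>" in the result.
console.log(JSON.stringify(input.replace(demo1, "$1"))); // "A\n<start>B"
console.log(JSON.stringify(input.replace(tags, "")));    // "A\nB"
```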

Update

"say I have op>Grapes Trampoline Ham<stop>...still trying to remove all parts of the string <stop>"

/^(.*?>)(.*?)(<.*?)$/gm;

A brief explanation will have to do, since a step-by-step breakdown like the one for Demo 1 would take too long.

  • The regex is multiline (/m), so ^ and $ match at the start and end of each line.
  • ^ : beginning of the line.
  • (.*?>) : lazily capture everything up to and including the first literal > [returned as $1]
  • (.*?) : then lazily capture everything up to the next < [returned as $2]
  • (<.*?) : a literal < and everything after it, up to the end of the line [returned as $3]
  • $ : end of the line.

The trick is to replace each whole line with just the second capture, $2, which drops $1 and $3.
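Running the Update's pattern once with `exec` shows what ends up in each group for one of the Demo 2 lines; the `g` flag is dropped here because only a single match per string is needed. It also shows that a line with no `<` after its first `>` does not match at all, so `replace()` would leave such a line unchanged.

```javascript
// Sketch: inspect the three groups of the Update's pattern on one Demo 2 line.
// The g flag is dropped because exec() is only called once per string here.
const rgx = /^(.*?>)(.*?)(<.*?)$/m;

const [, g1, g2, g3] = rgx.exec("op>Score False Razor<stop>");
console.log(g1); // op>
console.log(g2); // Score False Razor
console.log(g3); // <stop>

// No "<" after the first ">", so there is no match and replace() would leave the line as-is.
console.log(rgx.exec("op>Score False Razor")); // null
```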

Demo 2

var rgx = /^(.*?>)(.*?)(<.*?)$/gm;

var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>
op>Score False Razor<stop>
`;

var res = str.replace(rgx, `$2`);

console.log(res);
zer00ne
  • The only issue I have with this is: say I have `op>Grapes Trampoline Ham`; my output will be `op>Grapes Trampoline Ham`. I'm still trying to remove all parts of the string ``. – Jason Chen Feb 08 '18 at 17:45
  • See the update; this will work as long as your input is multiline. – zer00ne Feb 09 '18 at 20:39