I'm building a JavaScript chat bot, and I ran into an issue.
I use String.prototype.split() to tokenize my input like this:
tokens = message.split(" ");

Now my problem is that I need 4 tokens to make up the command and 1 token to hold the message. When I send this:

!finbot msg testuser 12345 Hello sir, this is a test message

these are the tokens I get: ["!finbot", "msg", "testuser", "12345", "Hello", "sir,", "this", "is", "a", "test", "message"]

How can I get this instead: ["!finbot", "msg", "testuser", "12345", "Hello sir, this is a test message"]

The reason I want it like this is that the first token (tokens[0]) is the call, the second (tokens[1]) is the command, the third (tokens[2]) is the user, the fourth (tokens[3]) is the password (it's a password-protected message thing, just for fun) and the fifth (tokens[4]) is the actual message.
Right now, it would just send "Hello", because I only use the fifth token.
I can't just do message = tokens[4] + tokens[5]; and so on, because messages don't have a fixed number of words.

I hope I gave enough information for you to help me. If you guys know the answer (or know a better way to do this) please tell me so.

Thanks!

Finlay Roelofs

4 Answers

Use the limit parameter of String.split:

tokens = message.split(" ", 4);

That gives you the first four tokens; from there, you just need to get the message from the original string. Using an nthIndex() helper such as the one from this answer (it returns the index of the nth occurrence of a substring), you can find the index of the 4th space character and take whatever comes after it. The + 1 skips the space itself:

var text = message.substring(nthIndex(message, ' ', 4) + 1);

Or if you need it in your tokens array:

tokens[4] = message.substring(nthIndex(message, ' ', 4) + 1);
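
The nthIndex() function itself is not shown in this answer. As a minimal sketch, assuming a helper with that name that returns the index of the nth occurrence (or -1 if there are fewer), the whole approach might look like this. Both the nthIndex body and the splitWithRest wrapper are illustrative, not the linked implementation:

```javascript
// Illustrative helper: index of the nth occurrence of `pat` in `str`,
// or -1 if `pat` occurs fewer than n times.
function nthIndex(str, pat, n) {
  let index = -1;
  for (let i = 0; i < n; i++) {
    index = str.indexOf(pat, index + 1);
    if (index === -1) return -1;
  }
  return index;
}

// Hypothetical wrapper: the first `count` space-separated tokens,
// followed by the untouched remainder of the string (if there is one).
function splitWithRest(message, count) {
  const tokens = message.split(" ", count);
  const cut = nthIndex(message, " ", count);
  if (cut !== -1) {
    tokens.push(message.substring(cut + 1)); // + 1 skips the space itself
  }
  return tokens;
}

console.log(splitWithRest("!finbot msg testuser 12345 Hello sir, this is a test message", 4));
```

If the input contains fewer than `count` spaces, no message element is added, so the caller can check tokens.length to detect an incomplete command.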
nickb

I would probably start by taking the string like you did, and tokenizing it:

const myInput = string.split(" ");

If you can use ES6 (ES2015), you should be able to do something like:

const [call, command, userName, password, ...messageTokens] = myInput;
const message = messageTokens.join(" ");

However, if you don't have access to rest syntax in destructuring, you can do the same like this (it's just much more verbose, and note that shift() removes the elements from myInput):

const call = myInput.shift();
const command = myInput.shift();
const userName = myInput.shift();
const password = myInput.shift();
const message = myInput.join(" ");

If you need them as an array again, you can now put those parts back together:

const output = [call, command, userName, password, message];
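
Since different commands may need a different number of leading tokens, the same idea can be wrapped in a small function. This is only a sketch; splitHead and headCount are names made up for illustration, and it avoids ES6 syntax other than const:

```javascript
// Hypothetical helper: keep the first `headCount` tokens separate
// and glue everything after them back together as one string.
function splitHead(input, headCount) {
  const parts = input.split(" ");
  const head = parts.slice(0, headCount);
  const rest = parts.slice(headCount).join(" ");
  return head.concat(rest);
}

console.log(splitHead("!finbot msg testuser 12345 Hello sir, this is a test message", 4));
console.log(splitHead("!finbot msg testuser 12345 Hello sir, this is a test message", 2));
```

Because split(" ") and join(" ") use the same separator, runs of several spaces inside the message survive the round trip. If the input has fewer than headCount tokens, the last element is an empty string.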
Alex LaFroscia

If you can use ES6, you can do:

let [c1, c2, c3, c4, ...rest] = input.split(" ");
let msg = rest.join(" ");
Kevin

You could resort to a regexp, given that you defined your format as "4 non-space tokens separated by single spaces, followed by the message":

function tokenize(msg) {
    return (/^(\S+) (\S+) (\S+) (\S+) (.*)$/.exec(msg) || []).slice(1, 6);
}

This has the perhaps unwanted behaviour of returning an empty array if msg does not actually match the spec. If that's not acceptable, remove the ... || [] and handle the null that exec() returns yourself. The number of tokens is also fixed at 4 plus the required message. For a more generic approach you could use:
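
As one way of handling the no-match case explicitly, the following sketch returns null instead of an empty array and gives the captured parts names. The function name parseCommand and the property names are assumptions chosen for illustration:

```javascript
// Hypothetical variant of tokenize(): returns null when the input does not
// match "4 tokens + message", otherwise an object with named parts.
function parseCommand(msg) {
  const match = /^(\S+) (\S+) (\S+) (\S+) (.*)$/.exec(msg);
  if (match === null) {
    return null; // let the caller decide how to report a malformed command
  }
  // match[0] is the whole match, so skip it when destructuring.
  const [, call, command, user, password, text] = match;
  return { call, command, user, password, text };
}

const parsed = parseCommand("!finbot msg testuser 12345 Hello sir, this is a test message");
console.log(parsed === null ? "malformed command" : parsed.text);
```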

function tokenizer(msg, nTokens) {
    var token = /(\S+)\s*/g, tokens = [], match;

    while (nTokens && (match = token.exec(msg))) {
        tokens.push(match[1]);
        nTokens -= 1; // or nTokens--, whichever is your style
    }

    if (nTokens) {
        // exec() returned null, could not match enough tokens
        throw new Error('EOL when reading tokens');
    }

    tokens.push(msg.slice(token.lastIndex));
    return tokens;
}

This uses the global (g) flag of regexp objects in JavaScript to match against the same string repeatedly, and uses the lastIndex property to slice off everything after the last matched token as the rest.
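
To make the lastIndex behaviour concrete, here is a small standalone demonstration. The sample string is made up; the regexp is the one used in tokenizer():

```javascript
const token = /(\S+)\s*/g;
const text = "msg testuser  12345 rest of it";

// With the g flag, each exec() call resumes at token.lastIndex.
const first = token.exec(text);  // consumes "msg " (token plus trailing whitespace)
const second = token.exec(text); // consumes "testuser  " including both spaces

console.log(first[1], second[1]); // msg testuser

// lastIndex now points just past the second match, so slicing there
// yields everything that has not been tokenized yet.
const remainder = text.slice(token.lastIndex);
console.log(remainder); // 12345 rest of it
```

Because the pattern also consumes trailing whitespace (\s*), the remainder never starts with the separator.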

Given

var msg = '!finbot msg testuser 12345 Hello sir, this is a test message';

then

> tokenizer(msg, 4)
[ '!finbot',
  'msg',
  'testuser',
  '12345',
  'Hello sir, this is a test message' ]
> tokenizer(msg, 3)
[ '!finbot',
  'msg',
  'testuser',
  '12345 Hello sir, this is a test message' ]
> tokenizer(msg, 2)
[ '!finbot',
  'msg',
  'testuser 12345 Hello sir, this is a test message' ]

Note that the remainder is always appended to the returned array, so the last element is an empty string when the given message contains only tokens:

> tokenizer('asdf', 1)
[ 'asdf', '' ]  // An empty "message" at the end
Ilja Everilä
  • Got one more question: what if I want to use a lower number of tokens? Let's say 3 tokens total, so one command would be 5 tokens and another command would be 3. – Finlay Roelofs Aug 27 '16 at 19:58
  • Updated the answer with a more generic solution. – Ilja Everilä Aug 27 '16 at 20:23
  • Hi there, sorry for bothering you again, but I've been rewriting my bot for Node.js and updated it a bit. However, that broke the tokenizer: the array I get after running the input through it is now empty. The only thing that has changed is the `!finbot` parameter (it gets stripped away earlier in the code), so the array is now just `["msg", "testuser", "12345", "Hello", "sir,", "this", "is", "a", "test", "message"]`. I've tried everything that came to mind, but I couldn't fix it myself. – Finlay Roelofs Sep 07 '16 at 15:48
  • It sounds like you're trying to pass an array as the argument to the function. If so, it is unsurprising that it doesn't work: it was meant for splitting strings. On the other hand, it should not return an empty array, but fail. – Ilja Everilä Sep 07 '16 at 18:56
  • Ah yes, I see :) I've now passed the whole string to it and it works! Thanks! Now I can facedesk for other reasons :D – Finlay Roelofs Sep 07 '16 at 18:58