3

So I've not a professional developer, but I dable in programming periodically. I'm looking to write a code, and looking for some advice on managing a parser which is reading a text file, looking at each line as a string, and trying to determine the input on that line. Any given line could be one of over 1000 different keywords, which is the difficult part. Once I have this keyword, I feel like there has to be a significantly more efficiently method of determining what it is rather than implementing 1000 if-else statements or 1000 case-break statements. Once I matched the given keyword, I plan to jump to a general routine that instantiates an object of that keyword type. I don't want to have to perform 999 tests before I find my target, it's just a waste i feel like. I tried breaking it down by alphabetical order, which reduces it greatly, but there are still an unmanageably large amount of if-else statements.

I already found out that I can't nest more than 128 if-else statements, so my current alternative is 1000s of just "if" statements without matching "else" statements, which I know is a bad practice. So here is a generalization of my current code:

if (keyword_str.compare(keyword1)) {
        Parse(keyword1); // A general routine to parse all these similarly formatted keywords
        } 
if (keyword_str.compare(keyword2)) {
        Parse(keyword2);
        } 
if (keyword_str.compare(keyword3)) {
        Parse(keyword3);
        }
//
//
//

if (keyword_str.compare(keyword999)) {
        Parse(keyword999);
        }
if (keyword_str.compare(keyword1000)) {
        Parse(keyword1000);
        }

Any help will be greatly appreciated! Thanks!


Okay, so here is the point I'm at, but still kind of lost on how to use a map to determine an object type, and then instantiate that object. Here are some code snippets:

class baseClass
    {
    public:
        baseClass();
        ~baseClass();
    };
//
// keyword1 class declaration
class keyword1 : public baseClass
    {
    public:
        // Constructors
        keyword1 () { cout << "keyword1 constructor..." << endl;}
        ~keyword1 () { cout << "keyword1 destructor..." << endl;}

    protected:

    };
//
// keyword2 class declaration
class keyword2 : public baseClass
    {
    public:
        // Constructors
        keyword2 () { cout << "keyword2 constructor..." << endl;}
        ~keyword2 () { cout << "keyword2 destructor..." << endl;}

    protected:

    };
//
// keyword3 class declaration
class keyword3 : public baseClass
    {
    public:
        // Constructors
        keyword3 () { cout << "keyword3 constructor..." << endl;}
        ~keyword3 () { cout << "keyword3 destructor..." << endl;}

    protected:

    };


//
//*******************


    map <string, baseClass> keyword_map;

    keyword_map.insert (make_pair ("keyword1", keyword1 )); // ########## This is where I'm lost
    keyword_map.insert (make_pair ("keyword2", keyword2 )); // ########## This is where I'm lost
    keyword_map.insert (make_pair ("keyword3", keyword3 )); // ########## This is where I'm lost

    // Search for keyword
    string searching_for = "keyword3";
    map <string, baseClass> ::const_iterator it = keyword_map.find(searching_for);


    if (it == keyword_map.end()) {
        cout << "No keyword found." << endl;
            }
        else 
            {
        cout << "Found the keyword!" << endl;
        it->second; // ########## This is where I'm lost
            }
  • 1
    You should learn tables and loops. – stark Aug 28 '14 at 03:28
  • @stark While tables and loops are good things to know, a `std::unordered_map` can dispatch this kind of thing properly. The bigger issue is *"are you really going to define 1,000 object types? ...why...?"* And I give the question a +1 for asking and wanting to know how to do things better, instead of just going ahead and doing something nutty...which is what many people will do. :-) – HostileFork says dont trust SE Aug 28 '14 at 03:58
  • thanks for the responses everyone. This was the first time I used this wen bite, and I'm impressed on how fast i revived feedback. thanks again for the help. // // // stark - I'll check those out. // // // HostileFork - I'll look into that container a bit more. I guess I could grab the mapped value, and pass that to the generic parsing routine. Although, I'd still have to iterate through the container, but this would be less cluttered than the 1000 if-else / case-break statements. Thanks! – Not_much_of_a _programmer Aug 28 '14 at 21:20
  • And yes, I do plan to make 1000 classes, probably more, because each keyword is dramatically different from the next and physically represents something else. – Not_much_of_a _programmer Aug 28 '14 at 21:23

3 Answers3

5

Once I matched the given keyword, I plan to jump to a general routine that instantiates an object of that keyword type.

Your intuition that you don't want to have 1,000 different IF statements is a correct one.

Abstractly, I'd suggest considering how an old-school card catalog works (assuming you've ever seen one, do younger people still know what these are?)

enter image description here

Card catalogs are useful because you don't start from the first drawer and look through all the items in sequence, then move on to the next drawer. Instead, you have a quick test you can use to know which drawer to look in. That quick test involves fingerprint, or a "hash" of the candidate. Old library card catalogs often use a very simple "hash function" (the first one or two letters; "this drawer contains all the cards for books whose titles start with "S-Ti"). You narrow down the number of comparisons you need to do based on that test to only look in one drawer.

If it sounds like a lot of work to come up with a way to fingerprint strings, and file them into buckets like this, you are in luck. This is all work already done for you under the hood of the standard library. Unless your needs are very specialized (or your keywords have odd patterns in them where they'd all have the same "fingerprint")...the std::unordered_map should work.

Pick a "key" that is a std::string representing your keywords. The "value" will be a factory of some kind...a way to create your objects from the stuff that comes after the keyword. That's could be the basis of your "better way"...

..BUT

In terms of initializing a std::unordered_map to do your bidding in this case, if your "values" in the map are each geared to constructing a different class, 1000 is a large number of classes. You might want to lay out more specifics before typing class ObjectOne and numbering them up to class ObjectOneThousand, which sounds every bit as questionable as performing 1000 IF statements for comparison.

So perhaps you should seek more review in a chat or some other forum before going ahead with the idea.


UPDATE to respond to edit

Your code has a problem in terms of what the keyword classes are. Are they intended to represent the keyword class (as in...there are only as many of them ever instantiated as you have keywords?) One should be skeptical of a class that only ever has one instance and represents a class of things; that's what the class itself is for. If that makes sense. :-/

So what you want to put in your map are not instances of keyword. It's more that you conceptually want to put a keyword class that you invoke later. In spirit, that would be like:

#include <typeinfo>

map <string, type_info &> keyword_map;

keyword_map.insert (make_pair ("keyword1", typeid(keyword1) )); 
keyword_map.insert (make_pair ("keyword2", typeid(keyword2) )); 
keyword_map.insert (make_pair ("keyword3", typeid(keyword3) ));

You might think that later you could call some kind of make_class with the type_info, but it doesn't work that way. Hence...the idea of storing a factory function to get that behavior. I'll give you a simple answer with a static member function, so in each keyword class you'd have something like this:

class keyword1 : public baseClass {
    // ...
    static shared_ptr<baseClass> factory() {
        return make_shared<keyword3>();
    }
    // ...
};

Because it's a static member it's just like an ordinary function. You can take the address of it, store the pointer, and then later call it without any class instance to call it on. It returns a shared pointer to the base class, and although all you will wind up with is a pointer to the base class...it will behave polymorphically on any virtual functions you have defined in the base class's interface, as appropriate for the kind of keyword it is.

(NOTE that you need to make your destructors virtual in cases like this! It's best to do it by default and only not do it if you have a really good reason.)

map <string, shared_ptr<baseClass>(*)()>> keyword_map;

keyword_map.insert (make_pair ("keyword1", &keyword1::factory )); 
keyword_map.insert (make_pair ("keyword2", &keyword2::factory )); 
keyword_map.insert (make_pair ("keyword3", &keyword3::factory ));

Now later when you find a keyword, you call the function you get back from find to get an instance of the corresponding keyword class. And then do whatever it is you were planning to do with the object instances.

But I imagine you will find it hard to define an interface on the base class which satisfies you in such a design. Which is again why I said the 1,000s of classes indicates you may not have a problem you want to approach the way you think you do. I also imagine you will have many other questions, but please make them new question posts of their own. :-)

Community
  • 1
  • 1
  • 1000 may be large as a number of keywords, but it isn't a large number for a map. It is tiny. – user207421 Aug 28 '14 at 07:08
  • @EJP It would be a large number of *class* types, which may be implied by the *values* in the map...in the OP's proposal. I am challenging the class-per-keyword idea when you are dealing with 1,000 keywords, as well as the idea that 1,000 keywords indicates a properly thought out design. More information is needed to assess what a good answer is, so I just approached the philosophical issue of what maps are about. – HostileFork says dont trust SE Aug 28 '14 at 07:12
  • @HostileFork Straw man. I haven't said anything about class-per-keyword. – user207421 Aug 28 '14 at 10:06
  • Thanks again guys, but 1000s of classes if defiantly what I need. Again, each keyword is dramatically different from the next and physically represents something else. This is going to be a big program, that crunches a lot of data, which is why I want to figure out how to sort these keywords as efficiently as possible. – Not_much_of_a _programmer Aug 28 '14 at 21:33
  • @EJP as the OP has now said they actually do intend 1000 classes, I think my advice about stepping back to consider that was reasonable. But for your satisfaction I have edited to make it more clear I wasn't saying that 1000 is an unreasonable number of entries for a std::unordered_map. – HostileFork says dont trust SE Aug 29 '14 at 08:27
  • So now that I'm at the point where I created this map, I did notice there is no way (that I know of) to then instantiate the object type I've found in the map. Is there a way to have each key value paired with a different object type? It looks like I can only use a single object. I tried making these 1000 classes a derived class on a general 'keyword' class, but that didn't work for me. I can determine the keyword with this map, but now I don't know how to instantiate an the object? Any ideas? – Not_much_of_a _programmer Sep 02 '14 at 03:13
  • @Ben Well, one idea is to delete the answer that isn't an answer that you posted, which will improve the institutional knowledge of cyberspace and earn you StackOverflow cred! Regarding how to make a base class called "Keyword" and then create differing derived types, you will need a function to serve as an object factory...and the return type will have to return those objects by pointer (hopefully a [smart pointer](http://en.wikipedia.org/wiki/Smart_pointer)) instead of by value. If unclear, see the [Slicing Problem](http://stackoverflow.com/questions/274626/what-is-the-slicing-problem-in-c). – HostileFork says dont trust SE Sep 02 '14 at 04:28
  • @HostileFork - I deleted the answer per your instruction. Again, my apologies, this is my first time using this website. – Not_much_of_a _programmer Sep 02 '14 at 23:11
  • @Ben Yes...I know...that's why I tell you so you can do it yourself vs. flagging you. :-) Perhaps less efficient, but I think more friendly. – HostileFork says dont trust SE Sep 02 '14 at 23:59
  • @HostileFork - I'm only working on this project a little bit at a time, so my responses are slow. Is there a way I can post a sample of my existing code on here? It looks like I'm limited to 550 char or so. Yes, you were correct, I am trying to make a database. I've clearly never done it before and trying to learn by brute force. I understand how base and derived classes work, I've set those but, but I still don't understand how to design the "object factory" function. Even more so, return 1 of these 1000s of objects. I feel I would need another map 1000 bins long in that function, right? – Not_much_of_a _programmer Sep 04 '14 at 00:51
  • @Ben The Q&A format really is for distinct focused issues, instead of "epic" ones that grow and grow. Edits are for clarifying and narrowing scope, vs. expanding it! So it's usually better to ask new ones ("how to make a factory function", etc.) I've updated in any case. – HostileFork says dont trust SE Sep 04 '14 at 07:01
2

An unordered_map would work very quickly here. It's implemented as a hash map so it's lookup performance is roughly O(1).

std::unordered_map

In C++11 you can use std::string as your key and std::function<> as your value. I wrote an example here that shows how to use lambda functions in an unordered_map:

#include <iostream>
#include <unordered_map>
#include <functional>

using namespace std;

typedef unordered_map<string, function<void ()>> ufmap;
ufmap M;

int main() {
    // Create Keys and functions
    string s_a = "A";
    function<void ()> f_a = [](){
        cout << "Construct type A here" << endl;
    };

    string s_b = "B";
    function<void ()> f_b = [](){
        cout << "Construct type B here" << endl;
    };

    // Add Keys and functions to the map
    M.insert(pair<string, function<void ()>>(s_a, f_a));
    M.insert(pair<string, function<void ()>>(s_b, f_b));

    // Finding a string and using its function
    string searching_for = "A";

    ufmap::const_iterator it = M.find(searching_for);

    if (it == M.end()) {
        // String key wasn't found
    }
    else {
        it->second();
    }
}
Foggzie
  • 9,691
  • 1
  • 31
  • 48
  • @Ben No problem. If my post or HostileFork's post helped you out, you can select one as an answer. – Foggzie Aug 29 '14 at 21:06
  • I'm brand new with this site, so bare with me. Nevertheless, I actually do have a follow up question for you. Since I do plan to use this map to determine the object type via a string I'll be comparing to the key value. I determined which one of the 1000 keywords are, but now how would I instantiate that object? Once I've determined the the object type, then I'm lost as what to do next. Since the mapped value can only be a single object type, I could like in your example to make a function per type, but that defeats the purpose of using the map. Any suggestions? – Not_much_of_a _programmer Sep 04 '14 at 00:44
1

Two solutions:

  1. Use a lookup table. This is better if your list of keywords is at all dynamic and it can be O(1) if you use a hash table.

  2. Use a lexical analyser such as flex(1) and build the keywords into the .l file. This can be even faster, as it proceeds a character at a time with no final lookup step, but it is only suitable if the list of keywords is completely fixed in advance, for example in a programming language.

user207421
  • 305,947
  • 44
  • 307
  • 483