4

There is a multitude of key-value stores available, but currently you need to choose one and stick with it. I believe an independent, open API, not made by a key-value store vendor, would make switching between stores much easier.

Therefore I'm building a datastore abstraction layer (like ODBC, but focused on simpler key-value stores) so that someone can build an app once and change key-value stores if necessary. Is this API too simple?

get(Key)
set(Key, Value)
exists(Key)
delete(Key)
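
As a rough illustration only, the four calls above could be written as an abstract interface with one adapter per store. This is a minimal sketch, not the actual Erlang or Ruby version: the names KeyValueStore, InMemoryStore and KeyNotFound are hypothetical, and it assumes get signals a missing key with an exception rather than a null.

```python
from abc import ABC, abstractmethod


class KeyNotFound(KeyError):
    """Raised by get() when the key is absent (hypothetical name)."""


class KeyValueStore(ABC):
    """The four operations from the question, independent of any backend."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def exists(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class InMemoryStore(KeyValueStore):
    """Stand-in backend; a real adapter would wrap a vendor client here."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        try:
            return self._data[key]
        except KeyError:
            # Assumption: a missing key is an error, not a null value.
            raise KeyNotFound(key) from None

    def set(self, key, value):
        self._data[key] = value

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        # Assumption: deleting a missing key is a silent no-op.
        self._data.pop(key, None)
```

Application code would depend only on KeyValueStore, so swapping stores means writing one new adapter class rather than touching every call site.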

All the APIs I have seen so far seem to add so much more, so I was wondering how many additional methods are really necessary.

I have received some replies saying that set(Key, null) could be used to delete an item, and that if get returns null this means the item doesn't exist. This is bad for two reasons. Firstly, it is not good to mix return values and statuses, and secondly, not all languages have the concept of null. See:

Do all programming languages have a clear concept of NIL, null, or undefined?

I do want to be able to perform many types of operation on the data, but as I understand it, everything can be built on top of a key-value store. Is this correct? And should I provide these value-added functions too, e.g. mapreduce or indexes?

Internally we already have a basic version of this in Erlang and Ruby. It has saved us a lot of time, and has also enabled us to test the performance of different key-value stores for specific use cases.

yazz.com
  • 8
    Simplicity is a good thing :) – Hans W Feb 23 '10 at 20:57
  • 2
    A straight forward API beats copious documentation of a complicated one every day. – rerun Feb 23 '10 at 20:59
  • 5
    Yes, it is too simple. No programmer will ever want to use it, as it will provide no job security whatsoever. – Adam Crossland Feb 23 '10 at 21:05
  • 2
    If you keep editing the API as we answer, the API will of course never be too simple. :-) – David Pfeffer Feb 23 '10 at 21:09
  • So far exists and delete have been added so as not to mix return values with statuses. But I also do not want the interface to be so simple as to be unusable – yazz.com Feb 23 '10 at 21:12
  • So you want to build a KV store... what about existing ones? e.g. CouchDB, Cassandra etc. ? – jldupont Feb 23 '10 at 21:13
  • What is this for? What language? What type of application? Is it performance-sensitive, or should its primary goal be ease-of-use? Does it need to be parallelizable or threaded? What about atomic transactions? The more you can describe about the goal of your new library, the better this question can be answered. – Scott Stafford Feb 23 '10 at 21:19
  • I'm building a key-value store abstraction layer so that someone can build an app once and change key-value stores if necessary – yazz.com Feb 23 '10 at 21:22
  • @Zubair: could I suggest you update your question based on the system details you have just exposed in the comments here? – jldupont Feb 23 '10 at 21:25
  • 1
    @Zubair: My advice is, don't make this abstraction layer. Just use an existing good one that you choose carefully. In my experience, the "abstraction layer" as you're applying it here most often provides frustration due to an incomplete API, and the "I can switch easily" option it theoretically provides never actually happens and would break things anyway. On top of which, if you DID really want to switch, it wouldn't really be all that hard to just go through and change the calls. – Scott Stafford Feb 23 '10 at 21:35
  • @scott. In the 90s we had something called Open Database Connectivity. Think of this as ODBC for key-value stores. It's probably not going to be much use for most developers, but there is a small niche who will really benefit from having a standardized, well-known API which they can use from job to job. Also, it makes building certain tools easier, as they can run on many key-value stores. – yazz.com Feb 24 '10 at 07:22
  • @Zubair: And on how many projects did you actually use the flexibility ODBC offered and switch databases mid-project? And had you done so, how many "standard" SQL statements would have broken or been suboptimal? Now, I do certainly agree that the well-known API is of real value. SQL, the C++ STL, ... -- I think the biggest value they bring is that I can learn them once and even if the implementation behind them changes because I'm on a new project, I as a programmer can still use my knowledge. – Scott Stafford Feb 24 '10 at 17:02
  • @scott. We didn't switch databases for a single customer, but we did have to support multiple customers for the same product, each customer using their own choice of database with Oracle, SQLServer, and Sybase being the main ones. – yazz.com Feb 25 '10 at 10:52

9 Answers

6

Do only what is absolutely necessary. Instead of asking if it is too simple, ask if it is too much, even if it only has one method.

Francisco Aquino
4

If all you are doing is getting, setting, and deleting keys, this is fine.

David Pfeffer
4

Your API lacks some useful functions like "hasKey" and "clear". You might want to look at, say, Python's hack at it, http://docs.python.org/tutorial/datastructures.html#dictionaries, and pick and choose additional functions.

Everyone is saying, "simple is good" and that's true until "simple is too simple."

Scott Stafford
  • 1
    I added "hasKey" as "exists". I'm not sure about "clear" though – yazz.com Feb 23 '10 at 21:13
  • Clear is less mandatory if there is good control over construction and destruction. – Scott Stafford Feb 23 '10 at 21:17
  • building an API for internal use is a good example of yagni. don't build it until you need it. doing so will keep the design as simple as it has to be and not "too simple". – Chris Conway Feb 23 '10 at 22:04
  • Yes, we need the API internally as we have several projects that need it. But that is still too few projects and languages to make sure we get it right, which is why we are asking here – yazz.com Feb 24 '10 at 07:39
  • 1
    HasKey is of limited value, as, unless you have some kind of locking mechanism (which I'd argue against), the result of the call would be obsolete as soon as you got it. SetIfNonExisting or something similar may make more sense, similar to exclusive create on a file. – kyoryu Feb 24 '10 at 08:01
  • SetIfNonExisting is an interesting command I do need to think about. Can atomic operations like incr and decr be implemented on top of SetIfNonExisting? – yazz.com Feb 24 '10 at 09:33
  • 1
    @Zubair: No, incr and decr can't, you'd need a function that could take a function, like "DoIfKeyExistsElseSetTo(r => r + 1, 1)" -- that would be atomic, but impossible across languages and not very simple. ;) – Scott Stafford Feb 25 '10 at 15:56
  • hmm, interesting @scott, I'm going to think about that – yazz.com Feb 26 '10 at 19:39
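
To make the SetIfNonExisting and "function that takes a function" ideas from the comments above concrete, here is a minimal in-process sketch. It is an illustration only: AtomicStore, set_if_absent, update and incr are hypothetical names, and a single lock stands in for whatever atomic primitive a real store would have to provide (the lock does nothing for clients in other processes).

```python
import threading


class AtomicStore:
    """In-process sketch; one lock guards every read-modify-write."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data[key]

    def set_if_absent(self, key, value):
        """Store value only if key is unset; return True if it was stored."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def update(self, key, fn, default):
        """Apply fn to the current value, or store default if key is unset."""
        with self._lock:
            if key in self._data:
                self._data[key] = fn(self._data[key])
            else:
                self._data[key] = default
            return self._data[key]


def incr(store, key):
    # incr built on update(): the first call stores 1, later calls add 1.
    return store.update(key, lambda n: n + 1, 1)
```

Note that incr needs update, not set_if_absent: set_if_absent can create the counter atomically, but it cannot change an existing value.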
3

There is no such thing as "too simple" for an API. The simpler the better! If it solves the need the way it is, then leave it.

Chris Conway
3

The delete method is unnecessary. You can just pass null to set.

Edited to add:

I'm only kidding! I would keep delete, and probably add Count, Contains, and maybe an enumerator (or two).

Jeffrey L Whitledge
  • 2
    True, but I don't think it is a good idea. It makes it less obvious to the reader what the code does. A person reading delete(a) will probably understand what's going on. A person reading set(a, null), however, will probably have to look in the documentation to figure it out. There may also be situations where storing null is exactly what the user wants to do. – Laserallan Feb 23 '10 at 21:06
  • I added "exists" as I don't want to start mixing return values with statuses. Thanks – yazz.com Feb 23 '10 at 21:06
  • 1
    But depending on the system, this might mean an additional `round trip` to get the data... – jldupont Feb 23 '10 at 21:12
  • @Laserallan - Agreed. I have to confess that my answer was really a joke. I hate magic functions that do lots of different things depending on the parameters! – Jeffrey L Whitledge Feb 23 '10 at 21:12
  • I don't think that this answer qualifies as a joke :) Without exists()/has_key(), the semantics of delete are really hard to define. Consider the following sequences: 1) set(a, 42); set(a, null); b = get(a); 2) set(a, 42); delete(a); b = get(a). After the first sequence, b equals null (obviously). After the second sequence, the value of b is not well defined: either it's null, and then you are unable to differentiate between null values and a missing key, or some kind of condition was raised, and that needs to be included in the API as well. – Wojciech Bederski Feb 23 '10 at 21:18
  • @wuub - Yes, that's true. If given the option, though, I would prefer to be able to store a null explicitly, and throw an exception on a get of an unset key. And that seems to be the direction that the OP is taking it now. – Jeffrey L Whitledge Feb 23 '10 at 21:34
  • @wuub. Exactly the reason I used internally of not mixing statuses and values, thanks for explaining it better than I did! :) – yazz.com Feb 24 '10 at 07:26
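
The two sequences discussed in the comments above can be played out in a few lines. This is only a sketch under one possible set of semantics (null is a storable value, and get on an unset key raises); the Store class and the function names are hypothetical.

```python
class Store:
    """Tiny dict-backed store where None is an ordinary value."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def exists(self, key):
        return key in self._data

    def get(self, key):
        # Raises KeyError for an unset key instead of returning None.
        return self._data[key]


def sequence_one():
    """set(a, 42); set(a, null); b = get(a)"""
    s = Store()
    s.set("a", 42)
    s.set("a", None)
    return s.exists("a"), s.get("a")


def sequence_two():
    """set(a, 42); delete(a); b = get(a)"""
    s = Store()
    s.set("a", 42)
    s.delete("a")
    try:
        return s.exists("a"), s.get("a")
    except KeyError:
        return s.exists("a"), "no such key"
```

With these semantics the first sequence leaves a key that exists and holds None, while the second leaves no key at all, so the status and the value never share a return channel.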
2

I am all for simplifying an interface to its bare minimum but without having more details about the requirements of the system, it is tough to tell if this interface is sufficient. Sure looks concise enough though.

Don't forget to document the semantics for "key non-existent", as it isn't clear from reading your API definition above. Update: I see you have added the exists method. Is this necessary? You could use the get method and define a NIL of some sort, no?

Maybe worth thinking about: how about considering "freshness" of a value? i.e. an associated "last-modified" timestamp? Of course, it depends on your system requirements.

What about access control? Is it within scope of the API definition?

What about iterating through the keys? If there is a possibility of a large set, you might want to include some pagination semantics.
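
One possible shape for such pagination semantics is a cursor: each call returns a page of keys plus the point to resume from. The sketch below is an assumption for illustration, not part of the proposed API; list_keys and iter_keys are hypothetical names, keys are assumed to be sortable, and a plain dict stands in for the store.

```python
def list_keys(data, after=None, limit=100):
    """Return up to `limit` keys that sort after `after`, plus a cursor.

    The cursor is the last key of the page, or None when no keys remain.
    """
    keys = sorted(k for k in data if after is None or k > after)
    page = keys[:limit]
    next_cursor = page[-1] if len(keys) > limit else None
    return page, next_cursor


def iter_keys(data, page_size=100):
    """Walk every key by requesting one page at a time."""
    cursor = None
    while True:
        page, cursor = list_keys(data, after=cursor, limit=page_size)
        yield from page
        if cursor is None:
            break
```

A real backend would push the "keys after this cursor" filter down to the store instead of sorting everything in memory, which is the part that varies most between vendors.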

jldupont
  • I have added "exists". Also, interesting point about freshness. This will be used on top of key-value stores which are eventually consistent, but they offer different types of versioning. Some use version numbers (Riak) and some use timestamps (Cassandra) – yazz.com Feb 23 '10 at 21:11
  • In regard to your question about NIL, this is a strange one as I am not sure if all languages have a concept of a NIL value – yazz.com Feb 24 '10 at 07:27
  • Access control is something that will go in a "connection" string. In Erlang, for example, it will be the first parameter of every function, and will include any special privileges needed. – yazz.com Feb 24 '10 at 07:35
  • Iteration is still open to how I will do this. I have many ideas but I need to make sure they are usable for end users – yazz.com Feb 24 '10 at 07:36
2

When creating an API, you need to ask yourself, what does my API provide the user. If your API is so simplistic that it is faster and easier for your client to write their own app, then your API has failed. Ask yourself, does my functionality give them specific benefits. If the answer is no, it is too simplistic and generic.

Jeremy B.
  • Sorry, I didn't understand what you mean. Could you rephrase your answer please. Thanks – yazz.com Feb 23 '10 at 21:15
  • 1
    comes back to listing the "system requirements" based on the "use cases" IMO. – jldupont Feb 23 '10 at 21:26
  • We already have a basic version of this in Erlang and Ruby. It has saved us a lot of time, and has also enabled us to test the performance of different key-value stores for specific use cases – yazz.com Feb 24 '10 at 07:49
1

As mentioned, the simpler the better, but a simple iterator or key-listing method could be of use. I always end up needing to iterate through the set. A "size()" method too, if not taken care of by the iterator. It obviously depends on your usage, though.

T.Kliether
  • Size is a difficult one, as it is often a hugely expensive operation. I am considering an iterator, but I only want to add it if I can make it so simple to use as to be self-describing – yazz.com Feb 23 '10 at 21:14
  • @Zubair - If a size method is deemed necessary, then, in an API like this, it can always be made an O(1) operation. Just keep a count variable, and increment or decrement as necessary. – Jeffrey L Whitledge Feb 23 '10 at 21:21
  • Yes, keeping a running count should work fine. For the iterator (if you need it), just a simple "getKeys()" would handle it, if a bit more memory intensive. – T.Kliether Feb 23 '10 at 21:58
  • Size can get very tricky though. Example: for one customer they use a database as their backend key value store. When the data is updated by clients external to our API then we lose count of those inserts/deletes. – yazz.com Feb 24 '10 at 07:41
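
The running-count idea, and the way it drifts when something writes to the backend directly, can be sketched as follows. CountingStore is a hypothetical wrapper and a plain dict stands in for the customer's database; the check-then-write in set and delete is not atomic, so this is single-client only.

```python
class CountingStore:
    """Wraps a dict-like backend and keeps size() at O(1)."""

    def __init__(self, backend):
        self._backend = backend
        self._count = len(backend)  # one full count at start-up

    def set(self, key, value):
        # Only a brand-new key changes the count.
        if key not in self._backend:
            self._count += 1
        self._backend[key] = value

    def delete(self, key):
        if key in self._backend:
            del self._backend[key]
            self._count -= 1

    def size(self):
        return self._count


backend = {}
store = CountingStore(backend)
store.set("a", 1)
store.set("b", 2)
backend["x"] = 99  # an external client writes around the wrapper
drift = len(backend) - store.size()  # the wrapper never saw "x"
```

Any write that bypasses the wrapper leaves size() stale until the next full recount, which is the situation described in the last comment.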
0

It's not too simple, it's beautiful. If "exists(key)" is just a convenient shorthand for "get(Key) != null", you should consider removing it. I guess that depends on how large or complex the value you get() is.

eirikma