Sorted Array vs Hashtable: Which data structure would be more efficient in searching over a range of dates in a calendar app?

Question

I have about a year of experience in coding in Java. To hone my skills I'm trying to write a Calendar/journal entry desktop app in Java. I've realized that I still have no experience in data persistence and still don't really understand what the data persistence options would be for this program -- So perhaps I'm jumping the gun, and the design choices that I'm hoping to implement aren't even applicable once I get into the nitty gritty.

I mainly want to write a calendar app that allows you to log daily journal entries with associated activity logs for time spent on daily tasks. In terms of adding, editing and viewing the journal entries, using a hash table with the dates of the entries as keys and the entries themselves as the values seems most Big-Oh efficient (O(1) average case for each using a hash table).

However, I'm also hoping to implement a feature that could, given a certain range of dates, provide a simple analysis of average amount of time spent on certain tasks per day. If this is one of the main features I'm interested in, am I wrong in thinking that perhaps a sorted array would be more Big-Oh efficient? Especially considering that the data entries are generally expected to already be added date by date.

Or perhaps there's another option I'm unaware of?

The reason I'm asking is because of the answer provided by this following question: Why not use hashing/hash tables for everything?

And the reason I'm unsure if I'm even asking the right question is because of the answer to the following question: Whats the best data structure for a calendar / day planner?

If so, I would really appreciate being directed other resources on data persistence in java.

Thank you for the help!

For a desktop app with just a few entries, I doubt there would be any significant difference: you won't have millions of entries in your calendar. But one option you haven't considered is a TreeMap, which has O(log N) lookup/insertion time, and allows getting a submap containing a range of entries efficiently, too. I don't see what this has to do with persistence, though. None of these collections are persistent. You'll need to read and write their elements somehow to a persistent storage, but that doesn't have much to do with the collection you choose to use. — JB Nizet, Aug 24 '19 at 18:44
Seems like in this particular case they would both be O(N) because to get the keys for the `HashTable`, you'd still need to iterate through the whole table. A sorted array is probably just fine for what you are going for (especially since you have no guarantee of the dates coming out in sorted order on the HashTable) — Chris Gilardi, Aug 24 '19 at 18:44

erickson · Answer 1 · 2019-08-24T18:49:00.833

6

Use a NavigableMap interface (implemented by TreeMap, a red-black tree).

This allows you to easily and efficiently select date ranges and traverse over events in key order.

As an aside, if you consider time or date intervals to be "half-open" it will make many problems easier. That is, when selecting events, include the lower bound in results, but exclude the upper. The methods of NavigableMap, like subMap(), are designed to work this way, and it's a good practice when you are working with intervals of any quantity, as it's easy to define a sequence of intervals without overlap or gaps.

edited Aug 24 '19 at 18:49

answered Aug 24 '19 at 18:43

erickson

265,237
58
395
493

2

A thousand times yes to your "as an aside". – Andy Turner Aug 24 '19 at 18:49
1

A few question you need to address, though, are: what is the key? Is it the start of the event, or the end? What if you have two events with the same start/end? What if you are keying by start and want to search for end? (This is not to say that TreeMap is a bad choice; it's just not a panacea). – Andy Turner Aug 24 '19 at 18:58
1

@AndyTurner That's true. I am assuming the "events" are modeled as an instant in time (the key) and a duration, along with other information. How to handle overlapping events, etc., would require a lot more knowledge of the requirements in this specific case. – erickson Aug 24 '19 at 19:02

score 2 · Answer 2 · answered Aug 24 '19 at 18:42

Depends on how serious you want your project to be. In all cases, be careful of premature optimization. This is when you try too hard to make your code "efficient", and sacrifice readability/maintainability in the process. For example, there is likely a way of doing manual memory management with native code to make a more efficient implementation of a data structure for your calendar, but it likely does not outweigh the beneits of using familiar APIs etc. It might do, but you only know when you run your code.

Write readable code
Run it, test for performance issues
Use a profiler (e.g. JProfiler) to identify the code that is responsible for poor performance
Optimise that code
Repeat

For code that will "work", but will not be very scalable, a simple List will usually do fine. You can use JSONs to store your objects, and a library such as Jackson Databind to map between List and JSON. You could then simply save it to a file for persistence.

For an application that you want to be more robust and protected against data corruption, a database is probably better. With this, you can guarantee that, for example, data is not partially written, concurrent access to the same data will not result in corruption, and a whole host of other benefits. However, you will need to have a database server running alongside your application. You can use JDBC and suitable drivers for your database vendor (e.g. Mysql) to connect to, read from and write to the database.

For a serious application, you will probably want to create an API for your persistence. A framework like Spring is very helpful for this, as it allows you to declare REST endpoints using annotations, and introduces useful programming concepts, such as containers, IoC/Dependency Injection, Testing (unit tests and integration tests), JPA/ORM systems and more.

Like I say, this is all context dependent, but above all else, avoid premature optimization.

The right data structure will be faster *and* provide a better interface. And you don't need to profile your code to understand the complexity of different algorithms or data structures. I feel like while this answer begins and ends warning against premature optimization, the bulk of it looks too far down the road to alternative storage and even architectures, and that's ironic. — erickson, Aug 24 '19 at 18:59
The intent behind the answer was to give some idea about possible persistence options (as opposed to just which specific data structure), while stressing that, until the application is at least somewhat working, there are likely bigger concerns than what specific implementation to use. There is a fine line between "planning ahead" and "premature optimisation", sometimes I struggle to tell them apart — cameron1024, Aug 24 '19 at 19:22

score 0 · Answer 3 · answered Aug 24 '19 at 18:45

0

This thread might give you some ideas what data structure to use for Range Queries.

Data structure for range query

And it even might be easier to use a database and using an API to query for the desired range.

answered Aug 24 '19 at 18:45

crzdg

1
3

Andy Turner · Answer 4 · 2019-08-24T19:09:20.900

If you are using (or are able to use) Guava, you might consider using RangeMap (*).

This would allow you to use, say, a RangeMap<Instant, Event>, which you could then query to say "what event is occurring at time T".

One drawback is that you wouldn't be able to model concurrent events (e.g. when you are double-booked in two meetings).

(*) I work for Google, Guava is Google's open-sourced Java library. This is the library I would use, but others with similar range map offerings are available.

Sorted Array vs Hashtable: Which data structure would be more efficient in searching over a range of dates in a calendar app?

4 Answers4