6

I need to maintain lists of 40 recent added, most popular/ most liked items for each item category(total categories around 2000) in my application. I do store the views count & no Of likes for each item. For that purpose, I'm looking to probably maintain an in-memory structure on app server, to store & retrieve these list of items.

Do you have any ideas about how to implement this in-memory data structure, & importantly, keeping in mind the associated memory footprint & minimizing it to the farthest extent)?


Using:

Java 1.6

Rajat Gupta
  • 25,853
  • 63
  • 179
  • 294
  • 7
    there are a number of issues to solve here. I think you need a decay function, so that likes and visits from 6 months ago are discounted in comparison to activity in the last week. Also, your question about a data structure that occupies minimum space... Is that the question you really want to ask? Shouldn't you focus on getting your functionality working first before worrying about space? – ControlAltDel Apr 03 '12 at 19:10
  • http://stackoverflow.com/questions/81346/most-efficient-way-to-increment-a-map-value-in-java – sfk Apr 12 '12 at 21:49
  • 1
    In-memory ranking object? Screw dat. `select *, (views*.10 + likes) as rank from vw_categories_with_decayed_views_and_likes order by rank desc limit 40` – lsl Apr 12 '12 at 22:24

4 Answers4

8

Before you settle on an in-memory structure, consider what happens when your server has to reboot. That in-memory structure is going to go away. If that is alright, then you should go with an in-memory structure. If it is not, you may want to consider simply using an separate domain object to handle this sort of metadata.

In-Memory

Apache processor threads do not share memory, so the best way to do this is probably to install something like memcached. Every time you want to get the current items, you make a call to a particular key ("topforty"). Memcached stays persistent and any of your threads can call it concurrently. This is a highly scalable solution.

In order for it work, however, you have to do additional work. Some program will need to assess the current likes and views, and update the topforty key. This could be done through your admin web application, or as a cron job on an hourly or daily basis. The service defined below could do it as well, simply doing it's work with memcached rather than with the object it's persisting.

Domain Object

If persistence is more key, and you're willing to hand off concurrence to your web application framework, then you want to create a service that handles this:

public interface PopularityService {
  public List<Item> getTopItems(int count);//gets n top items

  //lets the service know someone liked a thing
  public void registerLike(Item item, Person liker);

  //lets the service know someone viewed a 
  public void registerView(Item item, Person viewer);thing
}

This is going to require some backing object:

public class PopularStuff {
  public List<Item> popularItems
  ...
}

You should persist that object as a single object (or if your framework makes it easy, as a singleton). Your service should act on that object, deciding what should be in it and how to move things around. This will be a read-heavy solution, but not as read-heavy as other static data because presumably people will be doing a lot of views. If you're using something like Hibernate it will be very easy to jump from the list of items to the actual items in your database.

Note that I don't discuss the underlying algorithm, as you didn't ask about that but about how to implement the data structure. If you can provide details on your current framework we could discuss more specifics.

Nathaniel Ford
  • 20,545
  • 20
  • 91
  • 102
  • Thanks for the answer Nathaniel. I am using JSF without any anything like Hibernate etc.. ! – Rajat Gupta Apr 10 '12 at 20:58
  • Regarding the server reboots, i am going to be periodically saving the data to DB & also while the app is shudown it would save the current state – Rajat Gupta Apr 10 '12 at 21:00
  • I have been planning to hold this data structure within an application scoped managed bean – Rajat Gupta Apr 10 '12 at 21:02
  • The main thing you have to worry about, then, is writes to the bean. As long as multiple threads can write to it without overwriting each other, you should be good! – Nathaniel Ford Apr 10 '12 at 22:58
4

Have you checked out Priority Queue? It seems like that takes care of your ordering needs, once you setup the right Comparator. If your list sizes are dynamic, then memory size might be an issue. But since you know how many items will be in each list, you can just specify that size as the initial capacity.

Bill Bushey
  • 143
  • 1
  • 6
3

I'm going to make a pretty big assumption, which is that when you say you "store the views count & no Of likes", you mean that they are stored in a query-friendly format (ie, an SQL Database or equivalent). The main reason you want to store information in-memory, then, is to cache the data, thereby reducing the number of database calls required to generate a page. Is this assumption correct?

If so, then I think you are over-complicating the issue. Rather than using complex data structures to maintain your information, treat it as a simple cache structure. Here's a highly simplified, pseudocode example of how it would work:

class TopXCache extends Runnable
{
  Object[] cachedData;

  int secondsToTimeOut;
  String sqlQueryToRefreshCache;

  boolean killSwitch = false;

  constructor(int itemsToKeepInCache, int secondsToTimeOut, String sqlQueryToRefreshCache)
  {
     this.secondsToTimeOut = secondsToTimeOut;
     this.sqlQueryToRefreshCache = sqlQueryToRefreshCache;

     this.cachedData = new Object[itemsToKeepInCache];
  }

  void run() // The method the thread will execute
  {
     while(!killSwitch) // Allows for "poison pill" shutdown
     {
       cachedData = executeQuery(sqlQueryToRefreshCache);
       wait(secondsToTimeOut);
     }
  }

  void kill()
  {
     killSwitch = true;
  }
}

To create a list, instantiate it with a poll time (secondsToTimeOut), an SQL query to run which will return the latest copy of the data (sqlQueryToRefresh), and the number of items you want in your list (itemsToKeepInCache, in your case, 40).

Then start up a thread which can execute the above (or a scheduled task, or a cron library task, whatever else you're using to manage timed events in your application,) and periodically your cache will refresh itself. If your system shuts down unexpectedly, then it will automatically rebuild itself from the database once the thread is restarted.

This is the basis of an incredibly simple cache. You can make it more complicated if you like, set it up as a singleton, add a "forceRefresh()" method to update the data outside the current refresh window, set it up to hold and refresh multiple caches on a single thread, or even go the whole hog and use a third party caching library.

Caching is the normal solution to this sort of problem though, and one which is normally easier to understand and maintain in the long term.

Erica
  • 2,251
  • 16
  • 21
0

i make the same assumption of @Erica, but provide a different solution:

also assumed that item-category relation is many-to-many.

import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import javax.ejb.EJB;

@ManagedBean
@RequestScoped
public class ItemBean
{
    @EJB
    private DbService dbService;

    @ManagedProperty("#{categoryCache}")
    private CategoryCache cache;

    public void incrementViewCounter(Item item)
    {
        item.setViewCount(item.getViewCount() + 1);
        dbService.update(item);
        cache.update(item);
    }

    public void incrementLikeCounter(Item item)
    {
        item.setLikeCount(item.getViewCount() + 1);
        dbService.update(item);
        cache.update(item);
    }
}


@ManagedBean
@ApplicationScoped
class CategoryCache
{
    private Map<Integer, ItemSet> categoryMap;

    public void update(Item item)
    {
        ItemReference ref = new ItemReference(item);

        for(Category c : item.getCategoryList())
        {
            ItemSet set = categoryMap.get(c.getId());
            if(set == null)
            {
                set = new ItemSet();
                categoryMap.put(c.getId(), set);
            }

            set.add(ref);
        }
    }
}

class ItemSet extends TreeSet<ItemReference>
{
    private static final int MAX_ENTRIES = 40;

    @Override
    public boolean add(ItemReference ref)
    {
        if(contains(ref)) remove(ref);

        super.add(ref);

        if(size() > MAX_ENTRIES)
        {
            remove(last());
        }

        return true;
    }
}

class ItemReference implements Comparable<ItemReference>
{
    private final Integer id;
    private final Double rank;

    public ItemReference(Item item)
    {
        this.id = item.getId();
        this.rank = item.getViewCount().doubleValue() * 0.1 + item.getLikeCount().doubleValue();
    }

    @Override
    public int compareTo(ItemReference that)
    {
        return -this.getRank().compareTo(that.getRank());
    }

    @Override
    public int hashCode()
    {
        return id.hashCode();
    }

    @Override
    public boolean equals(Object that)
    {
        if(that instanceof ItemReference)
        {
            return this.getId().equals(((ItemReference)that).getId());
        }

        return false;
    }

    public Integer getId()
    {
        return id;
    }

    public Double getRank()
    {
        return rank;
    }
}
Michele Mariotti
  • 7,372
  • 5
  • 41
  • 73