I am attempting to model a realistic social network (like Facebook). I am a Computer Science graduate student, so I have a grasp of basic data structures and algorithms.
The Idea:
I began this project in Java. My idea is to create multiple Areas of Users. Each User in a given Area will have a random number of friends, drawn from a normal distribution around a given mean. Each User will have a large percentage, or cluster, of "Friends" from the Area that they belong to. The remainder of their "Friends" will be smaller clusters from a few different random Areas.
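To make the friend-count part concrete, here is a minimal sketch of how I picture drawing the number of friends and splitting it between the home Area and other Areas. The names `sampleFriendCount` and `localCount`, the standard deviation, and the 80% local share are placeholders I made up for illustration, not settled parts of the design.

```java
import java.util.Random;

public class FriendCountSketch {
    // Draw a friend count from a normal distribution around 'mean',
    // clamped at zero so a user never gets a negative count.
    static int sampleFriendCount(Random rng, double mean, double stdDev) {
        long n = Math.round(mean + stdDev * rng.nextGaussian());
        return (int) Math.max(0L, n);
    }

    // Number of friends that should come from the user's own Area.
    static int localCount(int total, double localFraction) {
        return (int) Math.round(total * localFraction);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int total = sampleFriendCount(rng, 200, 40); // hypothetical mean and spread
        int local = localCount(total, 0.8);          // hypothetical 80% from the home Area
        System.out.println(total + " friends: " + local + " local, "
                + (total - local) + " from other Areas");
    }
}
```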
Initial Structure:
I wanted to create an ArrayList of areas
ArrayList<Area> areas
With each Area holding an ArrayList of Users
ArrayList<User> users
And each User holding an ArrayList of "Friends"
ArrayList<User> friends
From there I can go through each Area, and each User in that Area and give that user most of their friends from that Area, as well as a few friends from a few random Areas. This is easy enough as long as my data set remains small.
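Roughly, the in-memory version looks like the sketch below. It is simplified: it uses a fixed friend count per User, it can pick the same friend twice, friendships are one-directional, and a randomly chosen "other" Area may turn out to be the home Area. The class and method names (`NetworkSketch`, `createAreas`, `assignFriends`, `addFriendsFrom`) and all the parameters are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NetworkSketch {
    static class User {
        final List<User> friends = new ArrayList<>();
    }

    static class Area {
        final List<User> users = new ArrayList<>();
    }

    // Phase 1: create every Area and User up front, so any User can be picked as a friend.
    static List<Area> createAreas(int areaCount, int usersPerArea) {
        List<Area> areas = new ArrayList<>();
        for (int a = 0; a < areaCount; a++) {
            Area area = new Area();
            for (int u = 0; u < usersPerArea; u++) {
                area.users.add(new User());
            }
            areas.add(area);
        }
        return areas;
    }

    // Add up to 'count' random Users from 'source' to the friend list (never the user itself).
    static void addFriendsFrom(User user, Area source, int count, Random rng) {
        for (int i = 0; i < count; i++) {
            User candidate = source.users.get(rng.nextInt(source.users.size()));
            if (candidate != user) {
                user.friends.add(candidate);
            }
        }
    }

    // Phase 2: most friends come from the home Area, the rest from a few random Areas.
    static void assignFriends(List<Area> areas, int friendsPerUser,
                              double localFraction, int remoteAreas, Random rng) {
        for (Area home : areas) {
            for (User user : home.users) {
                int local = (int) Math.round(friendsPerUser * localFraction);
                addFriendsFrom(user, home, local, rng);
                int remaining = friendsPerUser - local;
                for (int k = 0; k < remoteAreas; k++) {
                    // Split what is left evenly; the last Area takes the remainder.
                    int share = remaining / (remoteAreas - k);
                    Area other = areas.get(rng.nextInt(areas.size()));
                    addFriendsFrom(user, other, share, rng);
                    remaining -= share;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Area> areas = createAreas(5, 1000); // small enough to fit in memory
        assignFriends(areas, 20, 0.8, 2, new Random(42));
        System.out.println("Friends of first user: "
                + areas.get(0).users.get(0).friends.size());
    }
}
```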
The problem:
When I try to create large data sets, I get an OutOfMemoryError because the heap runs out of space. I now realize that this approach will be impossible if I want to create, say, 30 Areas with 1 million Users per Area and 200 friends per User. I use almost 2 GB with a single Area. So now what? My algorithm would work if I could create all the Users ahead of time and then simply "give" friends to each User. But the Areas and Users have to be created first: there needs to be a User in an Area before it can be made a "friend".
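A back-of-envelope estimate shows the scale of the problem. The sketch below counts only the friend references themselves and ignores object headers and ArrayList growth slack, so real usage is higher. The 4-byte and 8-byte reference sizes are assumptions (they depend on the JVM and its settings), and `MemoryEstimate` and its methods are names I made up for this calculation.

```java
public class MemoryEstimate {
    // Total number of friend references held across all Users.
    static long friendReferences(long areas, long usersPerArea, long friendsPerUser) {
        return areas * usersPerArea * friendsPerUser;
    }

    // Bytes needed for the references alone.
    static long bytesForReferences(long references, int bytesPerReference) {
        return references * bytesPerReference;
    }

    public static void main(String[] args) {
        long refs = friendReferences(30, 1_000_000, 200);
        System.out.println("Friend references: " + refs);
        // Assumed reference sizes: 4 bytes (compressed) or 8 bytes (uncompressed).
        for (int size : new int[] {4, 8}) {
            double gb = bytesForReferences(refs, size) / 1e9;
            System.out.printf("At %d bytes per reference: %.1f GB%n", size, gb);
        }
    }
}
```

That is 6 billion references in total, or about 24 GB at 4 bytes each, before counting any per-object overhead. A single Area alone needs 200 million references, roughly 0.8 GB at 4 bytes each or 1.6 GB at 8 bytes each.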
Next Step:
I like my algorithm; it is simple and easy to understand. What I need is a better way to store this data, since it can't all be held in memory at once. For each User, I will need to access not only the Area that User belongs to, but also a few random Areas.
My Questions:
1. What technology/data structure should I be putting this data into? In the end I basically want a User->Friends relationship. The "Area" idea is a way to make this relationship realistic.
2. Should I be using a different language altogether? I know that technologies such as Lucene, Hadoop, etc. were written in Java and are used for large amounts of data, but I have never used them and would like some guidance before I dive into something new.
3. Where should I begin? Obviously I cannot use plain Java with all of the data in memory. But I also need to create these Areas of Users before I can give a User a list of Friends.
Sorry for the semi-long read, but I wanted to lay out exactly where I am so you could guide me in the right direction. Thank you to everyone that took the time to read/help me with this topic.