
I am trying to implement the accepted answer from this question for ID generation, using XML files to store both my content and the table of content IDs.

The idea is that each content item is stored (serialized) as my-content-item-slug-374871.xml, where the number is a random ID assigned from the IDs table (from those not yet taken). My requirement is that the ID is a six-digit number (a display requirement) between 100000 and 999999, so we can create at most 900,000 content items, but that should be enough. If you wonder why I have such a requirement: I don't want IDs starting from zero, and I don't want GUIDs (which would be far easier to create and maintain, I know), because the ID will be used in MVC routes (much like SO's URLs).

So for starters I decided to create a Dictionary, where the key is the ID and the value indicates whether it is used (true if used, false if available). I then serialize this object to an XML file using DataContractSerializer.

The file is 72 MB, and this is where I think the problems start. I tried to open it in VS2010, Notepad, WordPad and IE; they all crashed and memory consumption skyrocketed. The application itself seems to have no problems with it. Still, I think this will be a huge memory and CPU hog, and performance will suffer.

Am I right in my assumptions, and if so, what are my other options?

mare

3 Answers


I would suggest the same as Henk (just use sequential, seeded IDs); however, you can accomplish what you're looking for:

Rather than creating a dictionary with all possible values, a generic List<int> holding only the values that have been used would be less intensive:

using System.Collections.Generic;

static class Static
{
    // IDs that have already been handed out.
    public static List<int> UsedIDs = new List<int>();
}

Then loop until you find one that hasn't been used yet. (Random is probably not the best choice unless you seed it independently of the clock.)

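// One shared instance: creating a new Random on every call can repeat
// values when calls happen within the same clock tick.
static readonly Random rand = new Random();

int GetNewId()
{
    while (true)
    {
        // The upper bound of Next is exclusive, so 1000000 is needed to include 999999.
        int newId = rand.Next(100000, 1000000);
        if (!Static.UsedIDs.Contains(newId))
        {
            Static.UsedIDs.Add(newId);
            return newId;
        }
    }
}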
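
As a variation on the loop above, here is a minimal sketch that swaps the List<int> for a HashSet<int>: List.Contains scans the whole list on every attempt, while HashSet.Add reports in a single call whether the value was already present. The IdPool class and its members are hypothetical names, and the sketch assumes single-threaded use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class IdPool
{
    public const int MinId = 100000;
    public const int MaxId = 999999;

    private static readonly HashSet<int> UsedIds = new HashSet<int>();
    private static readonly Random Rand = new Random();

    public static int GetNewId()
    {
        // Without this guard the loop below would never end once the range is exhausted.
        if (UsedIds.Count >= MaxId - MinId + 1)
            throw new InvalidOperationException("All six-digit IDs are taken.");

        while (true)
        {
            // The upper bound of Next is exclusive, hence MaxId + 1.
            int candidate = Rand.Next(MinId, MaxId + 1);

            // Add returns false when the value is already in the set.
            if (UsedIds.Add(candidate))
                return candidate;
        }
    }
}
```

The retry loop still slows down as the range fills up, because more and more random picks land on IDs that are already taken.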

This should be more efficient in the short term, but for long-term performance and scalability I would strongly suggest seeded identities or GUIDs, which are fairly usable when Base64-encoded (similar to YouTube URLs).
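
For the GUID option, a minimal sketch of the Base64 encoding, assuming a URL-safe variant is wanted. The ShortGuid name and the character substitutions are illustrative choices, not something this answer specifies.

```csharp
using System;

// Hypothetical helper for illustration.
public static class ShortGuid
{
    public static string Encode(Guid guid)
    {
        // 16 bytes encode to 24 Base64 characters, the last two being "==" padding.
        string base64 = Convert.ToBase64String(guid.ToByteArray());

        // Drop the padding and replace the two characters that are awkward in URLs.
        return base64.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    public static Guid Decode(string encoded)
    {
        // Reverse the substitutions and restore the padding before decoding.
        string base64 = encoded.Replace('-', '+').Replace('_', '/') + "==";
        return new Guid(Convert.FromBase64String(base64));
    }
}
```

The encoded form is 22 characters long, so it is shorter than the usual 36-character GUID text but still longer than a six-digit number.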

lukiffer

for starters I decided to create a Dictionary,

You will find that a BitArray takes up far less space.
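
A rough sketch of how that could look, assuming one bit per ID in the 100000 to 999999 range (900,000 bits, roughly 110 KB when written out as raw bytes). The IdBitmap class and its method names are hypothetical.

```csharp
using System;
using System.Collections;

// Hypothetical wrapper for illustration: bit i stands for ID (MinId + i).
public class IdBitmap
{
    private const int MinId = 100000;
    private const int Count = 900000; // IDs 100000..999999 inclusive

    private readonly BitArray bits;

    public IdBitmap()
    {
        bits = new BitArray(Count); // all bits start as false (available)
    }

    private IdBitmap(byte[] raw)
    {
        bits = new BitArray(raw);
    }

    public bool IsUsed(int id)
    {
        return bits[id - MinId];
    }

    public void MarkUsed(int id)
    {
        bits[id - MinId] = true;
    }

    // Packs the bits into bytes, e.g. for writing to a file.
    public byte[] ToBytes()
    {
        byte[] raw = new byte[Count / 8]; // 900000 is divisible by 8
        bits.CopyTo(raw, 0);
        return raw;
    }

    public static IdBitmap FromBytes(byte[] raw)
    {
        if (raw.Length != Count / 8)
            throw new ArgumentException("Unexpected bitmap size.", "raw");
        return new IdBitmap(raw);
    }
}
```

The byte array can be stored with File.WriteAllBytes and loaded back with File.ReadAllBytes, which avoids the XML overhead of serializing a dictionary entry per ID.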

But the basic question is: why 'random'?

If you need unique IDs, just use a counter. Start it at 100000 and increment it each time you use one.
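
A minimal sketch of such a counter, assuming the last issued value is kept in a small text file so that it survives restarts. The IdCounter class and the counterFile parameter are hypothetical, and the sketch ignores concurrent writers.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration.
public static class IdCounter
{
    private const int FirstId = 100000;
    private const int LastId = 999999;

    public static int Next(string counterFile)
    {
        // If no counter file exists yet, start just below the first valid ID.
        int last = File.Exists(counterFile)
            ? int.Parse(File.ReadAllText(counterFile))
            : FirstId - 1;

        if (last >= LastId)
            throw new InvalidOperationException("All six-digit IDs are taken.");

        int next = last + 1;
        File.WriteAllText(counterFile, next.ToString());
        return next;
    }
}
```

In a web application the read and write would need a lock around them so that two requests cannot receive the same ID.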

H H

Instead of maintaining a list of used numbers, just create the new file name and do a File.Exists(fileName) call; if the file doesn't exist, the ID isn't used.

Edit: Sorry, I presumed the language was C#, but the idea should be similar in other languages.
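
Since the file names in the question also contain a slug, checking one exact name would miss the same ID stored under a different slug. A minimal sketch that matches on the ID suffix instead, assuming all items live in one directory and follow the slug-ID.xml naming from the question (the IdFiles class and the contentDirectory parameter are hypothetical):

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper for illustration.
public static class IdFiles
{
    public static bool IsIdTaken(string contentDirectory, int id)
    {
        // Matches any slug followed by the ID, e.g. my-content-item-slug-374871.xml
        string pattern = "*-" + id + ".xml";
        return Directory.EnumerateFiles(contentDirectory, pattern).Any();
    }
}
```

This scans the directory on every check, so it is likely to become slow as the number of content files grows.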

Chuck Savage