This is my first time posting on this site.
I need to generate a set of random data with a function that returns an object. This object picks some of its properties (at deeply nested levels) randomly from other arrays of objects. So the function always returns an object with the same structure, but with different values in its properties.
Is there a way to calculate a uniqueness ratio or something like that? For example, if one generated object is exactly equal to another in the set, it would get a uniqueness of 0; if it shares no property values with any other object, it would get 100; and if some values are shared and others are not, some percentage in between?
My goal is to generate a set of, say, 100 objects and pick the 20 most unique ones.
Thanks in advance for your ideas.
EDIT:
Let's assume I already generated the set of data. All objects have the same structure but different values. Something like this:
{
  name: 'Some Name',
  propA: (picked randomly from set A),
  propB: (picked randomly from a different set B),
  sections: [
    {
      propC: (another random from another set C)
    },
    {...},
    ...
  ]
}
I generated an array of these objects with a few utilities I wrote using Ramda (for example, one that picks a random item from a list), together with R.times.
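For reference, the generation step could look roughly like the sketch below. It is plain JavaScript rather than Ramda so it runs without dependencies; `setA`, `setB`, `setC`, `pickRandom`, `times` and `generateObject` are hypothetical names and placeholder values, not the actual utilities or data.

```javascript
// Hypothetical value pools; the real sets A, B and C are not shown here.
const setA = ['a1', 'a2', 'a3'];
const setB = ['b1', 'b2', 'b3'];
const setC = ['c1', 'c2', 'c3'];

// Pick one element uniformly at random from a non-empty list.
const pickRandom = (list) => list[Math.floor(Math.random() * list.length)];

// Call fn(i) n times and collect the results (a plain-JS stand-in for R.times).
const times = (fn, n) => Array.from({ length: n }, (_, i) => fn(i));

// Build one object with the structure described above.
// sectionCount is an assumed parameter; the real number of sections may vary.
const generateObject = (sectionCount = 3) => ({
  name: 'Some Name',
  propA: pickRandom(setA),
  propB: pickRandom(setB),
  sections: times(() => ({ propC: pickRandom(setC) }), sectionCount),
});

// A set of 100 generated objects.
const generated = times(() => generateObject(), 100);
```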
The main issue is that I need this:
{
  ...generatedObject,
  uniqueness: 79
}
Each object gets a uniqueness value, expressed as a percentage.
So far I have used deep-diff to get the differences between two objects, and I wrote a function that derives a percentage from the number of props that changed.
This is that function:
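const R = require('ramda');
const { diff } = require('deep-diff');

// changes is a Number: the count of generated props, used as the denominator.
// diff returns undefined for identical objects, so default to [] to get 0 instead of NaN.
const measureUniquenessBetweenTwoChildObjects = R.curry((changes, objA, objB) =>
  R.compose(
    R.multiply(100),
    R.divide(R.__, changes),
    R.length,
    R.defaultTo([]),
    diff
  )(objA, objB)
);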
In other words, if the number of changes equals the number of generated props, the difference is 100%.
Then, for every object in the list, I mapped this function over every other object (excluding itself) and reduced the resulting array of differences to an average, which I took to be the final number. Finally, I attached that number to the object with R.assoc.
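As a rough illustration of that averaging step, here is a dependency-free sketch. It is not the actual code: it swaps deep-diff for a simple comparison of flattened leaf paths, the names `flatten`, `percentDifferent` and `withUniqueness` are made up, and it assumes at least two objects that share the same structure and have at least one leaf value.

```javascript
// Flatten a nested object/array into a map of "path -> leaf value".
const flatten = (value, path = '', out = {}) => {
  if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, path ? `${path}.${key}` : key, out);
    }
  } else {
    out[path] = value;
  }
  return out;
};

// Percentage of leaf paths whose values differ between two objects
// of the same structure (a stand-in for counting deep-diff entries).
const percentDifferent = (objA, objB) => {
  const a = flatten(objA);
  const b = flatten(objB);
  const paths = Object.keys(a);
  const changed = paths.filter((p) => a[p] !== b[p]).length;
  return (changed / paths.length) * 100;
};

// Average each object's difference against every other object
// and attach the result as `uniqueness`.
const withUniqueness = (objects) =>
  objects.map((obj, i) => {
    const others = objects.filter((_, j) => j !== i);
    const total = others.reduce(
      (sum, other) => sum + percentDifferent(obj, other),
      0
    );
    return { ...obj, uniqueness: total / others.length };
  });

const sample = [
  { propA: 'x', sections: [{ propC: 1 }, { propC: 2 }] },
  { propA: 'x', sections: [{ propC: 1 }, { propC: 2 }] },
  { propA: 'y', sections: [{ propC: 3 }, { propC: 4 }] },
];
console.log(withUniqueness(sample).map((o) => o.uniqueness));
// → [50, 50, 100]
```

Note that with an average, the two identical objects in this sample score 50 rather than 0, because each is also compared against the third, completely different object.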
Inspecting the array of percentage differences gives me something like this:
[
73.02, 73.02, 72.79, 72.56,
72.56, 72.34, 72.34, 72.11,
71.66, 71.66, 71.2, 70.98,
70.98, 70.98, 70.75, 70.52,
70.29, 70.29, 70.07, 69.84
]
Each of these is the uniqueness ratio I attach to an object.
However, I think my solution is flawed; something seems off to me. The above is the reasoning I followed to solve the problem.
What I am asking is: how would you solve this? In the end, the task is to write an algorithm that calculates a uniqueness value for each object within a set of objects that share the same structure but have different values.
I'm not asking for code, just some ideas for doing this properly. I'm not a data scientist or a mathematician, so I went with a naive approach.
I hope this makes things clearer.
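Once every object carries a score, picking the most unique ones is just a sort and a slice. A minimal sketch, assuming each object has a numeric `uniqueness` property (`topMostUnique` and the sample data are hypothetical):

```javascript
// Return the n objects with the highest uniqueness, highest first.
// The input array is copied so the original order is left untouched.
const topMostUnique = (scored, n) =>
  [...scored].sort((a, b) => b.uniqueness - a.uniqueness).slice(0, n);

const scored = [
  { name: 'a', uniqueness: 70.29 },
  { name: 'b', uniqueness: 73.02 },
  { name: 'c', uniqueness: 69.84 },
  { name: 'd', uniqueness: 72.56 },
];
console.log(topMostUnique(scored, 2).map((o) => o.name));
// → ['b', 'd']
```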
Thanks.