I'm trying to build a fuzzy set from a series of example values with Python 3.
For instance, given [6, 7, 8, 9, 27] I'd like to obtain a function that:
- returns 0.0 from 0 to ~5,
- goes gradually up to 1.0 from ~5 to 6,
- stays at 1.0 from 6 to 9,
- goes gradually down to 0.0 from 9 to ~10,
- stays at 0.0 from ~10 to ~26,
- goes gradually up to 1.0 from ~26 to 27,
- goes gradually down to 0.0 from 27 to ~28,
- returns 0.0 from ~28 onwards.
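Just to make the list above concrete, this is a rough piecewise-linear sketch of the shape I have in mind (the breakpoints 5, 10, 26 and 28 are my approximate "~" values, and np.interp is only a stand-in for the smoother function I'm actually after):

import numpy as np
import matplotlib.pyplot as plt

# Approximate breakpoints for the example [6, 7, 8, 9, 27]
xs = [0, 5, 6, 9, 10, 26, 27, 28]
ys = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]

def membership(v):
    # np.interp is linear between breakpoints and flat (here 0.0) outside them
    return np.interp(v, xs, ys)

x_plot = np.linspace(-1, 30, 400)
plt.plot(x_plot, membership(x_plot))
plt.show()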
Notice that the y values are always in the range [0.0, 1.0] and, if a value is missing from the series, the y of that value is 0.0.
Please consider that in the most general case the input values might be something like [9, 41, 20, 13, 11, 12, 14, 40, 4, 4, 4, 3, 34, 22] (the values can always be sorted). Notice that in this series the value 4 is repeated 3 times, so I'd expect it to have a probability of 1 and all the other values a lower one (not necessarily 1/3 as in this case).
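To clarify what I mean by the repeated value getting 1: I'm thinking of something like count normalization, where the most frequent value defines the top of the scale (a small illustration; the exact 1/3 for single occurrences is a side effect, not a requirement):

from collections import Counter

values = [9, 41, 20, 13, 11, 12, 14, 40, 4, 4, 4, 3, 34, 22]
counts = Counter(values)
peak = max(counts.values())                    # 3, because 4 appears three times
membership = {v: c / peak for v, c in counts.items()}
print(membership[4])                           # 1.0
print(membership[9])                           # 0.33..., though this exact value isn't a requirement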
The top part of this picture shows the desired function plotted up to x=16 (hand drawn); I'd be more than happy to obtain anything like it.
The bottom part of the picture shows some extra features that would be nice to have but are not strictly mandatory:
- better smoothing than shown in my drawing (A),
- cumulative effect (B) provided that...
- the function never goes above 1 (C) and...
- the function never goes below 0 (D).
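By the cumulative effect (B) with the caps (C) and (D), I mean roughly: add one bump per sample and clip the sum to [0, 1]. A minimal sketch of that idea (the Gaussian bump and the width value are placeholders I made up, not a requirement):

import numpy as np

def fuzzy_from_samples(samples, width=0.7):
    # One Gaussian bump per sample; overlapping bumps add up (B),
    # and np.clip keeps the result inside [0, 1] (C and D).
    samples = np.asarray(samples, dtype=float)
    def f(x):
        x = np.asarray(x, dtype=float)
        bumps = np.exp(-((x[..., None] - samples) ** 2) / (2 * width ** 2))
        return np.clip(bumps.sum(axis=-1), 0.0, 1.0)
    return f

f = fuzzy_from_samples([6, 7, 8, 9, 27])
print(f([4.0, 6.5, 5.8]))   # low for 4.0, saturated at 1.0 near the 6..9 cluster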
I've tried some approaches adapted from polyfit, Bézier curves, Gaussians and others, but the results weren't what I expected. I've also tried the package fuzzpy, but I couldn't make it work because of its dependency on epydoc, which doesn't seem to be compatible with Python 3. No luck with StatsModels either.
Can anyone suggest how to achieve the desired function? Thanks in advance.
In case you wonder, I plan to use the resulting function to predict the likelihood of a given value; with respect to the fuzzy set described above, for instance, 4.0 returns 0.0, 6.5 returns 1.0 and 5.8 something like 0.85. Maybe there is another, simpler way to do this?
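To make that expected behaviour testable, this is the kind of check I'd run against whatever f ends up being (here against the piecewise-linear stand-in from the first sketch; the numbers are approximate):

import numpy as np

xs = [0, 5, 6, 9, 10, 26, 27, 28]
ys = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
for v in (4.0, 6.5, 5.8):
    # expected: 0.0, 1.0 and about 0.8-0.85 respectively
    print(v, np.interp(v, xs, ys))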
This is how I usually process the input values (not sure if the part that adds the 0s is needed). What should I have instead of ??? to compute the desired f?
import numpy as np
import matplotlib.pyplot as plt


def prepare(values, normalize=True):
    # Count occurrences of each value.
    table = {}
    max_count = 0
    for value in values:
        table[value] = table.get(value, 0) + 1
        if normalize and table[value] > max_count:
            max_count = table[value]
    # Scale the counts so that the most frequent value gets 1.0.
    if normalize:
        for value in table:
            table[value] /= float(max_count)
    # Fill the gaps with zeros (not sure this part is needed).
    for value in range(sorted(table)[-1] + 2):
        if value not in table:
            table[value] = 0
    x = sorted(table)
    y = [table[value] for value in x]
    return x, y


if __name__ == '__main__':
    # get x and y vectors
    x, y = prepare([9, 41, 20, 13, 11, 12, 14, 40, 4, 4, 4, 3, 34, 22], normalize=True)
    # calculate fitting function
    f = ???
    # calculate new x's and y's
    x_new = np.linspace(x[0], x[-1], 50)
    y_new = f(x_new)
    # plot the results
    plt.plot(x, y, 'o', x_new, y_new)
    plt.xlim([x[0] - 1, x[-1] + 1])
    plt.show()
    print("Done.")
A practical example, just to clarify the motivation for this...
The series of values might be the number of minutes after which people give up standing in line in front of a kiosk. With such a model, we could try to predict how likely somebody is to leave the queue based on how long they have been waiting. The value read in this way could then be defuzzified, for instance, into happily waiting [0.00, 0.33], just waiting (0.33, 0.66] and about to leave (0.66, 1.00]. In the about to leave case, that somebody could be engaged by something (an ad?) to convince them to stay.
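For completeness, the defuzzification step I have in mind is just a threshold lookup over those three bands, something like:

def defuzzify(score):
    # Map a membership value in [0, 1] onto the three bands described above.
    if score <= 0.33:
        return "happily waiting"
    if score <= 0.66:
        return "just waiting"
    return "about to leave"

print(defuzzify(0.85))   # "about to leave" -> time to show that ad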