I'm working on a clustering problem and I need to handle the following situation:
library(tibble)  # provides data_frame() (deprecated in favour of tibble())
frids <- data_frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome", "peter", "yassine", "karim"),
  age = c(27, 26, 30, 31, 31, 38, 39),
  height = c(180, 178, 190, 185, 187, 160, 158),
  married = c("M", "M", "N", "N", "N", "M", "M")
)
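library(intervals)  # provides Intervals()
# One closed real-valued interval [lower, upper] per person
i <- Intervals(
  matrix(
    c(    0,  5000,
          0,  5000,
       7000, 10000,
       7000, 10000,
       7000, 10000,
      10000, 15000,
      10000, 15000),
    byrow = TRUE,
    ncol = 2
  ),
  closed = c(TRUE, TRUE),
  type = "R"
)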
frids$salaire <- i
frids
data <- frids
# Build the matrix of the numeric and interval-type attributes.
t <- list.df.var.types(data)
df.r <- as.matrix(data[c(t$numeric, t$Intervals)])
I now have a data matrix with both numerical and interval attributes.
For example, 'salaire' is an interval made of a lower bound and an upper bound, so in the matrix 'salaire.1' is the lower bound and 'salaire.2' is the upper bound.
If I want the distance between df.r[1,] and df.r[3,], I could define it as the sum of the Euclidean distance and the Hausdorff distance:
distance(df.r[1,], df.r[3,]) = distance_num(df.r[1,], df.r[3,]) + distance_hausdorff(df.r[1,], df.r[3,])
distance_num: the Euclidean distance, computed only on the numerical attributes "age" and "height".
distance_hausdorff: the Hausdorff distance, computed only on the interval-type attributes such as 'salaire'.
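For reference, the Hausdorff distance between two closed intervals [a1, b1] and [a2, b2] reduces to max(|a1 - a2|, |b1 - b2|). Under that definition, the combined distance could be sketched in base R roughly as below. The function name mixed_distance, its arguments, and the hand-built matrix m (a stand-in for rows 1 and 3 of df.r) are assumptions made for illustration only, not part of any package.

```r
# Hypothetical helper: distance between two rows of a mixed matrix.
# num_cols:   indices of the numeric columns (e.g. age, height)
# lower_cols: indices of the interval lower bounds (e.g. salaire.1)
# upper_cols: indices of the matching upper bounds (e.g. salaire.2)
mixed_distance <- function(x, y, num_cols, lower_cols, upper_cols) {
  # Euclidean distance on the numeric attributes
  d_num <- sqrt(sum((x[num_cols] - y[num_cols])^2))
  # Hausdorff distance between closed intervals [a1, b1] and [a2, b2]
  # is max(|a1 - a2|, |b1 - b2|); summed over all interval attributes
  d_haus <- sum(pmax(abs(x[lower_cols] - y[lower_cols]),
                     abs(x[upper_cols] - y[upper_cols])))
  d_num + d_haus
}

# Stand-in for df.r[1, ] and df.r[3, ], typed by hand (assumed column layout)
m <- rbind(
  c(age = 27, height = 180, salaire.1 = 0,    salaire.2 = 5000),
  c(age = 30, height = 190, salaire.1 = 7000, salaire.2 = 10000)
)
d <- mixed_distance(m[1, ], m[2, ],
                    num_cols = 1:2, lower_cols = 3, upper_cols = 4)
```

Because the salary bounds are on a much larger scale than age and height, the interval term dominates this sum unless the columns are rescaled first.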
My attempt:
d[k] = sqrt(sum((df.r[1, 1:2] - df.r[3, 1:2])^2)) + ????
My question:
How can I implement in R the distance between this kind of mixed vectors (with both numerical and interval attributes)?