I'm working on a clustering problem and need to handle the following situation:

    library(tibble)     # data_frame() (deprecated alias of tibble())
    library(intervals)  # Intervals()
    frids <- data_frame(
      name = c("Nicolas", "Thierry", "Bernard", "Jerome", "peter", "yassine", "karim"),
      age = c(27, 26, 30, 31, 31, 38, 39),
      height = c(180, 178, 190, 185, 187, 160, 158),
      married = c("M", "M", "N", "N", "N", "M", "M")
    )
    i <- Intervals(
      matrix(
        c(0,5000,  
          0,5000,
          7000,10000,  
          7000,10000,
          7000,10000,
          10000,15000,  
          10000,15000
        ),
        byrow = TRUE,
        ncol = 2
      ),
      closed = c( TRUE, TRUE ),
      type = "R"
    )

    frids$salaire <- i

    frids

    data <- frids
    # Build a matrix from the numeric and interval attributes.
    t <- list.df.var.types(data)
    df.r <- as.matrix(data[c(t$numeric, t$Intervals)])

I have a data matrix with both numeric and interval attributes:


Example: here 'salaire' is an interval made up of a lower bound and an upper bound, so 'salaire.1' is the lower bound and 'salaire.2' is the upper bound.

To get the distance between df.r[1,] and df.r[3,], I could define it as the sum of the Euclidean distance and the Hausdorff distance:

distance(df.r[1,], df.r[3,]) = distance_num(df.r[1,], df.r[3,]) + distance_hausdorff(df.r[1,], df.r[3,])

distance_num: the Euclidean distance, computed only on the numeric attributes "age" and "height".

distance_hausdorff: the Hausdorff distance, computed only on the interval attributes such as 'salaire'.
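For two closed intervals on the real line, the Hausdorff distance reduces to the larger of the two endpoint differences, max(|a1 - b1|, |a2 - b2|). A minimal sketch in R; the function name `hausdorff_interval` is hypothetical and not taken from any package:

```r
# Hausdorff distance between two closed intervals given as c(lower, upper).
# For closed real intervals this is the larger of the two endpoint gaps.
hausdorff_interval <- function(a, b) {
  stopifnot(length(a) == 2, length(b) == 2)
  a <- unname(a)
  b <- unname(b)
  max(abs(a[1] - b[1]), abs(a[2] - b[2]))
}

# Salary intervals of rows 1 and 3 from the data above
hausdorff_interval(c(0, 5000), c(7000, 10000))  # 7000
```

The lower bounds differ by 7000 and the upper bounds by 5000, so the distance between these two salary intervals is 7000.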

My attempt:

d[k] = sqrt(sum((df.r[1, 1:2] - df.r[3, 1:2])^2)) + ????

My question:

How can I implement in R the distance between such mixed vectors (with both numeric and interval attributes)?

Tou Mou
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not reproducible. – MrFlick Feb 11 '20 at 16:48
  • @MrFlick I will add the R code for the data. – Tou Mou Feb 11 '20 at 16:49
  • mixed_distance <- function(x, y, n1, n2) { int <- (n1 + 1):(n1 + 2 * n2); result <- dist(rbind(x[1:n1], y[1:n1])) + dist(rbind(x[int], y[int])); return(result) } – Tou Mou Feb 11 '20 at 17:35
  • n1 is the number of numeric attributes and n2 the number of interval attributes. – Tou Mou Feb 11 '20 at 17:36

1 Answer

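    mixed_distance <- function(x, y, n1, n2) {
      num <- 1:n1                    # positions of the numeric attributes
      int <- (n1 + 1):(n1 + 2 * n2)  # positions of the interval bounds (lower, upper pairs)
      # Euclidean distance on the numeric part plus Euclidean distance on the
      # interval bounds (dist() is Euclidean by default, not Hausdorff).
      result <- dist(rbind(x[num], y[num])) + dist(rbind(x[int], y[int]))
      return(result)
    }

    # example: mixed_distance(df.r[1,], df.r[2,], 2, 1)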

n1 is the number of numerical attributes.

n2 is the number of interval attributes.

x and y are mixed vectors.
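If the interval part should use the Hausdorff distance described in the question rather than the Euclidean distance on the bounds, one possible variant is sketched below. The name `mixed_distance_hausdorff` is hypothetical, the matrix `m` is a hand-typed stand-in for `df.r`, and summing the per-attribute Hausdorff distances when there are several interval attributes is an assumption, not something the question specifies:

```r
# Euclidean distance on the first n1 numeric columns plus the sum of the
# Hausdorff distances over the n2 interval attributes.
# Assumed column order: n1 numeric values, then (lower, upper) pairs.
mixed_distance_hausdorff <- function(x, y, n1, n2) {
  x <- unname(x)
  y <- unname(y)
  num <- seq_len(n1)
  d_num <- sqrt(sum((x[num] - y[num])^2))
  lo <- n1 + 2 * seq_len(n2) - 1  # positions of the lower bounds
  hi <- lo + 1                    # positions of the upper bounds
  d_int <- sum(pmax(abs(x[lo] - y[lo]), abs(x[hi] - y[hi])))
  d_num + d_int
}

# Stand-in for df.r, typed in by hand from the question's data
m <- cbind(
  age       = c(27, 26, 30, 31, 31, 38, 39),
  height    = c(180, 178, 190, 185, 187, 160, 158),
  salaire.1 = c(0, 0, 7000, 7000, 7000, 10000, 10000),
  salaire.2 = c(5000, 5000, 10000, 10000, 10000, 15000, 15000)
)

# Pairwise distance matrix, usable as input to hierarchical clustering
n <- nrow(m)
D <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    D[i, j] <- mixed_distance_hausdorff(m[i, ], m[j, ], n1 = 2, n2 = 1)
  }
}
hc <- hclust(as.dist(D))
```

Because the salary bounds are several orders of magnitude larger than age and height, the interval term dominates the total here; scaling the attributes before combining the two terms may be worth considering.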

Tou Mou