I have a cross table of z-transformed data. The rows and columns are country codes, and each cell is an index number indicating the mutual trade dependency between the two countries, transformed to z-scores for comparability.

I now want to cluster the cross table into communities of countries that are significantly more dependent on trade within their cluster than outside of their cluster.

How can I achieve that clustering (in R)?

I have used the k-means algorithm with fairly reasonable results, but I'm not sure whether k-means works on z-scores. If not, is it possible to transform z-scores into distances of some kind?

Glad for any help!

EDIT: The upper left corner of my data looks like this (differences are small, but they are there and sometimes bigger than in this example):

            AFG         AGO         ALB AND         ARE         ARG         ARM ATG
AFG          NA          NA          NA  NA          NA          NA          NA  NA
AGO          NA          NA          NA  NA          NA          NA          NA  NA
ALB -0.07627342 -0.07627342          NA  NA -0.07626487          NA -0.07627322  NA
AND          NA          NA          NA  NA          NA          NA          NA  NA
ARE -0.07608694          NA          NA  NA          NA          NA          NA  NA
ARG -0.07627337 -0.07595271 -0.07626095  NA -0.07564470          NA -0.07626129  NA
ARM -0.07627292 -0.07627342          NA  NA -0.07442803          NA          NA  NA
ATG          NA          NA -0.07627337  NA          NA -0.07627283          NA  NA
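One possible way to turn such a table into distances is sketched below, under several assumptions that may not hold for the real data: the matrix is symmetric, a higher z-score means stronger mutual dependency, and a missing pair can be treated as the weakest observed dependency. The country codes and values are made up for illustration. Note that `kmeans` treats each row as a point in feature space rather than as a set of pairwise similarities, so this sketch uses hierarchical clustering on a dissimilarity object instead.

```r
# Toy symmetric matrix of z-scored mutual trade dependency.
# Hypothetical codes and values, not taken from the real data.
codes <- c("AAA", "BBB", "CCC", "DDD")
z <- matrix(c(
    NA,  2.0, -0.5, -0.6,
   2.0,   NA, -0.4, -0.5,
  -0.5, -0.4,   NA,  1.8,
  -0.6, -0.5,  1.8,   NA
), nrow = 4, byrow = TRUE, dimnames = list(codes, codes))

# Assumption: a missing pair means "no measurable dependency",
# so it receives the lowest observed score.
z[is.na(z)] <- min(z, na.rm = TRUE)

# Assumption: flip similarity into dissimilarity around the maximum,
# so the most dependent pair ends up at distance 0.
d <- max(z) - z
diag(d) <- 0
d <- as.dist(d)

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

In this toy case AAA and BBB land in one cluster and CCC and DDD in the other. For an asymmetric table, the two directions would first have to be combined (for example by averaging), which is a modelling choice in its own right.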
craszer
  • Please make this question *reproducible*. This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample *unambiguous* data (e.g., `data.frame(x=...,y=...)` or the output from `dput(head(x))`), and intended output given that input. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Sep 28 '22 at 10:27
  • (1) For clustering, you may need to reshape your data, [wide-to-long](https://stackoverflow.com/q/2185252/3358272), perhaps something like `reshape2::melt(tibble::rownames_to_column(dat), "rowname")`. (2) Negative distances? I'd expect distance to be a positive scalar. Are you sure about the negative values? – r2evans Sep 28 '22 at 12:05
  • Yes, you are right. I will make all numbers non-negative by subtracting the minimum number in the dataset from all numbers. Can the resulting numbers be viewed as Euclidean distances? Basically asking: is the difference between the z-normalized scores of -2 and +2 a Euclidean distance of 4? – craszer Sep 28 '22 at 16:42
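The arithmetic in the last comment can be checked directly for a single variable. This is a minimal sketch that treats the two z-scores as coordinates of two points. Whether a cell of the cross table can itself be read as a distance is a separate question that this does not settle.

```r
# Two z-scores treated as one-dimensional coordinates.
z <- c(a = -2, b = 2)

# dist() uses the Euclidean metric by default.
d_raw <- dist(z)
print(as.numeric(d_raw))      # 4

# Shifting every value by the same constant leaves distances unchanged.
d_shifted <- dist(z - min(z))
print(as.numeric(d_shifted))  # 4
```

So the gap between -2 and +2 is 4 either way. The shift only moves the values and does not change how far apart they are.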
