Cosine distance is a good choice if your float variables can be interpreted as directions in a vector space, i.e., only the direction matters and magnitude does not.
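For intuition: cosine distance depends only on the angle between two vectors, so rescaling a vector leaves it unchanged. A minimal NumPy sketch (the `cosine_distance` helper here is just for illustration; it computes the same quantity that `sklearn`'s pairwise cosine metric does):

```python
import numpy as np

def cosine_distance(u, v):
    # 1 - cos(angle between u and v); insensitive to magnitude
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0])
print(cosine_distance(a, 10 * a))                 # ~0.0: same direction, different magnitude
print(cosine_distance(a, np.array([-2.0, 1.0])))  # 1.0: orthogonal vectors
```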
Levenshtein distance (string edit distance) may be a good choice for your string columns, although of course it does not capture anything about semantics. Pairwise Levenshtein distances don't appear to be available out of the box in the usual packages (and note this answer did not work for me), but you can compute them with a few lines of code, as shown below.
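If installing `leven` is a problem, the metric itself is short to write. Here is a minimal pure-Python sketch using the standard Wagner-Fischer dynamic program, which you could drop in as a replacement for `leven.levenshtein`:

```python
def levenshtein(a: str, b: str) -> int:
    # Wagner-Fischer dynamic programming, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("cat", "drat"))    # 2
print(levenshtein("cat", "cattle"))  # 3
```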
Combining different distance metrics can be done in many ways, depending on your data and what you are trying to do. In the code below I use a simple summing method to combine a cosine distance matrix and multiple Levenshtein distance matrices. Notice that I normalize each column's Levenshtein distances by its maximum edit distance, so they lie in [0, 1] and are weighted roughly the same as the cosine distances in the final result. This isn't perfect: cosine distances can reach 2 in general, although for non-negative features (as here) they stay in [0, 1]. Also, for multiple string columns, I compute the pairwise Levenshtein distances on each column separately.
import numpy as np
import pandas as pd
from leven import levenshtein
from itertools import combinations
from sklearn.metrics import pairwise_distances
df = pd.DataFrame(
    {
        "float1": [1, 2, 0, 2],
        "float2": [0, 1, 1, 1],
        "string1": ["cat", "cattle", "dog", "drat"],
        "string2": ["a", "aa", "aba", "abc"],
    }
)
# Pairwise Levenshtein distances, one string column at a time,
# each normalized by its maximum edit distance
N = len(df)
out = np.zeros((N, N))
inds = np.triu_indices(N, k=1)
string_cols = df.columns[df.iloc[0].apply(lambda x: isinstance(x, str))].values
for col in string_cols:
    lev_dists = np.array([levenshtein(i, j) for i, j in combinations(df[col], 2)])
    out[inds] += lev_dists / lev_dists.max()
# Pairwise cosine distances for the float columns
# (in [0, 1] here because the features are non-negative)
cos_dists = pairwise_distances(df[["float1", "float2"]], metric="cosine")
# Fill in the output matrix
out[inds] += cos_dists[inds]
out.round(2)
Returns:
array([[0. , 1.11, 2.5 , 1.44],
[0. , 0. , 2.05, 1.83],
[0. , 0. , 0. , 1.55],
[0. , 0. , 0. , 0. ]])
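Note that only the upper triangle is filled. If a downstream routine expects a full symmetric distance matrix, you can mirror it; a sketch using the result above:

```python
import numpy as np

# `out` is the upper-triangular combined distance matrix from above
out = np.array([[0.  , 1.11, 2.5 , 1.44],
                [0.  , 0.  , 2.05, 1.83],
                [0.  , 0.  , 0.  , 1.55],
                [0.  , 0.  , 0.  , 0.  ]])
full = out + out.T  # diagonal is zero, so nothing is double-counted
print(full[1, 0])   # 1.11, same as full[0, 1]
```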