I'm trying to solve the edit distance problem. The code I've been using is below.
    public static int minDistance(String word1, String word2) {
        int len1 = word1.length();
        int len2 = word2.length();
        // len1 + 1 by len2 + 1, because the result is read from dp[len1][len2]
        int[][] dp = new int[len1 + 1][len2 + 1];
        // turning a prefix of length i into the empty string takes i deletions
        for (int i = 0; i <= len1; i++) {
            dp[i][0] = i;
        }
        // turning the empty string into a prefix of length j takes j insertions
        for (int j = 0; j <= len2; j++) {
            dp[0][j] = j;
        }
        // iterate through both strings and compare the last chars of each prefix
        for (int i = 0; i < len1; i++) {
            char c1 = word1.charAt(i);
            for (int j = 0; j < len2; j++) {
                char c2 = word2.charAt(j);
                if (c1 == c2) {
                    // last chars are equal: no extra edit for the longer prefixes
                    dp[i + 1][j + 1] = dp[i][j];
                } else {
                    int replace = dp[i][j] + 1;      // change word1[i] into word2[j]
                    int delete = dp[i][j + 1] + 1;   // drop word1[i]
                    int insert = dp[i + 1][j] + 1;   // append word2[j]
                    dp[i + 1][j + 1] = Math.min(replace, Math.min(insert, delete));
                }
            }
        }
        return dp[len1][len2];
    }
It's a DP approach. The problem is that, because it uses a 2D array, this method can't handle large strings, e.g. string length > 100000.
So is there any way to modify this algorithm to overcome that difficulty?
NOTE: The above code solves the Edit Distance problem correctly for small strings (lengths up to about 1000).
As you can see, the code uses a Java 2D array, "dp[][]", and an array with that many rows and columns can't be allocated.
Ex: if I need to compare two strings whose lengths are more than 100000, then
int[][] dp = new int[len1 + 1][len2 + 1];
effectively becomes
int[][] dp = new int[100000][100000];
which needs about 40 GB for the int values alone, so the JVM throws an OutOfMemoryError.
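For scale, the size of that table can be estimated directly. This is only a rough sketch: it assumes 4 bytes per int and ignores array headers, and the class and method names (DpTableSize, tableBytes) are made up for illustration.

```java
public class DpTableSize {
    // Bytes needed for the int values of a (len1 + 1) x (len2 + 1) table.
    // Array headers and per-row overhead are ignored (assumption).
    public static long tableBytes(int len1, int len2) {
        return (long) (len1 + 1) * (len2 + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        long bytes = tableBytes(100_000, 100_000);
        System.out.println(bytes + " bytes, about " + bytes / 1_000_000_000L + " GB");
    }
}
```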
So the above program is only good for short strings. What I'm asking is: is there any way to solve this problem efficiently for large strings (length > 100000) in Java?
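One observation that might help: each row of dp only reads the row directly above it, so it may be enough to keep two rows instead of the whole table. The following is a minimal sketch under that assumption, not a definitive solution. The class name RollingEditDistance is hypothetical, and it only recovers the distance, not the sequence of edits.

```java
public class RollingEditDistance {
    public static int minDistance(String word1, String word2) {
        // Edit distance is symmetric, so put the shorter string along the row
        // to keep the two arrays as small as possible.
        if (word1.length() < word2.length()) {
            String tmp = word1;
            word1 = word2;
            word2 = tmp;
        }
        int len1 = word1.length();
        int len2 = word2.length();

        int[] prev = new int[len2 + 1]; // plays the role of dp[i]
        int[] curr = new int[len2 + 1]; // plays the role of dp[i + 1]
        for (int j = 0; j <= len2; j++) {
            prev[j] = j;
        }

        for (int i = 0; i < len1; i++) {
            char c1 = word1.charAt(i);
            curr[0] = i + 1; // same as dp[i + 1][0] in the 2D version
            for (int j = 0; j < len2; j++) {
                if (c1 == word2.charAt(j)) {
                    curr[j + 1] = prev[j];
                } else {
                    int replace = prev[j] + 1;
                    int delete = prev[j + 1] + 1;
                    int insert = curr[j] + 1;
                    curr[j + 1] = Math.min(replace, Math.min(insert, delete));
                }
            }
            // The finished row becomes the "previous" row for the next i.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[len2];
    }
}
```

This brings memory down to two int arrays of length min(len1, len2) + 1, but the running time is still proportional to len1 * len2, which is about 10^10 cell updates for two strings of length 100000.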