Dynamic Programming | LCS | Practise 1
In this article we will talk about Longest Common Sunsequence.
Given next problem:
Given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where in each step you can delete one character in either string.
1 | Example |
Intuitively we get recursive solution. Lets define next function:
1 | int lcs(String s1, String s2, int n1, int n2) |
Function returns the common longest subsequence for strings s1 and s2 with corresponding length n1 and n2.
- Base case would be when n1 or n2 equals to zero, in this case longest common subsequence of empty and not empty strings would be empty string. Return value would be
0
. - There are two cases exists:
s1.charAt(n1 - 1) == s2.charAt(n2 - 1).
In this case we know that given strings of size n1 and n2 has two same chars at positions n1 - 1 and n2 - 1. And all we need to do is to find common subseqence of strings n1 - 1, n2 - 1. Return value would be1 + lcs(s1, s2, n1 - 1, n2 - 1)
s1.charAt(n1 - 1) != s2.charAt(n2 - 1).
In this case we can conlcude that common subsequence could end at index n1 - 1 or n2 - 1. That is why we need to check both cases and select maximum common subsequence. Return value would beMath.min(lcs(s1,s2,n1,n2-1), lcs(s2,s2,n1-1,n2);
Lets combine this into one function:
1 | public static int lcs(String s1, String s2, int n1, int n2) { |
Keep in mind: we need substract double length of lcs from sum of length of strings. Because the statement is to find minimum number of chars to delete to get equal strings.
1 | package com.company; |
Let’s optimize algorithm. Recursion calls itself again and again, and instead of getting values from recursion we can cache answers and use it later. Cache would be simple two dimensional array of size n1 + 1, n2 + 1, that holds Integers. If cache[i][j] != null
then cache holds precomputed value, otherwise compute and update cache.
1 | public static int minDelete(String s1, String s2) { |
We can break algorithm if we provide big strings that gives overflow of stack. It has place to make iterative solution. The idea remains the same. If current chars i, j
are same we have got new longest subsequence at path cache[i][j]
, if not then the answer would be the biggest value between cache[i][j-1]
and cache[i-1][j]
(don’t forget to change Integer[][]cache -> int[][]cache
).
1 | public static int lcs(int [][] cache, String s1, String s2, int n1, int n2) { |