Comments on Excellerando: String-Comparison in VBA: a modified Longest-Common-String approach
Nigel Heffernan

Nigel Heffernan, 2014-09-17 16:48 (+01:00):
Thanks Fang, I'll have a look at that.

Fang, 2014-09-16 20:35 (+01:00):
Just wanted to let you know that there's a fairly serious bug in this code. Line 81, in the section that checks for the special case where s1 is a substring of s2, reads

    SumOfCommonStrings = n

It should read

    SumOfCommonStrings = iScore + n

Otherwise, when the function is called recursively, it can drop the original iScore. For example, "eoakey" and "edwardoakey" returns 1 instead of 6: when the recursive call on "e" and "edward" completes, it sets iScore to 1 rather than adding that 1 to the 5 from the match on "oakey".

I got the code from StackOverflow, so if you could update it there as well, that'd be great.

Bug notwithstanding, this was very helpful. Thanks.

PS. Sorry if this is a repeat; I didn't get confirmation that my post was sent the first time.

Nigel Heffernan, 2014-05-13 14:34 (+01:00):
You're moving towards Levenshtein edit distance with this approach:

http://en.wikipedia.org/wiki/Levenshtein_distance

There's nothing wrong with that. Indeed, edit-distance algorithms are the 'gold standard' for a rigorously derived measure of the difference between two strings. But they are very slow in VBA. The whole point of the 'longest common string' approach is to use VBA for the role it performs well: a presentation layer, optimised for a rich user interface at the cost of low performance, which achieves high performance by calling and controlling fast functions implemented in other languages.

Jon, 2011-02-15 22:45 (+00:00):
I did something like this at my previous job. I believe I just took one string and compared it to another. I scored each character by how close the nearest matching character was to its position in the original string. If the character was in the same position, it received a score of 1/1. If not, it received 1/(number of positions displaced). Then I divided the total score by the length of the other string. Hope that makes sense. I would then compare the scores for all the other strings I was comparing against. If the highest score was over a certain percentage, I did a manual check with a message box to decide whether they were the same or not. Writing that code made my boring job much more exciting.
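The bug Fang reports is easier to see with the recursion written out. What follows is a minimal Python sketch, not the original VBA. The function name, the `score` accumulator (standing in for iScore), the brute-force longest-common-substring search and the `fixed` flag are all hypothetical. It finds the longest common substring, adds its length, and then recurses on the unmatched text to the left and to the right of the match. Every return path must add to the running total. Returning `n` on its own throws away what was already accumulated.

```python
def sum_of_common_strings(s1: str, s2: str, score: int = 0, fixed: bool = True) -> int:
    """Hypothetical sketch: sum the lengths of common substrings, found recursively.

    `score` is the running total passed down the recursion (the role iScore plays
    in the comment). `fixed=False` reproduces the reported bug.
    """
    if not s1 or not s2:
        return score
    # Work with s1 as the shorter string.
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    n = len(s1)
    # Special case: the shorter string is wholly contained in the longer one.
    if s1 in s2:
        # The fix: add n to the running total instead of replacing it.
        return score + n if fixed else n
    # Brute-force search for the longest common substring, longest first.
    for length in range(n - 1, 0, -1):
        for start in range(n - length + 1):
            sub = s1[start:start + length]
            pos = s2.find(sub)
            if pos >= 0:
                score += length
                # Recurse on the unmatched text to the left, then to the right.
                score = sum_of_common_strings(s1[:start], s2[:pos], score, fixed)
                score = sum_of_common_strings(s1[start + length:], s2[pos + length:], score, fixed)
                return score
    # No character in common.
    return score


print(sum_of_common_strings("eoakey", "edwardoakey"))               # 6
print(sum_of_common_strings("eoakey", "edwardoakey", fixed=False))  # 1
```

With the fix, the match on "oakey" contributes 5, and the recursive call on "e" and "edward" adds 1 to that total. Without it, the recursive call returns a bare 1, which overwrites the 5.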
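For comparison with the Levenshtein distance Nigel mentions, here is a sketch of the standard dynamic-programming algorithm in Python. It is not code from the post. Insertions, deletions and substitutions each cost 1. The inner loop runs once for every pair of characters, so the cost grows as the product of the two string lengths. That is why the algorithm is slow in interpreted VBA.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance (unit-cost edits)."""
    # Before processing a[i-1], prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute, or match for free
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))     # 3
print(levenshtein("eoakey", "edwardoakey"))  # 5
```

Note that the two measures point in opposite directions. Levenshtein counts differences, so lower means more similar. The common-substring score counts shared characters, so higher means more similar.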
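Jon's positional scheme can also be sketched in Python, under one reading of his description. For each character of the other string, it finds the nearest occurrence of that character in the original. A match in the same position scores 1 and a displaced match scores 1/displacement. The total is then divided by the length of the other string. The function names and the 0.8 default threshold are hypothetical; Jon only says "a certain percentage". The message-box check is replaced here by returning the best candidate for manual review.

```python
from typing import List, Optional, Tuple


def positional_score(original: str, other: str) -> float:
    """Score each char of `other` by its nearest match in `original`:
    1 for the same position, 1/displacement otherwise, 0 if absent.
    The total is normalised by len(other)."""
    if not other:
        return 0.0
    total = 0.0
    for i, ch in enumerate(other):
        positions = [j for j, c in enumerate(original) if c == ch]
        if not positions:
            continue  # character absent from the original: contributes nothing
        displacement = min(abs(j - i) for j in positions)
        total += 1.0 if displacement == 0 else 1.0 / displacement
    return total / len(other)


def best_candidate(target: str, candidates: List[str],
                   threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (candidate, score) for the highest-scoring candidate if it
    clears `threshold` (i.e. worth a manual check), else None."""
    if not candidates:
        return None
    scored = [(c, positional_score(target, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


print(best_candidate("smith", ["smyth", "jones"], threshold=0.5))  # ('smyth', 0.8)
```

Because every character contributes at most 1, dividing by the other string's length keeps the score between 0 and 1. A single threshold can therefore be applied across candidates of different lengths.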