Understanding word match levels in Smartcat's CAT tool
Learn the differences between word match levels in Smartcat's CAT tool, Smartcat Editor
It can be confusing to tell apart the different levels of word matches when using a computer-assisted translation (CAT) tool like Smartcat Editor.
This article looks at these differences: 100% matches compared to 101% and 102% matches, as well as fuzzy and nearly exact matches.
100% and 101%/102%. What's the difference?
101%/102% matches go by different names depending on the CAT tool. They are also referred to as context matches, perfect matches, or ICE matches.
When a segment is stored in a Smartcat translation memory (TM), Smartcat stores not only the source and target text, but also the content of the previous and following source segments.
For example, this is what you might see in the TM.
<Previous Segment> I live in a small village.
<Source Segment> I have a small house. <Translated Segment> J'ai une petite maison.
<Following Segment> It is blue.
The translation is stored only for the segment that is being translated, but the other two segments are used to provide context.
100% matches explained
If this same segment were encountered again and neither of the two accompanying segments matched what is stored in the TM, it would be a 100% match, because only the segment text itself matches.
101% matches explained
If, in the next document, one of the two context segments also matched what is stored in the TM, it would be a 101% match.
102% matches explained
If both context segments matched, it would be a 102% match. Having the context segments match what is stored in the TM increases the certainty that the translation is a perfect match for the new segment.
In practice, 101/102% matches are often locked during pre-translation by project managers when the project is started because customers don't pay for these segments in most cases.
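The context rules above can be sketched in a few lines of Python. This is an illustration only, not Smartcat's actual implementation: the TMEntry class and the context_match_level function are hypothetical names, and plain string equality stands in for whatever comparison Smartcat performs internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    """Hypothetical TM record: a translated segment plus its source context."""
    source: str
    target: str
    previous_source: Optional[str] = None
    following_source: Optional[str] = None


def context_match_level(
    entry: TMEntry,
    source: str,
    previous_source: Optional[str],
    following_source: Optional[str],
) -> Optional[int]:
    """Return 100, 101 or 102 for an exact text match, or None otherwise.

    Each matching context segment (previous, following) adds one point
    on top of the 100% text match.
    """
    if source != entry.source:
        return None  # not an exact match; fuzzy scoring is out of scope here

    matched_contexts = 0
    if entry.previous_source is not None and previous_source == entry.previous_source:
        matched_contexts += 1
    if entry.following_source is not None and following_source == entry.following_source:
        matched_contexts += 1
    return 100 + matched_contexts


entry = TMEntry(
    source="I have a small house.",
    target="J'ai une petite maison.",
    previous_source="I live in a small village.",
    following_source="It is blue.",
)

# Same text, same neighbours on both sides
print(context_match_level(
    entry, "I have a small house.", "I live in a small village.", "It is blue."
))  # 102
```

With only one neighbour matching, the same call returns 101, and with neither matching it returns 100.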
Explaining nearly exact and fuzzy matches in Smartcat
Let's look at the difference between nearly exact and fuzzy matches, as well as the different tiers of fuzzy matches.
Nearly exact match (95%-99%)
The source text in the segment is nearly identical to the match, differing only in numbers, tags, punctuation marks, or spacing. In pre-translation, this level counts as a "good match" by default, though the threshold can be customized.
Fuzzy match (50%-94%)
The source text closely resembles the source text in the match, but the wording itself differs. Based on the amount of editing required, fuzzy matches fall into three categories.
High fuzzy (85%-94%): In segments of average length or longer (typically 8-10 words or more), there is usually a difference of just one word.
Medium fuzzy (75%-84%): In segments of average length or longer (8-10 words or more), there is typically a difference of two words.
Low fuzzy (50%-74%): In segments of average length or longer (8-10 words or more), the difference is more than two words.
In pre-translation, the term "any match" covers all types of partial matches together, starting at 50% by default, though this threshold can also be adjusted.
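The match tiers can be summarized as a simple lookup. The following Python sketch is illustrative only: the function and constant names are hypothetical, the thresholds are the defaults described in this article (and can be customized in Smartcat), and 95% and above is treated as nearly exact so that it agrees with the 95%-99% and 50%-94% ranges given above.

```python
# Default pre-translation thresholds described in this article (adjustable).
GOOD_MATCH_THRESHOLD = 95  # "good match": nearly exact and above
ANY_MATCH_THRESHOLD = 50   # "any match": every partial match and above


def classify_match(percent: int) -> str:
    """Map a match percentage to the tier names used in this article."""
    if percent >= 100:
        return "exact or context match"  # 100%, 101%, 102%
    if percent >= 95:
        return "nearly exact"
    if percent >= 85:
        return "high fuzzy"
    if percent >= 75:
        return "medium fuzzy"
    if percent >= 50:
        return "low fuzzy"
    return "no match"


def qualifies_for_pretranslation(percent: int, threshold: int = ANY_MATCH_THRESHOLD) -> bool:
    """True if a match meets the chosen pre-translation threshold."""
    return percent >= threshold


for score in (102, 97, 88, 80, 60, 40):
    print(score, classify_match(score))
```

For example, an 88% match is classified as high fuzzy: it qualifies under the default "any match" threshold but not under the "good match" threshold.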