Fuzzy thinking in team translation

Internal fuzzies (homogeneity) must (almost) never be used in a team project. The problem of "just" calculation.

Jan 16, 2025

The author of this guest post, Gergely Vandor, is a versatile localization consultant who writes the Translation Technologist substack.

Meet the fuzzies

When translation companies pay their translators, they almost never pay based on the full word count multiplied by a word rate. Instead, the companies use methods to estimate how much actual work the translator needs to do, because standard translation tools can help the translator in several significant ways. But sometimes the way translation companies calculate this “measured effort” can become quite unfair to translators. A very long time ago, I also started off as a translator, then I became a translation tech guy, so I’m quite motivated and well positioned to fight injustice against translators. One thing that can be quite unfair is the so-called “internal fuzzy matches” feature, also called “homogeneity” in some tools.

The topic is quite technical and requires an understanding of how translation companies calculate “translation effort”. I’ll try my best to explain it so that even a rookie translator, or somebody outside the field of translation, can understand the problem. To be able to stand up and fight against a form of injustice, you need to understand it first and be able to argue against it.

So, the concept of “internal fuzzies” is this: in any text to translate, especially technical ones, there may be sentences (or “segments”) that are similar to each other. Let’s suppose you are translating a piece of text that contains two very similar sentences:

I will be travelling to Prague on Friday.
I will be travelling to Prague on Tuesday.

For now, let’s assume you are a freelance translator working alone on assignment, so you will translate both these sentences. This will become very important later. If you translate these sentences using any professional translation tool from the last few decades, then by the time you get to the second sentence, your tool will remember your translation for the first sentence. This feature is called the “translation memory”, or TM. Even though the second sentence is slightly different, when you are translating it, the tool will show you the translation you entered for the first one previously, and try to make you more efficient when translating the new, similar sentence. The tool will probably also highlight what exactly is different (the word Friday vs Tuesday in our case). Modern tools, when configured right, may even be able to automatically assemble the translation for you by substituting the changed word in the translation.
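As a rough illustration of the highlighting step, the sketch below compares the two example sentences word by word using Python's standard `difflib` module. The function name `word_diff` is made up for this example, and real translation tools use their own (usually proprietary) matching algorithms, so treat the similarity figure as an assumption-laden approximation and not as what any particular tool would report.

```python
from difflib import SequenceMatcher


def word_diff(old, new):
    """Return the word-level changes between two sentences and a similarity ratio."""
    a, b = old.split(), new.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    # Keep only the spans where the two sentences differ.
    changes = [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return changes, matcher.ratio()


changes, ratio = word_diff(
    "I will be travelling to Prague on Friday.",
    "I will be travelling to Prague on Tuesday.",
)
print(changes)         # [('Friday.', 'Tuesday.')]
print(f"{ratio:.1%}")  # 87.5%
```

With this particular measure, seven of the eight words match, which gives 87.5%. A commercial tool may well show a different percentage for the same pair of sentences.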

So, it is not surprising that the translation company gets the idea that the translator shouldn’t be paid for the full word count of the second sentence. The concept of “fuzzy matches” comes in here: the translation tool will come up with a percentage value saying the second source sentence was something like 80% similar to the first one, and the translator is paid less for the second sentence based on a so-called “fuzzy matrix.” The first sentence is paid for in full, because it consists of 8 new words. The second sentence is also 8 words, but is an 80% match, so the translator may be paid just 70% of the full price, for example.
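The arithmetic can be sketched in a few lines of Python. The bands and rates in `FUZZY_MATRIX` are invented for illustration (every company negotiates its own), except that the 75–84% band is set to 70% to match the example above. The function names are hypothetical as well.

```python
# Hypothetical fuzzy matrix: (minimum match %, share of the full word rate paid).
# These bands are illustrative assumptions, not an industry standard.
FUZZY_MATRIX = [
    (100, 0.10),
    (95, 0.30),
    (85, 0.50),
    (75, 0.70),
    (0, 1.00),  # anything below 75% counts as a new segment
]


def pay_factor(match_percent):
    """Look up the share of the full rate for a given match percentage."""
    for threshold, factor in FUZZY_MATRIX:
        if match_percent >= threshold:
            return factor
    return 1.0


def weighted_words(word_count, match_percent):
    """Word count after applying the fuzzy discount."""
    return word_count * pay_factor(match_percent)


first = weighted_words(8, 0)    # no match: all 8 words are paid in full
second = weighted_words(8, 80)  # 80% match: paid at 70% of the full rate
print(round(first + second, 2))  # 13.6 weighted words instead of 16
```

So under these assumed rates, the two sentences together are paid as 13.6 words, not 16.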

Let’s take a moment to clarify our terminology. If the TM already contains one of the sentences, then it will be a (normal) “fuzzy match”. If the TM doesn’t contain either of the similar sentences yet, and the two similar sentences exist only in the text to be translated, then we are talking about an “internal fuzzy match”.

Why internal fuzzies can be OK for a single translator, and unfair in any translation team

Applying a discount for internal fuzzies is not inherently evil, at least in my opinion, especially not in technical translations where internal similarities like this are common. There needs to be a clear agreement between the translator and the translation company, where every aspect of compensation is explained in detail. If the translator does not agree with something like “internal fuzzies”, they could try arguing or they could refuse to work for the translation company.

However, let’s imagine that we have these same two “internal fuzzies”, or two sentences that are very similar, but this time they are assigned to two different translators in the same project or batch. Just for clarity, let’s look at our example sentences again:

1. I will be travelling to Prague on Friday.

2. I will be travelling to Prague on Tuesday.

Both Translator A and Translator B receive an assignment that contains one of these two similar sentences somewhere. (This may look like an unlikely example, but technical translations are very often full of various patterns of repetitions and similarities.) Let’s also assume that the translation team works in a modern translation tool that has an online translation memory. Also let’s assume that when starting off, the translation memory does not contain anything similar to either of the two source sentences. It’s very clear what is going to happen: one of the translators (either A or B, we cannot know in advance) will get to their version of the sentence first, and they will translate it from scratch, without any assistance from the translation tool. And, sometime later, the other translator (maybe the one who generally tends to procrastinate more) will arrive at their version of the sentence. Now the TM contains a fuzzy match created by the (chronologically) first translator, and this second translator will have a much easier time, because the first one has translated an almost identical sentence, placing it in the TM. So, depending on sheer luck, the “second” translator works way less.

However, and this is a huge problem, the payment to the translators is calculated before they even start working. If the two documents are counted together, and the “internal fuzzies” (or homogeneity) feature is used, then the translation tool, in effect at random, counts the full word count for one of the translators and a discounted word count for the other. Since there is no way to know in advance who will translate their version of the “internal fuzzy” sentence first, it’s also impossible to apply the “discount” correctly.
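A toy simulation makes the mismatch concrete. It assumes a shared online TM that fills up in the order the segments are actually translated, a word-level similarity from Python's `difflib`, and a 75% cut-off for fuzzy matches. All names and numbers here are illustrative assumptions, not how any specific tool works.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed cut-off for a usable fuzzy match


def similarity(a, b):
    """Word-level similarity between two sentences (0.0 to 1.0)."""
    return SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()


def who_gets_help(work_order):
    """work_order lists (translator, sentence) in the order the work is really done."""
    shared_tm = []  # source sentences already translated by anyone on the team
    outcome = {}
    for translator, sentence in work_order:
        helped = any(similarity(sentence, s) >= FUZZY_THRESHOLD for s in shared_tm)
        outcome[translator] = "fuzzy match" if helped else "from scratch"
        shared_tm.append(sentence)
    return outcome


friday = "I will be travelling to Prague on Friday."
tuesday = "I will be travelling to Prague on Tuesday."

print(who_gets_help([("A", friday), ("B", tuesday)]))
# {'A': 'from scratch', 'B': 'fuzzy match'}
print(who_gets_help([("B", tuesday), ("A", friday)]))
# {'B': 'from scratch', 'A': 'fuzzy match'}
```

The same two sentences produce opposite outcomes depending only on who happens to work first, while the pre-calculated analysis has already fixed which of the two gets the discount.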

A possible workaround could be to calculate internal fuzzies only within each translator’s assigned sentences. Another workaround could be to make sure that documents are assigned in a way that internal fuzzies are always assigned to the same person. But, to my knowledge, current tools don’t automatically take these steps to protect the translators’ interests. The project managers responsible for project preparation must be very careful to prevent the injustice. I also hope that most of the translation companies that use “internal fuzzies” in unjust ways do so not out of malice but out of a lack of full understanding. I’ve always felt compelled to try and fight that.
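The first workaround could look roughly like the sketch below: the analysis only compares a segment against earlier segments assigned to the same translator. This is a simplified model under stated assumptions (word-level `difflib` similarity, a 75% cut-off, a made-up `analyse` function, and a third example sentence added for illustration), not a description of a feature in any existing tool.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed cut-off for counting a segment as an internal fuzzy


def similarity(a, b):
    """Word-level similarity between two sentences (0.0 to 1.0)."""
    return SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()


def analyse(segments, per_translator):
    """Flag each (translator, sentence) as an internal fuzzy (True) or new (False)."""
    seen = {}
    flags = []
    for translator, sentence in segments:
        # Either one pool of earlier segments per translator, or one for everybody.
        key = translator if per_translator else "whole project"
        earlier = seen.setdefault(key, [])
        flags.append(any(similarity(sentence, s) >= FUZZY_THRESHOLD for s in earlier))
        earlier.append(sentence)
    return flags


segments = [
    ("A", "I will be travelling to Prague on Friday."),
    ("B", "I will be travelling to Prague on Tuesday."),
    ("A", "I will be travelling to Prague on Monday."),  # extra sentence for illustration
]
print(analyse(segments, per_translator=False))  # [False, True, True]
print(analyse(segments, per_translator=True))   # [False, False, True]
```

In the project-wide analysis, Translator B is discounted for a match they may never see. In the per-translator analysis, B is paid in full, while Translator A is still discounted for the third sentence, which is similar to a sentence A translates themselves.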

Visit the Translation Technologist!