Invisible gremlins in translation!

Jan 30

Zero-width characters can induce madness in some text processing and translation environments.

5 Comments

Interesting. I was recently working with a PDF file (in a PDF reader, not a CAT tool) where the search function refused to find words that I could clearly see were present in the file. I'm guessing this could well have been caused by the same (or a similar) issue.

Kevin Lossner

Jan 30

Very likely. Copy one of those words, paste it into a code point converter page and count the characters listed. I'll bet there will be more of them than the ones you can see.

Try this page, for example:

https://cryptii.com/pipes/text-decimal
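
For anyone who would rather run the same check locally, here is a minimal sketch in Python. The function name `list_code_points` and the sample word are hypothetical, made up for illustration; only the standard `unicodedata` module is used.

```python
import unicodedata

def list_code_points(text):
    """Return a (code point, Unicode name) pair for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Hypothetical sample: "word" with a zero-width space (U+200B) hidden in the middle.
sample = "wo\u200brd"

for code, name in list_code_points(sample):
    print(code, name)

# The count includes the character you cannot see.
print(len(sample), "characters in a word that shows only 4")
```

If the list is longer than the word you see on screen, the extra entries are the gremlins.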

Rob Grayson

Jan 30

Good idea. Thanks!

James Kirchner

Jan 30

Many times I have run into something similar in German source texts that were not composed with Unicode fonts. You'd think everything would be Unicode now, but sometimes things still arrive in a pre-Unicode font.

In those texts, a word like "früher" will look fine, like just one word, but the programs will perceive it as something like "fru<¨>her". The programs will also not recognize it for glossary display, among other problems. Luckily, Word will select the whole thing as a misspelled word in spellcheck, so a quick spellcheck before import usually fixes the problem.
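
A hedged sketch of one way to repair this kind of text with a script rather than a spellcheck, assuming the stray mark is a combining diaeresis (U+0308) following a plain "u". That is an assumption about the file, not something confirmed here, and the helper name `compose` is hypothetical. Python's standard `unicodedata.normalize` with the NFC form merges such a pair into the single character "ü".

```python
import unicodedata

def compose(text):
    """Merge base letters and combining marks into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)

# Assumed input: "u" followed by COMBINING DIAERESIS (U+0308), which displays as "ü".
decomposed = "fru\u0308her"
composed = compose(decomposed)

print(len(decomposed), len(composed))   # 7 6
print(composed == "fr\u00fcher")        # True
```

If the file instead contains the spacing diaeresis (U+00A8), NFC leaves it untouched, so it is worth checking the actual code points first.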

One time, I handed over a job and was told by the project manager, with the utmost urgency, that I hadn't finished it. It turned out that import into both memoQ and Trados would stop at a certain point and the remaining text would not display. Trados wouldn't show me the text it was importing (these were text files and not Word files), so I couldn't see what was causing the glitch. However, memoQ's text import dialog, where you can choose the text encoding, shows you exactly what the text you're importing will look like. I scrolled down to the point where import stopped, and I found a strange Chinese-looking character that didn't belong there. I opened up the text file, deleted and rewrote the area where the invisible character was, and then everything imported fine. The client always wanted everything done in Trados, but if I hadn't been using memoQ, I'd never have found the problem, and the project manager could have blamed me, even though she didn't know how to fix it herself.

Kevin Lossner

Jan 30

memoQ gives me the tools to troubleshoot many things, as you did, in ways users of other tools can barely imagine. Simplicity on the surface: I can get people to be productive with far less teaching and learning time than with other tools. But when greater depth is needed, it's there.

Translation Tribulations Substack
