Recently on LinkedIn, a colleague published an article about a rather clever Python script he wrote for extracting original source files from a Trados Studio SDLXLIFF. I shared the link to it on Bluesky, but then when I thought about testing it, I really was not in the mood to install Python on my laptop. I’ve been trying to get away from all that programming bullshit for 22 years now, and I have to draw the line somewhere, some time. Moreover, that kind of thing is just too nerdy for most of my clever linguist friends who have actual lives.
So I figured there had to be a simple way that your average person who fucking hates technononsense can deal with. And that’s almost the case.
In an RWS/SDL forum I found an article describing steps that most people can manage.
The only preparation step that needs to be taken is to download and install the free Open Source code editor Notepad++. When I tested the procedure, it worked with no changes to the installation defaults. I recommend that tool frequently for a lot of other things, such as proper maintenance of memoQ auto-translation rulesets, so some of you may already have it. In any case, the download and installation take only a few minutes. And surely other tools can accomplish the same transformation described here.
The video in this post shows my very first attempt at extracting an original source file for translation that was processed in SDL Trados Studio to make that platform’s flavor of XLIFF, an “SDLXLIFF” file.
These SDLXLIFF files contain an embedded, encoded ZIP file with the original document to translate, inserted in the <internal-file form="base64"> tag set near the top of the file.
Copy the text between those tags, paste it into a new document in Notepad++ and then convert the encoded information back to its original form for the ZIP file by choosing the menu option Plugins > MIME Tools > Base64 Decode. Then save the file with a *.zip extension: choose All files (*.*) as the file type to save as, then type the extension “.zip” onto the name of the ZIP file you save.
Then locate your file where you saved it, and unpack the ZIP file. The name of the extracted source file won’t be the original name (unlike with that cool Python script linked above), but you can copy the original file name off the name of the SDLXLIFF file.
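For anyone who does not mind a bit of Python after all, the manual steps above can be approximated in a few lines. This is only a minimal sketch, not the script from the LinkedIn article. It assumes the SDLXLIFF is well-formed XML and that the first <internal-file form="base64"> element holds the embedded ZIP. The function name and the file names in the usage note are made up for illustration.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_embedded_source(sdlxliff_path, out_dir):
    """Decode the Base64 ZIP embedded in an SDLXLIFF and unpack it into out_dir.

    Returns the list of file names found inside the embedded ZIP.
    """
    with open(sdlxliff_path, "rb") as fh:
        for _, elem in ET.iterparse(fh, events=("end",)):
            # Compare the local tag name only, so the XLIFF namespace does not matter.
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name == "internal-file" and elem.get("form") == "base64":
                # Same job as Plugins > MIME Tools > Base64 Decode in Notepad++.
                payload = base64.b64decode(elem.text or "")
                target = Path(out_dir)
                target.mkdir(parents=True, exist_ok=True)
                # Treat the decoded bytes as a ZIP archive and unpack it.
                with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                    archive.extractall(target)
                    return archive.namelist()
    raise ValueError("no internal-file element with form=base64 found")
```

Calling something like extract_embedded_source("job.docx.sdlxliff", "extracted") would unpack the embedded file into a folder named "extracted". As with the manual method, the extracted file keeps whatever name it has inside the ZIP, so you may still want to rename it after the SDLXLIFF file.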
If you were given a Trados Studio package project (SDLPPX), these contain the original source files already. Packages are just ZIP files, so you can unpack those and get at what you need as described in another of my posts.
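Since packages are just ZIP files, unpacking one needs nothing Trados-specific. Here is a minimal Python sketch using only the standard zipfile module. The function name is hypothetical, and the code makes no assumption about how the folders inside the package are laid out; it simply unpacks everything and reports what it found.

```python
import zipfile


def unpack_package(package_path, out_dir):
    """Unpack a Trados Studio package as an ordinary ZIP archive.

    Returns the names of all entries so you can look for the source files.
    """
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```

Renaming the package to end in ".zip" and unpacking it with your usual archive tool achieves the same result without any code.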
I have lost count of the many past occasions when I was sent prepared SDLXLIFF files (rather than packages) to translate, and I had to ask the client to send the source files as well so I could see the text to translate in its proper context. Had I known this tip back then, it would have saved me a lot of time and frustration. And until yesterday, I had never heard of this possibility. Truly, a well-kept secret.
Corrections and addenda:
The original post mentioned Trados Studio “SDLRPX” package projects when I should have written “SDLPPX”. That’s what happens when one writes past bedtime. SDLRPX files are the return packages generated from a project created from an SDLPPX package.
There was a discussion of this article on the now politically compliant Trados users forum on groups.io; here is the most interesting part of it:
It’s interesting information perhaps for someone creating the SDLXLIFF files, but for the one receiving it, well, it explains an error message that TradoZe users experience. Poor Trados users have a life plagued by many error messages, but so it goes, as Kurt Vonnegut used to say. The rest of it is largely irrelevant, and I’m inclined to say that the poster can kiss my “cannon”. Someone for whom the sun rises and sets on Trados apparently has difficulty understanding that many people who work on SDLXLIFF files in fact do not have Trados Studio licenses and therefore cannot use the Trados Studio menu function “Save Source As…”. And according to the RWS forum article I linked above (from which I gleaned the method described in this post text and recorded in the video), the files obtained by using “Save Source As…” are not exactly the same as the originals. This is, I believe, true of any source files regenerated from any CAT tool, in the same way that files produced by re-saving from original applications such as MS Word may differ slightly and in some cases correct small errors. (I have fixed corrupted source files this way sometimes.)
Someone who has a Trados Studio license will read this post and say “so what?” I probably would myself. I’ve used that “Save Source As…” function many times because I do in fact own a license. Or maybe two of them. But many users of tools like memoQ, Cafetran, Wordfast, etc. don’t, and the method presented is intended to help them.