Regex recipes for translation (books reviewed)
Comprehensive practical teaching and a solutions cookbook.
Regular expressions are a somewhat controversial topic in the translation and localization scene. Anyone who has been professionally active hears how one should learn to write them (I disagree in most cases, though I do teach how to make and use them), but the learning curve proves too steep for most, and even those who have some success are often frustrated. It doesn’t help that most teaching of regex is largely irrelevant for translators or fails to fit the scope of text patterns they typically encounter.
The Internet is full of information and tutorials for regular expressions, but most of this information is focused on programming, not on text-related issues which typically concern those who work with translations, and often the examples one finds are for dialects of regex not used in one’s translation environment. And even when someone familiar with translation issues teaches a course or creates some other kind of tutorial, the content is too often bog-level basic (the kind of thing one can read in CAT tool help pages in a few minutes) and is divorced from the kind of feature context information one needs for successful application. (By “feature context”, I mean whether it applies, for example in segmentation rules, tagging procedures, quality assurance checks, content filtering, find & replace operations, transformations for substitution, etc. The various contexts usually require knowledge beyond regex in order to use regular expressions effectively.)
Anthony Rudd is one of those rare teachers who “gets” regex for translation. His gift to the profession comprises two books, one for those who really want to dig in and learn to write practical expressions, solve problems and apply their knowledge in common CAT tools like memoQ, Trados Studio, Wordfast and OmegaT, and a second volume which, while giving good coverage of the basics, is distinguished by over 100 pages of plug-and-play solutions to common translation problems, serving as an invaluable quick reference for troubleshooting or a good foundation to build on for more complex text challenges.
His first book, Practical Usage of Regular Expressions: An Introduction to Regexes for Translators, was published in 2018, and for me was the key to understanding and applying “lookarounds” to some difficult auto-translation rules I was tasked to construct for transforming German legal citations into appropriate formats for English target texts. Anthony has a fine way of presenting all the basics and advanced concepts for regex in simple language relevant to translators’ work, emphasizing the importance of “good enough” regex and avoiding technical overkill. This is the work I would choose as a textbook were I to teach a longer course at university or a professional education venue for translators and localization specialists. I refer back to it often to refresh my mind on certain regex concepts, because of all the myriad references I have found for regular expressions in the past fifteen years, it’s the one most directly relevant to my work. Later sections in the book give specific guidance for common computer-assisted translation environments. It is available as a paperback or as a Kindle e-book edition; the author has also been known to sell PDF versions directly if asked.
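For readers unfamiliar with the term, a lookaround checks what comes before or after a position in the text without making that context part of the match. The sketch below shows the idea on a simple German citation. It is written in Python purely for illustration; the patterns and the function name `convert_citation` are hypothetical, not taken from the book or from the rules mentioned above, and real citation formats would need many more cases (section numbers with letter suffixes such as "§ 5a", for instance, are not handled). The lookaround syntax shown is shared by most regex dialects, but replacement syntax differs between tools, so check your own CAT tool's documentation before adapting it.

```python
import re


def convert_citation(text: str) -> str:
    """Illustrative only: rewrite 'paragraph sign + number + Abs. + number'
    in an English style. The patterns are assumptions made for this sketch."""
    # Lookahead (?=\d): replace the section sign only when a digit follows.
    # The digit itself is checked but not consumed, so it stays in the text.
    text = re.sub(r"§\s*(?=\d)", "Section ", text)
    # Lookbehind (?<=\d): rewrite " Abs. N" as "(N)" only when it directly
    # follows a digit, i.e. the end of a section number.
    text = re.sub(r"(?<=\d)\s+Abs\.\s*(\d+)", r"(\1)", text)
    return text


print(convert_citation("gemäß § 823 Abs. 1 BGB"))
```

Because the lookarounds only inspect their context, a section sign that is not followed by a number, or an "Abs." that does not follow one, is left untouched.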
Anthony’s second book, Comprehensive Regular Expression Recipes: A Practical Cookbook, was published in 2020; it also contains all the basic references and practical recommendations one might need, but its centerpiece is a huge collection of complete regexes written to solve most of the problems one might wish to handle in a translation project. In 110 pages of “cookbook” examples, the author provides an excellent foundation for creating personal regex libraries for one’s daily work. And where his examples might not completely cover a particular challenge, I find they almost always provide the right start for what is needed. I keep an e-book copy on my working computers, and when memoQ introduced its Regex Assistant library tool in version 9.9, I extended my personal collection of bread-and-butter working regexes to a large extent by copying and pasting Anthony’s examples straight from the e-book, including the descriptions of how to apply those expressions. This book is an excellent support for both regex novices and those who are considered to be advanced practitioners of that black art.
In future posts I will be discussing many different aspects of using regular expressions in translation and localization work, but I wanted to present these two excellent works first, because they have contributed the most to my personal (now 15-year) learning curve and they will be the best resources available for others in the language services profession who may try to follow and apply my examples but who find that my own explanations fall short of what they need. A few hours spent with either book will probably give you more than any webinar or short course in the subject matter can.
To augment Kevin's review and the mentioned books, I have also provided several regex-related documents in the Files section of the MemoQ Users Facebook group.