Translating Hugo-based websites with Gettext

In the Linux world, gettext is the gold standard for translating content. It’s powerful, and there is a significant amount of tooling around it: editors like Lokalize, Poedit, Weblate and many others, as well as libraries and bindings for many languages. In the web development world, however, internationalization doesn’t have a unified solution yet. Django uses gettext, and many JavaScript frameworks use JSON as a key-value store of strings, but other formats exist, and some frameworks provide nothing at all, so everything needs to be done from scratch.

Unlike Jekyll, Hugo provides some built-in internationalization support. This includes the i18n function for translating templates, translatable menus, and a way to translate markdown files by adding translated copies next to the original English file. Unfortunately, this is not enough. The first problem is that there is no way to automatically notify translators when and how a markdown file changed, since what is sent to the translators is the raw markdown file. The second problem is that the translations need to be extracted from and injected into three different places in various formats: Hugo uses markdown files for the content, a YAML file for the strings in the HTML templates, and a YAML config file for the menu and site metadata translations (e.g. the site title). The third problem is that none of these formats is directly usable by the KDE translation system or by KDE translators, who expect po files that work with their usual tools and workflow.

Bridging everything

The solution was to build a bridge between these two worlds. So I created a Python script especially for kde.org, using polib to manipulate the translation files. The script extracts the translatable strings from the English content and automatically creates the markdown and YAML files required to generate the translated Hugo website. The easy part was handling the YAML files: it’s just a matter of extracting specific values from them. The markdown handling was a bit trickier. We obviously don’t want to send one big string with the entire content of the markdown file to the translators. Instead, we want to split it as well as possible. Each paragraph is extracted as a separate string, as is each list item, and the script even tries to extract the translatable parts of Hugo shortcodes correctly. For example, this is how the following text is extracted:

Hello

* List item 1

{{< img src="..." alt="Accessibily description" title="My image title" >}}

{{< empty >}}

becomes

#. type: Plain Text
#: content/myfile.md:1
msgid "Hello"
msgstr ""

#. type: List item
#: content/myfile.md:3
msgid "List item 1"
msgstr ""

#. type: Plain Text
#: content/myfile.md:5
msgid "Accessibily description"
msgstr ""

#. type: Plain Text
#: content/myfile.md:5
msgid "My image title"
msgstr ""

This is probably the best result that can reasonably be achieved. It doesn’t support every markdown feature, and especially not inline HTML, but we don’t need those in KDE yet, and I usually prefer to keep the markdown content separate from the HTML layout. This makes it easier to update the theme or other aspects of the website in the future.
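The splitting described above can be illustrated with a small, standard-library-only sketch. It is not the code used on kde.org (which relies on polib); the function names, the regular expressions and the choice of `alt` and `title` as the translatable shortcode attributes are all assumptions made for illustration, and it assumes the standard `{{< name ... >}}` shortcode syntax.

```python
import re

# All names below are hypothetical; this is not hugo-i18n's actual code.
SHORTCODE_RE = re.compile(r"^\{\{<\s*(\w+)(.*?)>\}\}$")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
LIST_RE = re.compile(r"^[*+-]\s+(.*)$")
# Assumption: only these shortcode attributes contain translatable text.
TRANSLATABLE_ATTRS = {"alt", "title"}


def extract_entries(markdown, path):
    """Return (type, "path:line", msgid) tuples for one markdown file."""
    entries = []
    paragraph, start = [], 0

    def flush():
        # Emit the paragraph collected so far as a single string.
        if paragraph:
            entries.append(("Plain Text", f"{path}:{start}", " ".join(paragraph)))
            paragraph.clear()

    for number, raw in enumerate(markdown.splitlines(), start=1):
        line = raw.strip()
        shortcode = SHORTCODE_RE.match(line)
        item = LIST_RE.match(line)
        if not line:
            flush()  # a blank line ends the current paragraph
        elif shortcode:
            flush()
            for name, value in ATTR_RE.findall(shortcode.group(2)):
                if name in TRANSLATABLE_ATTRS:
                    entries.append(("Plain Text", f"{path}:{number}", value))
        elif item:
            flush()
            entries.append(("List item", f"{path}:{number}", item.group(1)))
        else:
            if not paragraph:
                start = number  # remember where the paragraph begins
            paragraph.append(line)
    flush()
    return entries


def to_po(entries):
    """Render the entries as po-style text with empty translations."""
    blocks = []
    for kind, location, msgid in entries:
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            f'#. type: {kind}\n#: {location}\nmsgid "{escaped}"\nmsgstr ""\n'
        )
    return "\n".join(blocks)


sample = """Hello

* List item 1

{{< img src="..." alt="Accessibily description" title="My image title" >}}

{{< empty >}}
"""

entries = extract_entries(sample, "content/myfile.md")
print(to_po(entries))
```

Run on the sample text, this yields the same four entries as the example above; the `empty` shortcode has no translatable attributes, so nothing is extracted for it. A real implementation also has to deal with nested lists, headings, front matter and multi-line shortcodes, which this sketch ignores.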

The script has been running on kde.org for almost a year, maturing along the way to serve the needs of the roughly 400 pages available for translation on kde.org. But kde.org is not the only KDE website, and the next step was to make it possible to translate more websites. At first, this was achieved by copy-pasting the scripts across a few repositories, but this obviously doesn’t scale and is generally not good practice.

So Phu Nguyen and I (but mostly Phu Nguyen) worked on turning the scripts into a proper Python package with configuration options and documentation. The primary goal was to make it easier to reuse them for the other KDE websites, but a nice side effect is that this also makes them easier for everyone else to use. You can find the source code and documentation of hugo-i18n on Invent, KDE’s GitLab instance.

Phu and I have already ported a few websites, including planet.kde.org, kate-editor.org, timeline.kde.org, elisa.kde.org, okular.kde.org and apps.kde.org. Help with porting more of them is always welcome, so if you are interested, you can join our Matrix or IRC channel, #kde-www.

Why all of the trouble? Isn’t the English version enough?

English isn’t spoken in large parts of the world, and translating our websites (and our software) helps us reach people who wouldn’t necessarily be able to use our software or learn about it if it were only available in English. In short, this brings KDE a step closer to konquering the world. According to our web statistics, more than 10% of the visitors of kde.org browse the translated versions, and that share is slowly growing.

It is also an excellent occasion to remind people that translating KDE software, websites and documentation is a great way to get involved in the KDE community, even if you can’t program. So get involved!
