A practical approach to website internationalisation


There is an abundance of websites that let you choose the language in which pages are displayed.
However, there are very few guides on how to achieve this on your own website. Of course, if you are using an existing framework (e.g. Joomla! or Zend) then you will already have a structure to use.
If you do not use third-party frameworks, and want to integrate a simple lightweight process, then you might be able to use our approach.
This guide shows you how we have achieved this on one of our product websites.
This approach works for our needs. Of course, your own website might need more capability, or you may require less sophistication. In either case, the approach covers the steps in the process regardless of complexity. Make use of the ideas as you need.

The tools we use

Our solution is tailored to use our choice of tools.

These are the only components that we are relying on.
How we use these tools will be described in more detail as we go through the process.

The website structure

We have a simple hierarchy of directories containing the content.
Our top-level navigation categories map directly onto these directories.
So, in our navigation links, mysite.com/welcome maps to the /welcome physical directory on the server.

Language specific URLs

We were inspired by this article.
We like the second option, "Modified Directory Structure" (as far as what the URL looks like), but didn't want to suffer the downside of multiple physical representations.
So we would like our URLs to look like www.training-mate.com/fr/welcome for the French version of the site. Of course we also want www.training-mate.com/welcome to work as well, and use our default language choice.
Practically, we can't serve URLs in the form above directly, because /fr/welcome does not exist on the server. We could replicate our directory structure for every language we support, but that would be unwieldy.
From the article, it is option 3 "Language code in Querystring" that fits our purpose best.
So www.training-mate.com/fr/welcome/index.php should be transformed into www.training-mate.com/welcome/index.php?lang=fr
Now the URL maps properly onto our server directory structure, and the language choice is preserved, but our URLs still look nice.

Deciding what forms of language codes to support

The transformation of our URLs needs to understand the forms of language directives that are valid.
Conventionally, the language choice can be encoded as lang_country_variant, where lang is a two- or three-letter lower-case language code, country is a two-letter upper-case country code, and variant is an alphabetic code of 3 to 8 characters.
For our purposes, we have decided to support lang_country forms of the language choice.
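The two supported forms can be checked with a single regular expression. The following is a minimal sketch, assuming the lang and lang_country forms described above; the function name parseLangCode is hypothetical and not part of the original site code.

```php
<?php
// Hypothetical helper: splits a code such as "fr" or "fr_CA" into its
// parts, or returns null when the code is not in a supported form.
function parseLangCode(string $code): ?array
{
    // lang: 2 or 3 lower-case letters; optional country: 2 upper-case letters.
    if (!preg_match('/^([a-z]{2,3})(?:_([A-Z]{2}))?$/D', $code, $m)) {
        return null;
    }
    return [
        'lang'    => $m[1],
        'country' => $m[2] ?? null,   // absent when only the language is given
    ];
}

var_dump(parseLangCode('fr'));      // language only
var_dump(parseLangCode('fr_CA'));   // language and country
var_dump(parseLangCode('FR-ca'));   // null: not a supported form
```

Rejecting anything outside these forms early is useful because the code is later used to build URLs and file names.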

Rewriting URLs with Apache mod_rewrite

What we can do, then, is use the Apache mod_rewrite module to transform the incoming URL into a usable form.
mod_rewrite works by matching the incoming URL against a series of rules. If a rule matches, then the transformation is applied.
Taking the simplest case first, /fr/welcome/index.php requires that we match 2 lower case letters followed by a forward slash at the beginning of the URL (the host part of the URL does not get used in the rewriting).

  RewriteRule ^/([a-z][a-z])/(.*)$ /$2?lang=$1 [QSA]

The rewrite rules use a standard regular expression syntax. The enclosing () allows the pattern match to be stored (so $1 is the result of the match against "fr" and $2 stores welcome/index.php). The result of the rule is to create /welcome/index.php?lang=fr
The [QSA] flag at the end instructs the rewrite engine to append any query string from the original URL onto the new.
The complete set of rules for the forms we have decided to support is:
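To experiment with the lang and lang_country forms outside Apache, the rewriting step can be emulated in PHP. This sketch is not Apache configuration; the function name rewriteLangUrl is hypothetical, and the pattern assumes a two-letter language code with an optional two-letter country code, as in the simple rule shown earlier.

```php
<?php
// Hypothetical emulation of the rewrite step, for testing patterns only.
function rewriteLangUrl(string $path, string $query = ''): string
{
    // Matches /fr/... and /fr_CA/...; anything else passes through unchanged.
    if (preg_match('#^/([a-z]{2}(?:_[A-Z]{2})?)/(.*)$#', $path, $m)) {
        $path = '/' . $m[2];
        // Like [QSA]: the original query string is kept after the new one.
        $query = 'lang=' . $m[1] . ($query !== '' ? '&' . $query : '');
    }
    return $query === '' ? $path : $path . '?' . $query;
}

echo rewriteLangUrl('/fr/welcome/index.php'), "\n";
// /welcome/index.php?lang=fr
echo rewriteLangUrl('/fr_CA/welcome/index.php', 'page=2'), "\n";
// /welcome/index.php?lang=fr_CA&page=2
echo rewriteLangUrl('/welcome/index.php'), "\n";
// /welcome/index.php
```

Note that any real top-level directory with a two-letter name would also match this pattern, so such names are best avoided in the site structure.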

Extracting and storing the language code

Since we are dynamically generating our web pages, and using PHP to do so, getting the language from the URL is a simple matter of asking the query string for it:
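A minimal sketch of that lookup, assuming the lang parameter produced by the rewrite step. The helper name currentLang is hypothetical, and using an empty string to mean "the default language" is an assumption of this sketch rather than something the original site code is known to do.

```php
<?php
// Hypothetical helper: picks the language code out of the query parameters,
// falling back to a default when it is missing or malformed.
function currentLang(array $query, string $default = ''): string
{
    $lang = $query['lang'] ?? '';
    // Only accept the lang and lang_COUNTRY forms; the value is later used
    // in links and file names, so it must not be trusted as-is.
    if (is_string($lang) && preg_match('/^[a-z]{2}(?:_[A-Z]{2})?$/D', $lang)) {
        return $lang;
    }
    return $default;
}

// In a page script this would normally be called with $_GET.
$lang = currentLang($_GET);
```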

Changing our navigation links

The next step of our transformation is to retain the current language code in the links that navigate through the site.
This is simply a case of prepending the value of $lang onto our links:

NB recall that our top-level navigation links map onto our physical directory structure, and we clearly need to apply this approach to all of our internal links.
Now whenever the user clicks on an internal link, it will pass through the mechanisms shown above, and retain the user's current language choice.
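The link generation described above can be sketched as follows. The helper name navLink and the section names are hypothetical; the sketch assumes that an empty $lang means the default language, which gets no prefix.

```php
<?php
// Hypothetical helper: builds one navigation link that keeps the language.
function navLink(string $lang, string $sect): string
{
    // No prefix for the default language, a "/fr" style prefix otherwise.
    $langPrefix = ($lang === '') ? '' : '/' . $lang;
    $href = htmlspecialchars($langPrefix . '/' . $sect);
    return '<a href="' . $href . '">' . htmlspecialchars($sect) . '</a>';
}

foreach (['welcome', 'products'] as $sect) {   // example section names
    echo navLink('fr', $sect), "\n";
}
```

At this stage the link text is still the section name itself; only the target of the link carries the language.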

Changing the page content

The bulk of our internationalisation work will be in dealing with the text of the pages on the site.
Now that we have the required language code, we need to do some language specific text output.
Before we do that, we need a way of storing and accessing our translations.

Storing our translations

For this, we take our inspiration from Java ResourceBundles and build in some PHP capability that mimics what happens in Java.
Resource bundles work by interpreting a sequence of property files in a hierarchical manner. Property files are simply a sequence of key=value entries, one per line, in a text file.
We can store our property files anywhere. In our case, we have set aside a /lang directory on the web server for them.
We start with our default property file, site.properties, which will contain the text for (some of) our website in our default language (in this case British English):

To add French support, we add site_fr.properties:

Some things to note are:

The hierarchy of these files is driven by the form of the language codes that we discussed above.
So if we set our language code to "fr_CA" (Canadian French), we would first ask site_fr_CA.properties for a translation. If it doesn't have one, we would then ask site_fr.properties for it. If we still don't have one, we fall back to asking site.properties for it.
With this scheme, our more specific language variants only need to provide very specific translations, whilst keeping the bulk of the translations in the parent property file.
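The lookup order can be made concrete with a small sketch that lists the files to consult for a given language code, most specific first. The function name bundleFiles is hypothetical.

```php
<?php
// Hypothetical helper: lists the property files to consult, most specific first.
function bundleFiles(string $base, string $lang): array
{
    $files = [];
    $parts = ($lang === '') ? [] : explode('_', $lang);
    // Drop one trailing part at a time: fr_CA first, then fr.
    while (count($parts) > 0) {
        $files[] = $base . '_' . implode('_', $parts) . '.properties';
        array_pop($parts);
    }
    $files[] = $base . '.properties';   // the default file is always last
    return $files;
}

print_r(bundleFiles('site', 'fr_CA'));
// site_fr_CA.properties, site_fr.properties, site.properties
```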

Loading and using the translation we need

We obviously need a mechanism to assist with the loading of our property files.
The key=value pairing from our property files matches nicely with PHP associative arrays.
Therefore we need a simple class that can

So we can define a ResourceBundle class as follows:

and create a bundle in our script:

  $bundle = new ResourceBundle('/lang', 'site', $lang);

Now that we have our bundle, we can ask for the translations for the navigation links.
Recall that previously we had:

  echo('<a href="'.$langPrefix.'/'.$sect.'">'.$sect.'</a>');

Now we simply need to add one line of additional code:

  $sectTrans = $bundle->gettext("nav_".$sect);
  echo('<a href="'.$langPrefix.'/'.$sect.'">'.$sectTrans.'</a>');

And that's all there is to it! We simply change our page generation from using static text in our default language to gettext($key) calls from the appropriate bundle.
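A bundle class along these lines might look like the sketch below. It is an illustration under stated assumptions, not the class used on the original site: the name PropertyBundle is hypothetical (chosen because PHP's intl extension already provides a class called ResourceBundle), lines starting with # are treated as comments, and a missing key returns the key itself.

```php
<?php
// Hypothetical stand-in for the bundle class described above.
class PropertyBundle
{
    private array $texts = [];

    public function __construct(string $dir, string $base, string $lang)
    {
        // Load the default file first, then each more specific file on top,
        // so more specific entries replace the inherited ones.
        $this->load("{$dir}/{$base}.properties");
        $suffix = '';
        foreach ($lang === '' ? [] : explode('_', $lang) as $part) {
            $suffix .= '_' . $part;
            $this->load("{$dir}/{$base}{$suffix}.properties");
        }
    }

    private function load(string $file): void
    {
        if (!is_readable($file)) {
            return;   // a missing variant file is not an error
        }
        foreach (file($file, FILE_IGNORE_NEW_LINES) as $line) {
            $line = trim($line);
            if ($line === '' || $line[0] === '#' || strpos($line, '=') === false) {
                continue;   // skip blanks, comments and malformed lines
            }
            [$key, $value] = explode('=', $line, 2);
            $this->texts[trim($key)] = trim($value);
        }
    }

    public function gettext(string $key): string
    {
        // Fall back to the key so that missing translations stay visible.
        return $this->texts[$key] ?? $key;
    }
}

// Example use; the directory location is an assumption of this sketch.
$bundle = new PropertyBundle(__DIR__ . '/lang', 'site', 'fr_CA');
echo $bundle->gettext('nav_welcome'), "\n";
```

Merging the files at load time means each gettext call is a single array lookup, at the cost of reading up to three small files per request.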

Providing a Change Language mechanism

Changing the language is a matter of providing a link to the current page with the new language choice.
If you are only supporting a few languages, you can do this with static links somewhere on the page.
If you are going to support a larger number of languages, a form with a drop-down choice is a better solution.
Either way, you will need to manipulate the current URL to strip off the current language prefix, and replace it with the new language code.
Continuing our approach of externalising information, we can use a property file to store our supported language choices:

Note that we are using the key to record the language and country code, and the value for the name of the language in its own language.
Also note that we would not expect to have language specific versions of this information.
Here is the script that pulls all of this together:
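A minimal sketch of such a script, covering the two pieces described above: swapping the language prefix on the current URL, and producing the drop-down entries from the list of supported languages. The function names and the example language entries are hypothetical, and the list is shown as a plain array rather than being read from a property file.

```php
<?php
// Hypothetical helper: swaps the language prefix on a site-relative path.
function switchLang(string $path, string $newLang): string
{
    // Strip an existing /fr or /fr_CA prefix, if there is one.
    $bare = preg_replace('#^/[a-z]{2}(?:_[A-Z]{2})?(?=/)#', '', $path);
    return ($newLang === '' ? '' : '/' . $newLang) . $bare;
}

// Hypothetical helper: one <option> per supported language, each pointing
// at the current page in that language.
function languageOptions(array $languages, string $path, string $current): string
{
    $html = '';
    foreach ($languages as $code => $name) {
        $html .= sprintf(
            '<option value="%s"%s>%s</option>' . "\n",
            htmlspecialchars(switchLang($path, $code)),
            $code === $current ? ' selected' : '',
            htmlspecialchars($name)
        );
    }
    return $html;
}

$languages = ['en_GB' => 'English', 'fr' => 'Français'];   // example entries
echo languageOptions($languages, '/fr/welcome/index.php', 'fr');
```

In a page script the path would typically be taken from the originally requested URL, which still carries the language prefix. How the chosen option is turned into a navigation (a form submit handled on the server, or a small piece of client-side script) is left open here.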

Dealing with layout differences

Any layout that uses a fixed size for a text component may come undone when dealing with multiple languages.
Since we are using CSS to lay out our pages, we can make use of the "last rule found is used" approach to override layout specifications.
To achieve this, we need a default layout that works for the majority of our language selections.
Then we can load up a language specific style sheet to provide the language specific settings.
Thus we load our stylesheets in this way:

NB if a specified style sheet is requested but not available the request is just ignored. So if the baseline style works for a given language, you don't need to create the override.
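The loading order can be sketched in PHP as below. The helper name stylesheetLinks and the base file name /style.css are assumptions of this sketch; only the style_fr.css naming pattern comes from the text.

```php
<?php
// Hypothetical helper: emits the base stylesheet, then the language override.
function stylesheetLinks(string $lang): string
{
    $html = '<link rel="stylesheet" type="text/css" href="/style.css">' . "\n";
    if ($lang !== '') {
        // Loaded second, so its rules win over equally specific rules
        // in the base stylesheet.
        $html .= '<link rel="stylesheet" type="text/css" href="/style_'
               . htmlspecialchars($lang) . '.css">' . "\n";
    }
    return $html;
}

echo stylesheetLinks('fr');
```

The order matters: the language-specific sheet has to come after the base sheet for its rules to take precedence.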
In our styles we have a default layout for our navigation links:

but we find that for French, 90px is not wide enough, so in style_fr.css we override it with:

whilst retaining all of the other characteristics of the style.