Localization system in Ryzom
Overview
Localization in Ryzom has two distinct parts. The first, and the easier, is static localization on the client side (e.g. interface names, error messages). The second is text generated dynamically by the servers.
As the diagram shows, four kinds of files make the localization system work. Each file must be provided for every localized language, and each file name contains the language code (highlighted in bold).
File formats are discussed below.
Language code
Languages in Ryzom are identified by their language code as defined in ISO 639-1, plus a country code as defined in ISO 3166 when necessary.
ISO 639-1 is a two-character language code (e.g. ‘en’, ‘fr’). This is enough for most of the languages we want to support.
There are some exceptions, however, such as written Chinese.
Chinese can be written in two forms: traditional or simplified. Nonetheless, there is only one ISO 639-1 language code for Chinese: ‘zh’.
So we must append a country code to indicate which written form is meant. The language code for simplified Chinese becomes ‘zh-CN’ (i.e. Chinese language, China), while traditional Chinese uses ‘zh’ alone, because the other Chinese-speaking regions (Taiwan, Hong Kong, etc.) use traditional Chinese.
Identifier definition
Translated strings are associated with identifiers. Identifiers are text strings that must follow the C identifier constraints, with small differences.
An identifier must consist only of the following characters: ‘A-Z’, ‘a-z’, ‘0-9’, ‘@’ and ‘_’. A real C identifier can’t start with a digit; a string identifier can.
Some good identifiers:
This_is_a_good_identifier
ThisIsAGoodIdentifier
_This@is@notherGoodId
1234_is_a_goodId
This_Is_Good_1234
Some bad identifiers:
This is a bad identifier
é#()|{[_IdBAD
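The character rule above can be checked mechanically. The following is a minimal Python sketch, not part of the Ryzom tools; the function name is_valid_identifier is hypothetical, and the pattern simply restates the rule given above.

```python
import re

# Letters, digits, '@' and '_' only. Unlike a real C identifier,
# a leading digit is accepted.
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_@]+")


def is_valid_identifier(name):
    """Return True if `name` is a well-formed string identifier."""
    return IDENTIFIER_RE.fullmatch(name) is not None


for candidate in ["This_is_a_good_identifier", "1234_is_a_goodId",
                  "This is a bad identifier", "é#()|{[_IdBAD"]:
    print(candidate, "->", is_valid_identifier(candidate))
```

The first two candidates are accepted; the last two are rejected because of the spaces and the non-ASCII or punctuation characters.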
File formats
There are three different translation file formats. But only two need to be learned ;-)
Format 1
This format is used for client side static text and for server side clause text.
The file is a list of identifier-to-string associations (the string is also called the value string). Identifiers must conform to the identifier constraints described above, and the value string is delimited by ‘[‘ and ‘]’.
Text layout is free; you can break lines and indent as you like.
identifiant1 [textual value]
identifiant2 [other textual value]
This file can contain C style comments.
// This is a single line comment. Continue until end of line
identifiant1 [textual value]
/* This is
a multiline
comment */
identifiant2 /* multiline comment here ! */ [other textual value]
Textual values can be formatted for readability. New lines and tabs are removed from the final string value.
identifiant1 [textual
value
with
new line
and tab formating only for readability]
identifiant2 [other textual value]
If you need new lines or tabs in the value string, you must use the C style escape sequences ‘\t’ for tab and ‘\n’ for new line. To write a ‘\’ in the value string, double the backslash: ‘\\’. To write a ‘]’ in the string, escape it with a backslash: ‘\]’.
identifiant1 [tabulation: \tThis text is tabbed]
identifiant2 [New line \nText on next line]
identifiant3 [Backslash: \\]
identifiant4 [a closing square bracket: \] ]
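To make the format 1 rules concrete (comments, free layout, stripped new lines and tabs, escape sequences), here is a small Python sketch of a reader. It is not the Ryzom parser: the names skip_blanks, read_value and parse_format1 are hypothetical, comments are assumed to be recognized only outside value strings, and unknown escape sequences are assumed to be kept verbatim.

```python
import re

IDENT = re.compile(r"[A-Za-z0-9_@]+")
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\", "]": "]"}


def skip_blanks(text, i):
    """Skip whitespace and C style comments starting at index i."""
    n = len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end + 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
        else:
            break
    return i


def read_value(text, i):
    """Read a '[...]' value whose '[' is at index i; return (value, next index)."""
    out = []
    i += 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Assumption: unknown escapes are kept as written.
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        elif ch == "]":
            return "".join(out), i + 1
        else:
            if ch not in "\n\r\t":  # layout-only characters are dropped
                out.append(ch)
            i += 1
    raise ValueError("unterminated value string")


def parse_format1(text):
    """Parse 'identifier [value]' pairs into a dict."""
    entries = {}
    i = skip_blanks(text, 0)
    while i < len(text):
        match = IDENT.match(text, i)
        if match is None:
            raise ValueError(f"identifier expected at offset {i}")
        i = skip_blanks(text, match.end())
        if i >= len(text) or text[i] != "[":
            raise ValueError(f"'[' expected after {match.group()}")
        entries[match.group()], i = read_value(text, i)
        i = skip_blanks(text, i)
    return entries


sample = r"""
// This is a single line comment
identifiant1 [tabulation: \tThis text is tabbed]
identifiant2 /* multiline comment here ! */ [New line \nText on next line]
identifiant3 [Backslash: \\]
identifiant4 [a closing square bracket: \] ]
"""
print(parse_format1(sample))
```

Note that only new lines and tabs are stripped from values, so spaces used for indentation inside a value would be kept by this sketch.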
You can split the original file into multiple smaller files, which are easier to maintain and work with.
This is done with a C-like preprocessor command, “#include”.
#include "path/filename.txt"
You can have any number of include commands. Included files can also contain include commands.
The path can be either an absolute path or a path relative to the location of the master file.
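The include mechanism can be pictured as a recursive text expansion. The Python sketch below is an illustration under assumptions, not the actual tool: the function expand_includes is hypothetical, files are read as UTF-8 for simplicity, relative paths are resolved against the master file's directory as stated above (including for nested includes), and an #include line is expanded wherever it appears.

```python
import re
import tempfile
from pathlib import Path

INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand_includes(path, base_dir=None, _stack=()):
    """Return the text of `path` with each #include line replaced by the file it names."""
    path = Path(path).resolve()
    # Relative paths are resolved against the master file's directory.
    base_dir = path.parent if base_dir is None else base_dir
    if path in _stack:
        raise ValueError(f"circular include: {path}")
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            out.append(line)
            continue
        target = Path(match.group(1))
        if not target.is_absolute():
            target = base_dir / target
        out.append(expand_includes(target, base_dir, _stack + (path,)))
    return "\n".join(out)


# Usage with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "part.txt").write_text("identifiant2 [other textual value]\n", encoding="utf-8")
    (root / "master.txt").write_text(
        'identifiant1 [textual value]\n#include "part.txt"\n', encoding="utf-8")
    print(expand_includes(root / "master.txt"))
```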
Format 2
This format is used for phrase translation files.
Its grammar is fairly complex and is described below in a notation close to LALR grammar rules:
identifiant : [A-Za-z0-9_@]+
phrase : identifiant ‘(‘ parameterList ‘)’
‘{‘
clauseList
‘}’
parameterList : parameterList ‘,’ parameterDesc
| parameterDesc
parameterDesc : parameterType parameterName
parameterName : identifiant
parameterType : ‘item’
| ‘place’
| ‘creature’
| ‘skill’
| ‘role’
| ‘ecosystem’
| ‘race’
| ‘brick’
| ‘tribe’
| ‘guild’
| ‘player’
| ‘int’
| ‘bot’
| ‘time’
| ‘money’
| ‘compass’
| ‘dyn_string_id’
| ‘string_id’
| ‘self’
| ‘creature_model’
| ‘entity’
| ‘bot_name’
| ‘bodypart’
| ‘score’
| ‘sphrase’
| ‘characteristic’
| ‘damage_type’
| ‘literal’
clauseList : clauseList clause
| clause
clause : conditionList identifiant textValue
| identifiant textValue
| conditionList identifiant
| identifiant
| textValue
conditionList : conditionList condition
| condition
condition : ‘(‘ testList ‘)’
testList : testList ‘&’ test
| test
test : operand1 operator reference
operand1 : parameterName
| parameterName’.’propertyName
propertyName : identifiant
operator : ‘=’
| ‘!=’
| ‘<’
| ‘<=’
| ‘>’
| ‘>=’
reference : identifiant
textValue : ‘[‘ .* ‘]’
As in format 1, you can include C style comments in the text, indent freely, and use the include command.
Format 3: Spreadsheet unicode export
This format is the result of a Unicode text export from a spreadsheet application (e.g. Excel).
The encoding should be 16-bit Unicode. Columns are tab separated and rows are newline separated.
You should not write this file by hand; edit it only with the spreadsheet application.
The first row must contain the column names.
Info columns
If a column name starts with a ‘*’, the whole column is ignored.
This is useful for adding information columns that can help translation.
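As an illustration of the layout just described, the following Python sketch reads a tab-separated, 16-bit Unicode export and drops the ‘*’ columns. The function parse_sheet and the ‘*comment’ column are hypothetical; the real tools may treat empty rows or missing cells differently.

```python
def parse_sheet(raw):
    """Decode a 16-bit Unicode export and return one dict per data row."""
    text = raw.decode("utf-16")  # the BOM, if present, selects the byte order
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    # Columns whose name starts with '*' are information columns: skip them.
    keep = [i for i, name in enumerate(header) if not name.startswith("*")]
    rows = []
    for line in lines[1:]:
        cells = line.split("\t")
        rows.append({header[i]: (cells[i] if i < len(cells) else "") for i in keep})
    return rows


sample = "item\tname\t*comment\tda\nmarteau\tmarteau\ta hammer\tle\n".encode("utf-16")
print(parse_sheet(sample))
```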
Delete character
It is possible to insert a ‘delete’ command in the field: ‘\d’. This is useful for article translation.
Example: you have a string with the following replacement (in French):
"Rapporte moi $item.da$ $item.name$"
And the item words file contains the following:
item      name      da
marteau   marteau   le
echelle   échelle   l’
If the item is ‘marteau’, no problem, the replacement gives:
"Rapporte moi le marteau"
But for ‘echelle’, there is an extra space in the result:
"Rapporte moi l’ échelle"
To remove this extra space, you can add a ‘delete’ marker in the article definition:
item      name      da
marteau   marteau   le
echelle   échelle   l’\d
This will give a correct resulting string:
"Rapporte moi l’échelle"
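The effect of the delete marker can be simulated in a few lines of Python. This is only a sketch: the functions substitute and apply_delete_markers are hypothetical, and it assumes that ‘\d’ removes itself together with the single character that follows it once the replacement has been made, which matches the example above but is not confirmed by this document.

```python
import re

ITEM_WORDS_FR = {
    "marteau": {"name": "marteau", "da": "le"},
    "echelle": {"name": "échelle", "da": "l’\\d"},
}


def substitute(template, row):
    """Replace $item$ and $item.column$ markers with values from `row`."""
    def repl(match):
        return row[match.group(1) or "name"]
    return re.sub(r"\$item(?:\.(\w+))?\$", repl, template)


def apply_delete_markers(text):
    """Assumption: '\\d' deletes itself and the one character after it."""
    return re.sub(r"\\d.?", "", text, flags=re.DOTALL)


template = "Rapporte moi $item.da$ $item.name$"
for item in ("marteau", "echelle"):
    print(apply_delete_markers(substitute(template, ITEM_WORDS_FR[item])))
```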
Working with translation files: the translator’s point of view
Client side “*.uxt” files
This file contains all the static text available directly to the client. The text must conform to format 1 described above.
There is an additional constraint: the first entry MUST be the language name, as spelled in that language (e.g. ‘English’ for English, ‘Français’ for French).
For example, the file en.uxt must begin with:
languageName [English]
Server side files
Server side translation is a bit more complex.
We will learn how to write server side translation in four steps (guess what: from simple to complex problem!).
Step 1: A simple string:
For this, you only need the phrase file.
Let’s say we want a string saying “hello world!” identified by HelloWorld.
Create a phrase entry in phrase_en.txt:
HelloWorld ()
{
[Hello world!]
}
That’s it! No more.
Of course, you must also provide the same phrase in all the supported languages, for example, in phrase_fr.txt:
HelloWorld ()
{
[Bonjour le monde!]
}
Note that only the text value has changed. The phrase identifier MUST remain the same in all the translation files.
Step 2: Indirection to clause_<lang>.txt
In step 4, we will see that the phrase file can become very complex. Such a file is ill-suited for handing to a professional translator with no experience of complex grammar files. Moreover, the complexity of the file can obscure the translation work that actually needs to be done.
So you can split the two: the phrase grammar stays in the phrase file and the text values go in the clause file.
To do this, you must assign a unique identifier to each text value.
Let’s rebuild the previous example with indirection.
In phrase_en.txt, create the phrase entry like this:
HelloWorld ()
{
Hello
}
We have just put an identifier in the phrase block. This means that the phrase refers to a string identified as “Hello” in the clause file.
Now, we can create the text value in clause_en.txt:
Hello [Hello world!]
As in the first step, you must do this task for each language.
TIP: to ease translation work, you can specify both the string identifier AND the string value. This can help when automatically building translation files from the original one.
Example:
HelloWorld ()
{
Hello [Bonjour le monde!]
}
In that case, the translation system always looks in the clause file first, and falls back to the string value in the phrase file only if the string is not found in the clause file.
The other advantage is that the person who writes the phrase file can give a simplistic version of the string for a professional translator to improve.
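The lookup order can be summarized by the following Python sketch. It only illustrates the rule stated here; the function resolve_clause and the dictionary standing in for clause_fr.txt are hypothetical.

```python
def resolve_clause(identifier, inline_value, clause_table):
    """Return the clause text: clause file first, phrase file value as fallback."""
    if identifier is not None and identifier in clause_table:
        return clause_table[identifier]
    if inline_value is not None:
        return inline_value
    raise KeyError(f"no text found for clause {identifier!r}")


# Stand-in for the content of clause_fr.txt (improved by a translator).
clause_fr = {"Hello": "Bonjour tout le monde !"}

# Found in the clause file: the clause file wins.
print(resolve_clause("Hello", "Bonjour le monde!", clause_fr))
# Not found in the clause file: fall back to the phrase file value.
print(resolve_clause("Hello", "Bonjour le monde!", {}))
```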
Step 3: Using parameters - basics
Now we get to the complex part!
Each phrase can receive a list of parameters.
Those parameters can be of different types:
item,
place,
creature,
skill,
ecosystem,
race,
brick,
tribe,
guild,
player,
int,
bot,
time,
money,
compass,
dyn_string_id,
string_id,
creature_model,
entity,
bodypart,
score,
sphrase,
characteristic,
damage_type,
bot_name,
literal.
Each parameter is given a name (or identifier) when declared. We will call it paramName.
Each type of parameter CAN be associated with a ‘word’ file. This file is an Excel sheet (in Unicode text export form) that contains translations for the parameter: its name, indefinite or definite article (e.g. ‘a’, ‘the’), plural name and article, and any other property or grammar element needed for translation.
The first column is very important because it associates a row of data with a particular parameter value.
Let’s begin with an example: we want to build a dynamic phrase with a variable creature race name.
First, we must build an Excel sheet to define the words for creature races. This will be saved as race_words_<lang>.txt, as a Unicode text export from Excel. As always, you must provide a version of this file for each language.
NB: The first column MUST always be the association field, and you should have a ‘name’ column as it’s the default replacement for the parameter. Any other column is optional and can vary from language to language to accommodate any specific grammar constraint.
This is an example race_words_en.txt:
race      name      ia   da    p          pia   pda
kitifly   Kitifly   a    the   Kitiflys         the
varynx    Varynx    a    the   Varynx           the
etc…
As stated in the note above, the first column gives the race identifier as defined in the game dev sheets. The second column is the ‘highly advisable’ column for the name of the race. The ‘p’ column is the plural name. ‘ia’ and ‘da’ stand for indefinite article and definite article, and ‘pia’ and ‘pda’ for their plural forms.
Next, we must create a phrase with a creature parameter in phrase_<lang>.txt:
KILL_A_CREATURE (race crea)
{}
As you can see, the phrase identifier KILL_A_CREATURE is followed by the parameter list in parentheses. We declare a parameter of type race named crea. You are free to choose parameter names, but each name must be unique within a phrase.
Now, we can build the string value. To insert a parameter into the string, we specify a replacement point with the ‘$’ sign (e.g. $crea$) directly in the string value:
KILL_A_CREATURE (race crea)
{
[Would you please kill a $crea$ for me ?]
}
As you can see, it’s not too complex. $crea$ will be replaced with the content of the ‘name’ column of the words file, at the row corresponding to the race identifier.
▪ Any column of the words file can be recalled in the value string. For example, we can make the indefinite article dynamic:
KILL_A_CREATURE (race crea)
{
[Would you please kill $crea.ia$ $crea$ for me ?]
}
▪ Some parameter types have special replacement rules: int parameters are replaced with their text representation, and time and money parameters are converted to Ryzom’s readable formats.
Last but not least, the identifier and indirection rules seen in steps 1 and 2 still apply.
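The replacement rules of this step can be sketched in Python as follows. This is an illustration only, not the server code: the function render, the shape of the params and words arguments, and the trimmed-down words table are all assumptions. Only the default ‘name’ column, explicit columns, and int parameters are handled; time and money formatting are left out.

```python
import re

# Trimmed-down stand-in for race_words_en.txt.
RACE_WORDS_EN = {
    "kitifly": {"name": "Kitifly", "ia": "a", "da": "the", "p": "Kitiflys"},
    "varynx": {"name": "Varynx", "ia": "a", "da": "the", "p": "Varynx"},
}

PLACEHOLDER = re.compile(r"\$([A-Za-z0-9_@]+)(?:\.([A-Za-z0-9_@]+))?\$")


def render(template, params, words):
    """Fill $param$ and $param.column$ markers.

    params maps a parameter name to a (type, value) pair;
    words maps a parameter type to its words table.
    """
    def repl(match):
        name, column = match.group(1), match.group(2)
        ptype, value = params[name]
        if ptype == "int":
            return str(value)  # ints use their text representation
        # Without an explicit column, the 'name' column is used.
        return words[ptype][value][column or "name"]
    return PLACEHOLDER.sub(repl, template)


words = {"race": RACE_WORDS_EN}
print(render("Would you please kill $crea.ia$ $crea$ for me ?",
             {"crea": ("race", "kitifly")}, words))
```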
Step 4: Using parameters: conditional clause
It’s time now to unveil the conditional clause system.
Let’s call the identifier and string value we put in a phrase in the previous step a clause. A phrase can contain more than one clause, and the translation engine chooses among them on the fly depending on the parameter values. This is the conditional clause system.
Let’s start with a first example. As in step 3, we want to kill creatures, but this time we add a variable number of creatures to kill, from 0 to n.
What we need are conditions to select between three clauses: no creature to kill, one creature to kill, and more than one.
First, let’s write the phrase, its parameters and the three clauses:
KILL_A_CREATURE (race crea, int count)
{
// no creature to kill
[There is no creature to kill today.]
// 1 creature to kill
[Would you please kill a $crea$ for me ?]
// more than one
[Would you please kill $count$ $crea$ for me ?]
}