Localization system in Ryzom

Overview

Localization in Ryzom has two main parts. The first (and the easiest) is static localization on the client side (e.g. interface names, error messages). The second is text generated dynamically by the servers.


  
 

As you can see in the diagram, there are four kinds of files that make the localization system work. Each of these files must be provided for each localized language. In bold, you can see that each file name contains the language code.

File formats are discussed below.

 

Language code

Languages in Ryzom are identified by their language code as defined in ISO 639-1, plus a country code as defined in ISO 3166 if necessary.


 

ISO 639-1 is a two-character language code (e.g. ‘en’, ‘fr’). This is enough for most of the languages we want to support.

But there are some exceptions, such as written Chinese.

Chinese can be written in two forms: traditional or simplified. Nonetheless, there is only one language code for Chinese: ‘zh’.

So, we must append a country code to indicate which written form of Chinese we mean. The language code for simplified Chinese becomes ‘zh-CN’ (i.e. Chinese language, China), and for traditional Chinese it is simply ‘zh’, because all the other Chinese-speaking countries (Taiwan, Hong Kong, …) use traditional Chinese.
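
As an illustration, the shape of such a language code can be checked with a few lines of Python. This is only a sketch: the name split_language_code and the strict casing rule (lowercase language, uppercase country) are assumptions made for the example, not something taken from the Ryzom tools.

```python
import re

# Assumed shape: two lowercase letters (ISO 639-1), optionally followed by
# '-' and two uppercase letters (ISO 3166 country code).
LANG_CODE_RE = re.compile(r"([a-z]{2})(?:-([A-Z]{2}))?")


def split_language_code(code):
    """Return (language, country); country is None when it is absent."""
    match = LANG_CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid language code: {code!r}")
    return match.group(1), match.group(2)
```

With this sketch, split_language_code("en") returns ("en", None), and a code with a country part is split at the hyphen.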

Identifier definition

Translated strings are associated with identifiers. Identifiers are text strings that must follow the C identifier constraints, with small differences.

An identifier must consist only of the following characters: ‘A-Z’, ‘a-z’, ‘0-9’, ‘@’ and ‘_’. Unlike a real C identifier, a string identifier can start with a digit (and can contain ‘@’).


 

Some good identifiers:


 

This_is_a_good_identifier

ThisIsAGoodIdentifier

_This@is@notherGoodId

1234_is_a_goodId

This_Is_Good_1234


 

Some bad identifiers:


 

This is a bad identifier

é#()|{[_IdBAD
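
The identifier rule can be expressed as a single regular expression. The following Python sketch is only an illustration; the function name is_valid_identifier is hypothetical.

```python
import re

# One or more of the allowed characters: letters, digits, '_' and '@'.
# A leading digit is allowed, unlike in a real C identifier.
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_@]+")


def is_valid_identifier(text):
    """Return True if text is an acceptable string identifier."""
    return IDENTIFIER_RE.fullmatch(text) is not None
```

All the good identifiers listed above pass this check, and both bad ones are rejected (the first because of its spaces, the second because of characters such as ‘é’ and ‘#’).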

File formats

There are three different translation file formats. But only two need to be learned ;-)

Format 1

This format is used for client side static text and for server side clause text.

The file is a list of identifier-to-string associations (the string is also called the value string). Identifiers must conform to the identifier constraints described above, and the value string is delimited by ‘[’ and ‘]’.

Text layout is free; you can insert line breaks and indent as you want.


 

identifiant1 [textual value]

identifiant2 [other textual value]


 

This file can contain C style comments.


 

// This is a single line comment. Continue until end of line

identifiant1 [textual value]

/* This is

a multiline

comment */

identifiant2 /* multiline comment here ! */ [other textual value]


 

The text value can be formatted for readability. Newlines and tabs are removed from the final string value.


 

identifiant1 [textual

value

with

new line

and tab formatting only for readability]

identifiant2 [other textual value]


 

If you need to specify newlines or tabulations in the value string, you must use the C-style escape sequences ‘\t’ for tab and ‘\n’ for newline. To write a ‘\’ in the string value, double the backslash: ‘\\’. To write a ‘]’ in the string, escape it with a backslash: ‘\]’.


 

identifiant1 [tabulation: \tThis text is tabbed]

identifiant2 [New line \nText on next line]

identifiant3 [Backslash: \\]

identifiant4 [a closing square bracket: \] ]
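
The layout and escape rules above can be summarized by a small decoding routine. This Python sketch is not the actual Ryzom loader: the name decode_value is hypothetical, it expects the raw text found between the ‘[’ and ‘]’ delimiters, and keeping unknown escape sequences unchanged is an assumption.

```python
def decode_value(raw):
    """Turn the raw text between '[' and ']' into the final string value."""
    escapes = {"t": "\t", "n": "\n", "\\": "\\", "]": "]"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Assumption: unknown escapes are kept as written.
            out.append(escapes.get(nxt, "\\" + nxt))
            i += 2
        elif ch in "\r\n\t":
            # Real newlines and tabs are layout only and are dropped.
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Note that a line break inside the brackets is removed without inserting a space, so any space wanted between two words has to be written explicitly.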


 

You can split the original file into multiple small files, which are easier to maintain and work with.

This is achieved with a C-like preprocessor command, “#include”.


 

#include "path/filename.txt"


 

You can have any number of include commands. Included files can also contain include commands.

The path can be either an absolute path or a path relative to the location of the master file.
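
The path rule can be sketched as follows in Python. The function name resolve_includes is hypothetical, POSIX-style paths are assumed, and the sketch only looks at lines consisting of an include command; it does not strip comments first.

```python
import re
from pathlib import PurePosixPath

# A line holding only: #include "some/path"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def resolve_includes(master_path, text):
    """List the include targets found in text, resolved against the master file."""
    base = PurePosixPath(master_path).parent
    resolved = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            target = PurePosixPath(match.group(1))
            # Absolute paths are kept; relative ones start at the master file's directory.
            resolved.append(str(target if target.is_absolute() else base / target))
    return resolved
```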

Format 2

This format is used for phrase translation files.

The grammar of this format is fairly complex; it is described below in a near-LALR syntax:


 

identifiant : [A-Za-z0-9_@]+


 

phrase : identifiant ‘(‘ parameterList ‘)’

‘{‘

clauseList

‘}’


 

parameterList : parameterList ‘,’ parameterDesc

| parameterDesc


 

parameterDesc : parameterType parameterName


 

parameterName : identifiant


 

parameterType : ‘item’

| ‘place’

| ‘creature’

| ‘skill’

| ‘role’

| ‘ecosystem’

| ‘race’

| ‘brick’

| ‘tribe’

| ‘guild’

| ‘player’

| ‘int’

| ‘bot’

| ‘time’

| ‘money’

| ‘compass’

| ‘dyn_string_id’

| ‘string_id’

| ‘self’

| ‘creature_model’

| ‘entity’

| ‘bot_name’

| ‘bodypart’

| ‘score’

| ‘sphrase’

| ‘characteristic’

| ‘damage_type’

| ‘literal’


 

clauseList : clauseList clause

| clause


 

clause : conditionList identifiant textValue

| identifiant textValue

| conditionList identifiant

| identifiant

| textValue


 

conditionList : conditionList condition

| condition


 

condition : ‘(‘ testList ‘)’


 

testList : testList ‘&’ test

| test


 

test : operand1 operator reference


 

operand1 : parameterName

| parameterName ‘.’ propertyName


 

propertyName : identifiant


 

operator : ‘=’

| ‘!=’

| ‘<’

| ‘<=’

| ‘>’

| ‘>=’


 

reference : identifiant


 

textValue : ‘[‘ .* ‘]’


 
 

As in format 1, you can include C-style comments in the text, indent freely, and use the include command.
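
To make the first grammar rules more concrete, here is a Python sketch that parses only the phrase header (identifier and parameter list). It is not the real parser: the name parse_phrase_header is hypothetical, comments are not handled, and an empty parameter list is accepted even though parameterList as written above requires at least one parameterDesc.

```python
import re

# Parameter types copied from the parameterType rule above.
PARAM_TYPES = set("""
item place creature skill role ecosystem race brick tribe guild player int
bot time money compass dyn_string_id string_id self creature_model entity
bot_name bodypart score sphrase characteristic damage_type literal
""".split())

IDENT_RE = re.compile(r"[A-Za-z0-9_@]+")
HEADER_RE = re.compile(r"\s*([A-Za-z0-9_@]+)\s*\(([^)]*)\)\s*")


def parse_phrase_header(header):
    """Return (phrase_id, [(type, name), ...]) for 'ID (type name, ...)'."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"malformed phrase header: {header!r}")
    params = []
    body = match.group(2).strip()
    for part in body.split(",") if body else []:
        fields = part.split()
        if (len(fields) != 2 or fields[0] not in PARAM_TYPES
                or not IDENT_RE.fullmatch(fields[1])):
            raise ValueError(f"malformed parameter: {part!r}")
        params.append((fields[0], fields[1]))
    return match.group(1), params
```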

Format 3: Spreadsheet unicode export


 

This format is the result of a Unicode text export from a spreadsheet.

The encoding should be 16-bit Unicode. Columns are tab-separated and rows are newline-separated.

You should not write this file by hand; edit it only with a spreadsheet application.

The first row must contain the column names.


 

Info columns

If a column name starts with a ‘*’, the whole column is ignored.

This is useful for adding information columns that can help translation.
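
A reader for this format might look like the Python sketch below. It works on text that has already been decoded (for instance read with open(path, encoding="utf-16"), which is an assumption about how the export is encoded), it assumes that fields are not quoted, and the name parse_sheet is hypothetical.

```python
def parse_sheet(text):
    """Parse tab-separated text into a list of {column name: cell} rows."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Columns whose name starts with '*' are information columns: skip them.
    kept = [i for i, name in enumerate(header) if not name.startswith("*")]
    return [
        {header[i]: (cells[i] if i < len(cells) else "") for i in kept}
        for cells in body
    ]
```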


 

Delete character

It is possible to insert a ‘delete’ command in the field: ‘\d’. This is useful for article translation.

Example: you have a string with the following replacement (in French):


 

    "Rapporte moi $item.da$ $item.name$"


 

And the item words file contains the following:


 

    item name da

    marteau marteau le

    echelle échelle l’


 

If the item is ‘marteau’, no problem, the replacement gives:


 

    "Rapporte moi le marteau"


 

But for ‘echelle’, there is an extra space in the result:


 

    "Rapporte moi l’ échelle"


 

To remove this extra space, you can add a ‘delete’ marker in the article definition:


 

    item name da

    marteau marteau le

    echelle échelle l’\d


 

This will give a correct resulting string:


 

    "Rapporte moi l’échelle"

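
The effect of the marker in this example can be reproduced with the Python sketch below. It assumes that ‘\d’ removes exactly one character following it, which matches the example above but is not confirmed beyond it; the name apply_delete_markers is hypothetical.

```python
def apply_delete_markers(text):
    """Remove each '\\d' marker together with the single character after it."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\d", i):
            # Skip the two marker characters and the character that follows.
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Applied to the string obtained after replacement, "Rapporte moi l’\d échelle", the marker and the space after it disappear, while a string without any marker is returned unchanged.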

 

Working with translation files, translator point of view

Client side “*.uxt” files

This file contains all static text available directly to the client. The text must conform to format 1 described above.

There is an additional constraint: the first entry MUST be the language name, spelled in the language itself (e.g. ‘English’ for English, ‘Français’ for French).

For example, the file en.uxt must begin with:


 

languageName [English]

Server side files

Server side translation is a bit more complex.

We will learn how to write server side translation in four steps (guess what: from simple to complex problem!).


 

Step 1: A simple string

For this, you only need the phrase file.

Let’s say we want a string saying “hello world!” identified by HelloWorld.

Create a phrase entry in phrase_en.txt:


 

HelloWorld ()

{

[Hello world!]

}


 

That’s it! No more.

Of course, you must also provide the same phrase in all the supported languages; for example, in phrase_fr.txt:


 

HelloWorld ()

{

[Bonjour le monde!]

}


 

Note that only the text value has changed. The phrase identifier MUST remain the same in all the translation files.


 

Step 2: Indirection to clause_<lang>.txt

In step 4, we will see that the phrase file can become very complex. Such a file is not well suited to being handed to a professional translator with no experience of complex grammar files. Moreover, the complexity of the file can obscure the translation work to be done.

So, you can split the phrase grammar into the phrase file and the text values into the clause file.


 

To do this, you must assign a unique identifier to each text value.

Let’s rebuild the previous example with indirection.

In phrase_en.txt, create the phrase entry like this:


 

HelloWorld ()

{

Hello

}


 

We have just put an identifier in the phrase block. This means that the phrase refers to a string identified as “Hello” in the clause file.

Now, we can create the text value in clause_en.txt:


 

Hello [Hello world!]


 

As in the first step, you must do this task for each language.


 

TIP: to facilitate translation work, it is possible to specify both the string identifier AND the string value. This can be helpful for automatically building a translation file from the original one.

Example:


 

HelloWorld ()

{

Hello [Bonjour le monde!]

}


 

In such a case, the translation system always looks first in the clause file and falls back to the string value in the phrase file only if the string is not found in the clause file.

The other advantage is that the person who writes the phrase file can give a simplistic version of the string that a professional translator will improve.
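
The lookup order described above can be written as a short Python sketch. The name resolve_clause_text and the use of a plain dictionary for the clause file are assumptions made for illustration.

```python
def resolve_clause_text(clause_id, inline_text, clause_table):
    """Pick the text for a clause: the clause file first, then the inline value."""
    # clause_table maps clause identifiers to the text from clause_<lang>.txt.
    if clause_id is not None and clause_id in clause_table:
        return clause_table[clause_id]
    # Fall back to the value written directly in the phrase file (may be None).
    return inline_text
```

For instance, with a clause table containing Hello, the clause file text wins over the inline value; with an empty clause table, the inline value from the phrase file is used.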


 

Step 3: Using parameters - basics

Here we are entering the complex stuff!

Each phrase can receive a list of parameters.

Those parameters can be of different types:

  • item,

  • place,

  • creature,

  • skill,

  • ecosystem,

  • race,

  • brick,

  • tribe,

  • guild,

  • player,

  • int,

  • bot,

  • time,

  • money,

  • compass,

  • dyn_string_id,

  • string_id,

  • creature_model,

  • entity,

  • body_part,

  • score,

  • sphrase,

  • characteristic,

  • damage_type,

  • bot_name,

  • literal.


 

Each parameter is given a name (or identifier) when declared. We will call it paramName.


 

Each type of parameter CAN be associated with a ‘word’ file. This file is an Excel sheet (in Unicode text export form) that contains translations for the parameter: its name, indefinite or definite article (e.g. ‘a’, ‘the’), plural name and article, and any other property or grammar element needed for translation.

The first column is very important because it associates a row of data with a particular parameter value.


 

Let’s begin with an example: we want to build a dynamic phrase with a variable creature race name.

First, we must build an Excel sheet to define the words for the race type. This will be saved as race_words_<lang>.txt, as a Unicode text export from Excel. As always, you must provide a version of this file for each language.

NB: The first column MUST always be the association field, and you should have a ‘name’ column because it is the default replacement for a parameter. Any other column is optional and can vary from language to language to accommodate any specific grammar constraint.

This is an example race_words_en.txt:


 

race     name     ia  da   p         pia  pda

kitifly  Kitifly  a   the  Kitiflys       the

varynx   Varynx   a   the  Varynx         the

etc…


 

As stated in the note above, the first column gives the race identifier as defined in the game dev sheets. The second column is the ‘highly advisable’ column for the name of the race. The ‘p’ column is the plural name. ‘ia’ and ‘da’ stand for indefinite article and definite article.


 

Next, we must create a phrase with a creature parameter in phrase_<lang>.txt:


 

KILL_A_CREATURE (race crea)

{}


 

As you can see, after the phrase identifier KILL_A_CREATURE we have the parameter list between parentheses. We declare a parameter of type race named crea. Note that you can choose your parameter names freely, but each parameter must have a unique name (at least within a phrase).


 

Now, we can build the string value. To insert a parameter into the string, we must mark the replacement point with ‘$’ signs (e.g. $crea$) directly in the string value:


 

KILL_A_CREATURE (race crea)

{

[Would you please kill a $crea$ for me ?]

}


 

As you can see, it’s not too complex. $crea$ will be replaced with the content of the field from the words file in the ‘name’ column and at the row corresponding to the race identifier.


 

▪ It is possible to use any of the words file columns in the value string. We can, for example, make the indefinite article dynamic:


 

KILL_A_CREATURE (race crea)

{

[Would you please kill $crea.ia$ $crea$ for me ?]

}


 

▪ Some parameter types have special replacement rules: int parameters are replaced with their text representation, time parameters are converted to a readable Ryzom time format, and money parameters are converted in the same way.
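
The column replacement used in the examples above ($crea$ and $crea.ia$) can be sketched in Python as follows. This is an illustration only: the name substitute is hypothetical, a single words table is used for all parameters, and the special rules for int, time and money are not handled.

```python
import re

# Matches $name$ or $name.column$.
PLACEHOLDER_RE = re.compile(r"\$([A-Za-z0-9_@]+)(?:\.([A-Za-z0-9_@]+))?\$")


def substitute(template, params, words):
    """Replace each placeholder with a cell from the words table.

    params maps a parameter name to its value (the first-column key),
    words maps that key to a {column name: text} row.
    """
    def replace(match):
        name = match.group(1)
        column = match.group(2) or "name"  # 'name' is the default column
        return words[params[name]][column]

    return PLACEHOLDER_RE.sub(replace, template)
```

With a row for kitifly taken from race_words_en.txt and crea bound to kitifly, the last template above becomes "Would you please kill a Kitifly for me ?".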


 

Last but not least, the identifier and indirection rules seen in steps 1 and 2 are still valid.


 

Step 4: Using parameters: conditional clause

It’s time now to unveil the conditional clause system.


 

Let’s call the identifier and string value we put in a phrase in the previous step a clause. A phrase can contain more than one clause, and the translation engine chooses among them on the fly depending on the parameter values. This is the conditional clause system.


 

Let’s start with a first example. As in step 3, we want to kill creatures, but this time we add a variable number of creatures to kill, from 0 to n.

What we need are conditions to select among three clauses: no creature to kill, one creature to kill, and more than one.


 

First, let’s write the phrase, its parameters and the three clauses:


 

KILL_A_CREATURE (race crea, int count)

{

// no creature to kill

[There is no creature to kill today.]

// 1 creature to kill

    [Would you please kill a $crea$ for me ?]

// more than one

[Would you please kill $count$ $crea$ for me ?]

}