Localization system in Ryzom


Overview

Localization in Ryzom has two main parts. The first (and easier) part covers static text on the client side (e.g. interface names, error messages). The second covers text generated dynamically by the servers.


  
 

As the diagram shows, four kinds of file make the localization system work. Each of these files must be provided for every localized language. The bold text in the diagram shows that each file name contains the language code.

File formats are discussed below.

 

Language code

Languages in Ryzom are identified by their language code as defined in ISO 639-1, plus a country code as defined in ISO 3166 if necessary.


 

ISO 639-1 is a two-character language code (e.g. ‘en’, ‘fr’). This is enough for most of the languages we want to support.

There are some exceptions, however, such as written Chinese.

Chinese can be written in two forms: traditional or simplified. Nonetheless, there is only one language code for Chinese: ‘zh’.

So we must append a country code to indicate which form of written Chinese is meant. The language code for simplified Chinese becomes ‘zh-CN’ (i.e. Chinese language, China), and traditional Chinese uses ‘zh’ alone, because the other Chinese-speaking regions (Taiwan, Hong Kong, ...) use traditional Chinese.

Identifier definition

Translated strings are associated with identifiers. Identifiers are text strings that follow the C identifier constraints, with small differences.

An identifier must consist only of the following characters: ‘A-Z’, ‘a-z’, ‘0-9’, ‘@’ and ‘_’. Unlike a real C identifier, a string identifier can start with a digit and can contain ‘@’.


 

Some good identifiers:


 

This_is_a_good_identifier

ThisIsAGoodIdentifier

_This@is@notherGoodId

1234_is_a_goodId

This_Is_Good_1234


 

Some bad identifiers:


 

This is a bad identifier

é#()|{[_IdBAD
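
The rule above can be checked mechanically. The following is a minimal Python sketch, not part of the Ryzom tools; the function name is_valid_identifier is hypothetical, and the pattern is simply the character set listed above with a leading digit allowed.

```python
import re

# Character set taken from the rule above: letters, digits, '@' and '_'.
# A leading digit is allowed, unlike in a real C identifier.
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_@]+")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` is non-empty and uses only allowed characters."""
    return IDENTIFIER_RE.fullmatch(name) is not None
```

With this sketch, is_valid_identifier("1234_is_a_goodId") is accepted, while "This is a bad identifier" is rejected because of the spaces.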

File formats

There are three different translation file formats. But only two need to be learned ;-)

Format 1

This format is used for client-side static text and for server-side clause text.

The file is a list of identifier-to-string associations (the string is also called the value string). Identifiers must conform to the C identifier constraint, and the value string is delimited by ‘[’ and ‘]’.

Text layout is free; you can break lines and indent as you like.


 

identifiant1 [textual value]

identifiant2 [other textual value]


 

This file can contain C style comments.


 

// This is a single line comment. Continue until end of line

identifiant1 [textual value]

/* This is

a multiline

comment */

identifiant2 /* multiline comment here ! */ [other textual value]


 

A textual value can be formatted for readability. New lines and tabs are removed from the final string value.


 

identifiant1 [textual

value

with

new line

and tab formatting only for readability]

identifiant2 [other textual value]


 

If you need new lines or tabs in the value string, you must use the C-style escape sequences ‘\t’ for a tab and ‘\n’ for a new line. To write a ‘\’ in the string value, double the backslash: ‘\\’. To write a ‘]’ in the string, escape it with a backslash: ‘\]’.


 

identifiant1 [tabulation: \tThis text is tabbed]

identifiant2 [New line \nText on next line]

identifiant3 [Backslash: \\]

identifiant4 [a closing square bracket: \] ]
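
To make the rules of format 1 concrete, here is a rough Python sketch of a reader for it. It is only an illustration of the rules described above, not the parser used by Ryzom: the names parse_format1 and _read_value are hypothetical, unknown escape sequences are kept verbatim (an assumption), comments inside a value string are not handled, and the include command is not handled either.

```python
import string

IDENT_CHARS = set(string.ascii_letters + string.digits + "_@")
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\", "]": "]"}


def _read_value(text, i):
    """Read a value string starting just after '['.

    Returns (value, index just after the closing ']').
    """
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text):
                break
            nxt = text[i + 1]
            # Assumption: an unknown escape is kept as written.
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        elif ch == "]":
            return "".join(out), i + 1
        else:
            # Raw new lines and tabs are layout only and are dropped.
            if ch not in "\n\r\t":
                out.append(ch)
            i += 1
    raise ValueError("unterminated value string")


def parse_format1(text):
    """Parse 'identifier [value]' pairs, skipping C-style comments."""
    entries = {}
    pending = None  # identifier still waiting for its value string
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = len(text) if end == -1 else end + 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
        elif ch == "[":
            if pending is None:
                raise ValueError("value string without identifier")
            entries[pending], i = _read_value(text, i + 1)
            pending = None
        elif ch in IDENT_CHARS:
            if pending is not None:
                raise ValueError(f"identifier {pending!r} has no value")
            start = i
            while i < len(text) and text[i] in IDENT_CHARS:
                i += 1
            pending = text[start:i]
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    if pending is not None:
        raise ValueError(f"identifier {pending!r} has no value")
    return entries
```

The sketch returns a plain dictionary mapping each identifier to its final string value, with escape sequences already resolved.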


 

You can split the original file into several smaller files that are easier to maintain and work with.

This is done with a C-like preprocessor command, “#include”.


 

#include "path/filename.txt"


 

You can have any number of include commands. Included files can also contain include commands.

The path can be either an absolute path or a path relative to the location of the master file.
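
As an illustration of how include commands could be resolved, here is a small Python sketch. It is not the Ryzom implementation: the function name expand_includes is hypothetical, the files are assumed to be UTF-8 (the real encoding may differ), each include command is assumed to sit alone on its line, and relative paths are resolved against the directory of the master file as stated above.

```python
import re
from pathlib import Path

# Assumption: an include command occupies a whole line.
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand_includes(path, base_dir=None, _seen=()):
    """Return the text of `path` with every #include replaced by file content."""
    path = Path(path)
    if base_dir is None:
        base_dir = path.parent  # directory of the master file
    resolved = path.resolve()
    if resolved in _seen:
        raise ValueError(f"circular include: {path}")
    lines = []
    # Assumption: files are UTF-8 encoded.
    for line in path.read_text(encoding="utf-8").splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            target = Path(match.group(1))
            if not target.is_absolute():
                target = base_dir / target
            # Included files may contain include commands themselves.
            lines.append(expand_includes(target, base_dir, _seen + (resolved,)))
        else:
            lines.append(line)
    return "\n".join(lines)
```

The guard against circular includes is an addition of this sketch; the text above does not say how such a case is handled.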

Format 2

This format is used for phrase translation files.

This format has a fairly complex grammar, described here in a near-LALR notation:


 

identifiant : [A-Za-z0-9_@]+


 

phrase : identifiant ‘(‘ parameterList ‘)’

‘{‘

clauseList

‘}’


 

parameterList : parameterList ‘,’ parameterDesc

| parameterDesc


 

parameterDesc : parameterType parameterName


 

parameterName : identifiant


 

parameterType : ‘item’

| ‘place’

| ‘creature’

| ‘skill’

| ‘role’

| ‘ecosystem’

| ‘race’

| ‘brick’

| ‘tribe’

| ‘guild’

| ‘player’

| ‘int’

| ‘bot’

| ‘time’

| ‘money’

| ‘compass’

| ‘dyn_string_id’

| ‘string_id’

| ‘self’

| ‘creature_model’

| ‘entity’

| ‘bot_name’

| ‘bodypart’

| ‘score’

| ‘sphrase’

| ‘characteristic’

| ‘damage_type’

| ‘literal’


 

clauseList : clauseList clause

| clause


 

clause : conditionList identifiant textValue

| identifiant textValue

| conditionList identifiant

| identifiant

| textValue


 

conditionList : conditionList condition

| condition


 

condition : ‘(‘ testList ‘)’


 

testList : testList ‘&’ test

| test


 

test : operand1 operator reference


 

operand1 : parameterName

| parameterName ‘.’ propertyName


 

propertyName : identifiant


 

operator : ‘=’

| ‘!=’

| ‘<’

| ‘<=’

| ‘>’

| ‘>=’


 

reference : identifiant


 

textValue : ‘[‘ .* ‘]’


 
 

As in format 1, you can include C-style comments in the text, indent freely, and use the include command.
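
To show how the condition part of the grammar reads in practice, here is a Python sketch that parses one condition such as ‘(item.da = le & count >= 2)’ into a list of tests. It is only an illustration of the condition, testList, test and operand1 rules above, not the Ryzom parser: the names tokenize and parse_condition are hypothetical, and the sixth operator is assumed to be ‘>=’, the counterpart of ‘<=’.

```python
import re

# Two-character operators are listed first so that '<=' is not read as '<', '='.
TOKEN_RE = re.compile(r"\s*(<=|>=|!=|=|<|>|&|\(|\)|\.|[A-Za-z0-9_@]+)")
IDENT_RE = re.compile(r"[A-Za-z0-9_@]+")
OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def tokenize(text):
    """Split a condition into parentheses, operators, '&', '.' and identifiers."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_condition(text):
    """Return a list of (parameter, property or None, operator, reference)."""
    tokens = tokenize(text)
    if len(tokens) < 2 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("a condition must be enclosed in parentheses")
    body = tokens[1:-1]
    tests, i = [], 0

    def ident(k):
        if k >= len(body) or not IDENT_RE.fullmatch(body[k]):
            raise ValueError(f"identifier expected at token {k}")
        return body[k]

    while True:
        name, prop = ident(i), None
        i += 1
        if i < len(body) and body[i] == ".":
            prop = ident(i + 1)
            i += 2
        if i >= len(body) or body[i] not in OPERATORS:
            raise ValueError("operator expected")
        operator = body[i]
        reference = ident(i + 1)
        i += 2
        tests.append((name, prop, operator, reference))
        if i == len(body):
            return tests
        if body[i] != "&":
            raise ValueError("'&' expected between tests")
        i += 1
```

A conditionList is simply several such parenthesized conditions in a row, so a full clause reader could call parse_condition once per condition.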

Format 3: Spreadsheet unicode export


 

This format is the result of a Unicode text export from Spreadsheet.

The encoding should be 16-bit Unicode. Columns are separated by tabs and rows by new lines.

You should not write this file by hand; edit it only with Spreadsheet.

The first row must contain the column names.


 

Info columns

If a column name starts with a ‘*’, the whole column is ignored.

This is useful for adding information columns that can help translation.
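
As an illustration, such an export could be read as follows. This Python sketch is not part of the Ryzom tools: the function name read_export is hypothetical, the file is assumed to be UTF-16 with a byte order mark, and fields are assumed not to be quoted by the export.

```python
import csv


def read_export(path, encoding="utf-16"):
    """Read a tab-separated export and drop the columns whose name starts with '*'."""
    with open(path, encoding=encoding, newline="") as handle:
        # Assumption: the export does not quote fields, so quotes are plain text.
        rows = list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return []
    header = rows[0]  # the first row holds the column names
    keep = [i for i, name in enumerate(header) if not name.startswith("*")]
    return [
        {header[i]: (row[i] if i < len(row) else "") for i in keep}
        for row in rows[1:]
    ]
```

Each remaining row becomes a dictionary keyed by column name, and a row that is shorter than the header is padded with empty strings.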


 

Delete character

It is possible to insert a ‘delete’ command in the field: ‘\d’. This is useful for article translation.

Example: you have a string with the following replacement (in French):


 

    "Rapporte moi $item.da$ $item.name$"


 

And the item words file contains the following:


 

    item       name       da
    marteau    marteau    le
    echelle    échelle    l’


 

If the item is ‘marteau’, no problem, the replacement gives:


 

    "Rapporte moi le marteau"


 

But for ‘echelle’, there is an extra space in the result:


 

    "Rapporte moi l’ échelle"


 

To remove this extra space, you can add a ‘delete’ marker in the article definition:


 

    item       name       da
    marteau    marteau    le
    echelle    échelle    l’\d


 

This will give a correct resulting string:


 

    "Rapporte moi l’échelle"


 

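The effect of the ‘delete’ marker in the example above can be sketched in Python as follows. This is an illustration only: the function name expand and the layout of the words dictionary are hypothetical, and the marker is assumed to remove itself together with the single character that follows it.

```python
import re

TAG_RE = re.compile(r"\$([A-Za-z0-9_@]+)\.([A-Za-z0-9_@]+)\$")


def expand(template, words):
    """Replace $parameter.column$ tags, then apply the delete marker."""

    def lookup(match):
        parameter, column = match.group(1), match.group(2)
        return words[parameter][column]

    text = TAG_RE.sub(lookup, template)
    # Assumption: the marker deletes itself and the one character after it.
    return re.sub(r"\\d.?", "", text, flags=re.DOTALL)
```

Here words maps a parameter name to the row selected for it, for example {"item": {"name": "échelle", "da": "l’\\d"}}.
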
Working with translation files, translator point of view

Client side “*.uxt” files

These files contain all the static text available directly to the client. The text must conform to format 1 described above.

There is an additional constraint: the first entry MUST be the language name, as spelled in that language (e.g. ‘English’ for English, ‘Français’ for French).

For example, the file en.uxt must begin with:


 

languageName [English]

Server side files

Server side translation is a bit more complex.

We will learn how to write server-side translations in four steps (guess what: from the simplest problem to the most complex!).


 

Step 1: A simple string:

For this, you only need the phrase file.

Let’s say we want a string saying “hello world!” identified by HelloWorld.

Create a phrase entry in phrase_en.txt:


 

HelloWorld ()

{

[Hello world!]

}


 

That’s it! No more.

Of course, you must also provide the same phrase in every supported language, for example in phrase_fr.txt:


 

HelloWorld ()

{

[Bonjour le monde!]

}


 

Note that only the text value has changed. The phrase identifier MUST remain the same in all the translation files.
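
Because the identifiers must match across languages, a quick consistency check can help. The Python sketch below is a rough heuristic, not a Ryzom tool: the function names are hypothetical, and a phrase header is assumed to be an identifier at the start of a line followed by ‘(’ on the same line. Comments and text values that happen to look like a header would confuse it.

```python
import re

# Heuristic: a phrase header is "identifier (" at the start of a line.
PHRASE_HEADER_RE = re.compile(r"^[ \t]*([A-Za-z0-9_@]+)[ \t]*\(", re.MULTILINE)


def phrase_identifiers(text):
    """Return the set of phrase identifiers found in a phrase file."""
    return set(PHRASE_HEADER_RE.findall(text))


def missing_phrases(reference_text, translated_text):
    """Return the identifiers present in the reference but not in the translation."""
    return sorted(phrase_identifiers(reference_text) - phrase_identifiers(translated_text))
```

Calling missing_phrases with the content of phrase_en.txt and phrase_fr.txt would list the phrases that still need a French entry.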