8 Building Global Applications

This chapter discusses global application development in a PHP and Oracle Database environment. It addresses the basic tasks associated with developing and deploying global Internet applications, including developing locale awareness, constructing HTML content in the user-preferred language, and presenting data following the cultural conventions of the locale of the user.

Building a global Internet application that supports different locales requires good development practices. A locale refers to a national language and the region in which the language is spoken. The application itself must be aware of the locale preference of the user and be able to present content following the cultural conventions expected by the user. It is important to present data with appropriate locale characteristics, such as the correct date and number formats. Oracle Database is fully internationalized to provide a global platform for developing and deploying global applications.

This chapter has the following topics:

Establishing the Environment Between Oracle and PHP

Correctly setting up the connectivity between the PHP engine and the Oracle database is first step in building a global application, it guarantees data integrity across all tiers. Most internet based standards support Unicode as a character encoding, in this chapter we will focus on using Unicode as the character set for data exchange.

PHP uses the OCI8 extension, and rules that apply to OCI also apply to PHP. Oracle locale behavior (including the client character set used in OCI applications) is defined by the NLS_LANG environment variable. This environment variable has the form:

     <language>_<territory>.<character set>

For example, for a German user in Germany running an application in Unicode, NLS_LANG should be set to

GERMAN_GERMANY.AL32UTF8

The language and territory settings control Oracle behaviors such as the Oracle date format, error message language, and the rules used for sort order. The character set AL32UTF8 is the Oracle name for UTF-8.

For information on the NLS_LANG environment variable, see the Oracle Database installation guides.

When PHP is installed on Apache, you can set NLS_LANG in /etc/profile:

    export NLS_LANG GERMAN_GERMANY.AL32UTF8

If PHP is installed on Oracle HTTP Server, you must set NLS_LANG as an environment variable in $ORACLE_HOME/opmn/conf/opmn.xml:

   <ias-component id="HTTP_Server">
     <process-type id="HTTP_Server" module-id="OHS">
       <environment>
         <variable id="PERL5LIB"
          value="D:\oracle\1012J2EE\Apache\Apache\mod_perl\site\5.6.1\lib"/>
         <variable id="PHPRC" value="D:\oracle\1012J2EE\Apache\Apache\conf"/>
         <variable id="NLS_LANG" value="german_germany.al32utf8"/>
       </environment>
       <module-data>
         <category id="start-parameters">
           <data id="start-mode" value="ssl-disabled"/>
         </category>
       </module-data>
       <process-set id="HTTP_Server" numprocs="1"/>
     </process-type>
   </ias-component>

You must restart the Web listener to implement the change.

Manipulating Strings

PHP was designed to work with the ISO-8859-1 character set. To handle other character sets, specifically multibyte character sets, a set of "MultiByte String Functions" is available. To enable these functions, you must enable the mbstring extension.

Your application code should use functions such as mb_strlen() to calculate the number of characters in strings. This may return different values than strlen(), which returns the number of bytes in a string.

Once you have enabled the mbstring extension and restarted the Web server, several configuration options become available. You can change the behavior of the standard PHP string functions by setting mbstring.func_overload to one of the "Overload" settings.

For more information, see the PHP mbstring reference manual at

http://www.php.net/mbstring

Determining the Locale of the User

In a global environment, your application should accommodate users with different locale preferences. Once it has determined the preferred locale of the user, the application should construct HTML content in the language of the locale and follow the cultural conventions implied by the locale.

A common method to determine the locale of a user is from the default ISO locale setting of the browser. Usually a browser sends its locale preference setting to the HTTP server with the Accept Language HTTP header. If the Accept Language header is NULL, then there is no locale preference information available, and the application should fall back to a predefined default locale.

The following PHP code retrieves the ISO locale from the Accept-Language HTTP header through the $_SERVER Server variable.

$s = $_SERVER["HTTP_ACCEPT_LANGUAGE"]

Developing Locale Awareness

Once the locale preference of the user has been determined, the application can call locale-sensitive functions, such as date, time, and monetary formatting to format the HTML pages according to the cultural conventions of the locale.

When you write global applications implemented in different programming environments, you should enable the synchronization of user locale settings between the different environments. For example, PHP applications that call PL/SQL procedures should map the ISO locales to the corresponding NLS_LANGUAGE and NLS_TERRITORY values and change the parameter values to match the locale of the user before calling the PL/SQL procedures. The PL/SQL UTL_I18N package contains mapping functions that can map between ISO and Oracle locales.

Table 8-1 shows how some commonly used locales are defined in ISO and Oracle environments.

Table 8-1 Locale Representations in ISO, SQL, and PL/SQL Programming Environments

Locale Locale ID NLS_LANGUAGE NLS_TERRITORY

Chinese (P.R.C.)

zh-CN

SIMPLIFIED CHINESE

CHINA

Chinese (Taiwan)

zh-TW

TRADITIONAL CHINESE

TAIWAN

English (U.S.A)

en-US

AMERICAN

AMERICA

English (United Kingdom)

en-GB

ENGLISH

UNITED KINGDOM

French (Canada)

fr-CA

CANADIAN FRENCH

CANADA

French (France)

fr-FR

FRENCH

FRANCE

German

de

GERMAN

GERMANY

Italian

it

ITALIAN

ITALY

Japanese

ja

JAPANESE

JAPAN

Korean

ko

KOREAN

KOREA

Portuguese (Brazil)

pt-BR

BRAZILIAN PORTUGUESE

BRAZIL

Portuguese

pt

PORTUGUESE

PORTUGAL

Spanish

es

SPANISH

SPAIN


Encoding HTML Pages

The encoding of an HTML page is important information for a browser and an Internet application. You can think of the page encoding as the character set used for the locale that an Internet application is serving. The browser must know about the page encoding so that it can use the correct fonts and character set mapping tables to display the HTML pages. Internet applications must know about the HTML page encoding so they can process input data from an HTML form.

Instead of using different native encodings for the different locales, Oracle recommends that you use UTF-8 (Unicode encoding) for all page encodings. This encoding not only simplifies the coding for global applications, but it also enables multilingual content on a single page.

Specifying the Page Encoding for HTML Pages

You can specify the encoding of an HTML page either in the HTTP header, or in HTML page header.

Specifying the Encoding in the HTTP Header

To specify HTML page encoding in the HTTP header, include the Content-Type HTTP header in the HTTP specification. It specifies the content type and character set. The Content-Type HTTP header has the following form:

Content-Type: text/html; charset=utf-8

The charset parameter specifies the encoding for the HTML page. The possible values for the charset parameter are the IANA names for the character encodings that the browser supports.

Specifying the Encoding in the HTML Page Header

Use this method primarily for static HTML pages. To specify HTML page encoding in the HTML page header, specify the character encoding in the HTML header as follows:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

The charset parameter specifies the encoding for the HTML page. As with the Content-Type HTTP Header, the possible values for the charset parameter are the IANA names for the character encodings that the browser supports.

Specifying the Page Encoding in PHP

You can specify the encoding of an HTML page in the Content-Type HTTP header by setting the PHP configuration variable as follows:

default_charset = UTF-8

This setting does not imply any conversion of outgoing pages. Your application must ensure that the server-generated pages are encoded in UTF-8.

Organizing the Content of HTML Pages for Translation

Making the user interface available in the local language of the user is a fundamental task in globalizing an application. Translatable sources for the content of an HTML page belong to the following categories:

  • Text strings included in the application code

  • Static HTML files, images files, and template files such as CSS

  • Dynamic data stored in the database

Strings in PHP

You should externalize translatable strings within your PHP application logic, so that the text is readily available for translation. These text messages can be stored in flat files or database tables depending on the type and the volume of the data being translated.

Static Files

Static files such as HTML and GIF files are readily translatable. When these files are translated, they should be translated into the corresponding language with UTF-8 as the file encoding. To differentiate the languages of the translated files, stage the static files of different languages in different directories or with different file names.

Data from the Database

Dynamic information such as product names and product descriptions is typically stored in the database. To differentiate various translations, the database schema holding this information should include a column to indicate the language. To select the desired language, you must include a WHERE clause in your query.

Presenting Data Using Conventions Expected by the User

Data in the application must be presented in a way that conforms to the expectation of the user. Otherwise, the meaning of the data can be misinterpreted. For example, the date '12/11/05' implies '11th December 2005' in the United States, whereas in the United Kingdom it means '12th November 2005'. Similar confusion exists for number and monetary formats of the users. For example, the symbol '.' is a decimal separator in the United States; in Germany this symbol is a thousand separator.

Different languages have their own sorting rules. Some languages are collated according to the letter sequence in the alphabet, some according to the number of stroke counts in the letter, and some languages are ordered by the pronunciation of the words. Presenting data not sorted in the linguistic sequence that your users are accustomed to can make searching for information difficult and time consuming.

Depending on the application logic and the volume of data retrieved from the database, it may be more appropriate to format the data at the database level rather than at the application level. Oracle Database offers many features that help to refine the presentation of data when the locale preference of the user is known. The following sections provide examples of locale-sensitive operations in SQL.

Oracle Date Formats

The three different date presentation formats in Oracle Database are standard, short, and long dates. The following examples illustrate the differences between the short date and long date formats for both the United States and Germany.

SQL> alter session set nls_territory=america nls_language=american;

Session altered.


SQL> select employee_id EmpID,
  2  substr(first_name,1,1)||'.'||last_name "EmpName",
  3  to_char(hire_date,'DS') "Hiredate",
  4  to_char(hire_date,'DL') "Long HireDate"
  5  from employees
  6* where employee_id <105;

     EMPID EmpName                     Hiredate   Long HireDate
---------- --------------------------- ---------- -----------------------------
       100 S.King                      06/17/1987 Wednesday, June 17, 1987
       101 N.Kochhar                   09/21/1989 Thursday, September 21, 1989
       102 L.De Haan                   01/13/1993 Wednesday, January 13, 1993
       103 A.Hunold                    01/03/1990 Wednesday, January 3, 1990
       104 B.Ernst                     05/21/1991 Tuesday, May 21, 1991
SQL> alter session set nls_territory=germany nls_language=german;

Session altered.

SQL> select employee_id EmpID,
  2  substr(first_name,1,1)||'.'||last_name "EmpName",
  3  to_char(hire_date,'DS') "Hiredate",
  4  to_char(hire_date,'DL') "Long HireDate"
  5  from employees
  6* where employee_id <105;

     EMPID EmpName                     Hiredate Long HireDate
---------- --------------------------- -------- ------------------------------
       100 S.King                      17.06.87 Mittwoch, 17. Juni 1987
       101 N.Kochhar                   21.09.89 Donnerstag, 21. September 1989
       102 L.De Haan                   13.01.93 Mittwoch, 13. Januar 1993
       103 A.Hunold                    03.01.90 Mittwoch, 3. Januar 1990
       104 B.Ernst                     21.05.91 Dienstag, 21. Mai 1991

Oracle Number Formats

The following examples illustrate the differences in the decimal character and group separator between the United States and Germany.

SQL> alter session set nls_territory=america;

Session altered.

SQL> select employee_id EmpID,
  2  substr(first_name,1,1)||'.'||last_name "EmpName",
  3  to_char(salary, '99G999D99') "Salary"
  4  from employees
  5* where employee_id <105

     EMPID EmpName                     Salary
---------- --------------------------- ----------
       100 S.King                       24,000.00
       101 N.Kochhar                    17,000.00
       102 L.De Haan                    17,000.00
       103 A.Hunold                      9,000.00
       104 B.Ernst                       6,000.00

SQL> alter session set nls_territory=germany;

Session altered.

SQL> select employee_id EmpID,
  2  substr(first_name,1,1)||'.'||last_name "EmpName",
  3  to_char(salary, '99G999D99') "Salary"
  4  from employees
  5* where employee_id <105

     EMPID EmpName                     Salary
---------- --------------------------- ----------
       100 S.King                       24.000,00
       101 N.Kochhar                    17.000,00
       102 L.De Haan                    17.000,00
       103 A.Hunold                      9.000,00
       104 B.Ernst                       6.000,00

Oracle Linguistic Sorts

Spain traditionally treats ch, ll as well as ñ as unique letters, ordered after c, l and n respectively. The following examples illustrate the effect of using a Spanish sort against the employee names Chen and Chung.

SQL> alter session set nls_sort=binary;

Session altered.

SQL> select employee_id EmpID,
  2         last_name "Last Name"
  3  from employees
  4  where last_name like 'C%'
  5* order by last_name

     EMPID Last Name
---------- -------------------------
       187 Cabrio
       148 Cambrault
       154 Cambrault
       110 Chen
       188 Chung
       119 Colmenares

6 rows selected.

SQL> alter session set nls_sort=spanish_m;

Session altered.

SQL> select employee_id EmpID,
  2         last_name "Last Name"
  3  from employees
  4  where last_name like 'C%'
  5* order by last_name

     EMPID Last Name
---------- -------------------------
       187 Cabrio
       148 Cambrault
       154 Cambrault
       119 Colmenares
       110 Chen
       188 Chung

6 rows selected.

Oracle Error Messages

The NLS_LANGUAGE parameter also controls the language of the database error messages being returned from the database. Setting this parameter prior to submitting your SQL statement ensures that the language-specific database error messages will be returned to the application.

Consider the following server message:

ORA-00942: table or view does not exist

When the NLS_LANGUAGE parameter is set to French, the server message appears as follows:

ORA-00942: table ou vue inexistante

For more discussion of globalization support features in Oracle Database, see "Working in a Global Environment" in Oracle Database 2 Day Developer's Guide.