Unify Team Developer v5.1 String Handling with Unicode
* Unify Team Dev. v5.1 white paper: TDv5.1 Unicode Paper in PDF Format
* STRING-HANDLING FUNCTIONS in Team Developer:
-- The fundamental thing to remember with v5.1 is that all strings (variables, fields, table window columns, parameters, constants) are now in Unicode, double-byte format.
-- This means that each character in a string requires 2 bytes of memory in Team Developer v5.1.
-- In TD v4.2 and earlier versions, SalStrLength and SalStrGetBufferLength return the same value, because 1 character = 1 byte.
-- In TD v5.1, for a string such as sSomeStr = 'ABCDEFG' [7 characters long]:
SalStrLength( sSomeStr ) returns 7 (characters)
BUT...
SalStrGetBufferLength( sSomeStr ) returns 14 (bytes) [because Unicode uses 2 bytes per character]
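The character-versus-byte distinction can be reproduced outside TD. The Python sketch below assumes TD v5.1 strings are stored as UTF-16 little-endian and ignores any string terminator. `str_length`, `buffer_length` and `ansi_buffer_length` are hypothetical stand-ins for SalStrLength, SalStrGetBufferLength in v5.1, and SalStrGetBufferLength in v4.2. They are not TD functions.

```python
# Hypothetical stand-ins for the TD functions; not part of TD itself.
# Assumption: v5.1 strings are UTF-16 little-endian; terminators are ignored.

def str_length(s: str) -> int:
    """Number of characters (analogous to SalStrLength)."""
    return len(s)

def buffer_length(s: str) -> int:
    """Bytes in the UTF-16LE encoding (analogous to SalStrGetBufferLength in v5.1)."""
    return len(s.encode("utf-16-le"))

def ansi_buffer_length(s: str, codepage: str = "cp1252") -> int:
    """Bytes in a single-byte code page (the TD v4.2 situation)."""
    return len(s.encode(codepage))

s_some_str = "ABCDEFG"
print(str_length(s_some_str))          # 7 characters
print(buffer_length(s_some_str))       # 14 bytes in Unicode
print(ansi_buffer_length(s_some_str))  # 7 bytes in ANSI
```

The "2 bytes per character" rule holds for characters in the Basic Multilingual Plane. UTF-16 stores characters outside it as 4-byte surrogate pairs.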
* TD v5.1 FILE FORMAT:
-- There is now a new file format constant, OF_UTF16. You must include this constant among the open flags of the SalFileOpen function if you will be opening or creating a Unicode text file.
-- See Wikipedia on UTF-16
for more information on this format.
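OF_UTF16 has no direct Python counterpart. As a rough analogue (an assumption of this sketch, not TD behavior), the Python code below writes a Unicode text file by hand and reads it back with a BOM-aware decoder. The file starts with the FF FE little-endian byte order mark and then stores the text at 2 bytes per character. The helper names and the file name `sample_unicode.txt` are hypothetical.

```python
import codecs
import os
import tempfile

def write_unicode_text(path: str, text: str) -> None:
    # Assumption: mimic a UTF-16 little-endian text file with a BOM.
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE)        # FF FE
        f.write(text.encode("utf-16-le"))   # 2 bytes per BMP character

def read_unicode_text(path: str) -> str:
    # The "utf-16" codec reads the BOM, picks the byte order, and strips it.
    # newline="" keeps the "\r\n" line ending exactly as written.
    with open(path, "r", encoding="utf-16", newline="") as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "sample_unicode.txt")
write_unicode_text(path, "Hello, Ελληνικά\r\n")
print(read_unicode_text(path))
print(os.path.getsize(path))  # 2-byte BOM + 17 characters x 2 bytes
```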
-- Supports the UTF-16 "little endian" Byte Order Mark (BOM) for text files. That is, the first 2 bytes of a Unicode text file are "FF FE", indicating that the file is in Unicode/little-endian format.
[Figure: Hex view of text files saved in TD v5.1 and TD v4.2]
** In v4.2 the "." period character (hex 2E) is the first byte, since the file is in ANSI format; there is no Unicode 2-byte (FF FE) header. In v5.1, by contrast, TD uses *2* bytes (a 'wide' character) for each character: "2E 00" = ".", "68 00" = "h", etc. In TD v4.2 there is only one byte per character.
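The hex view described above can be approximated in a few lines of Python. The sketch encodes the text ".h" in two ways and prints the bytes in hex. The first is UTF-16 little-endian with a BOM, which is how the note describes v5.1 files. The second is ANSI code page 1252, which stands in for a v4.2 file. It also includes a simple BOM check. All helper names are hypothetical, and using code page 1252 for the v4.2 case is an assumption.

```python
import codecs

def hex_view(data: bytes) -> str:
    # Space-separated uppercase hex, like a hex editor's byte column.
    return " ".join(f"{b:02X}" for b in data)

def unicode_file_bytes(text: str) -> bytes:
    # v5.1 style per the note above: FF FE BOM, then 2 bytes per character.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def ansi_file_bytes(text: str) -> bytes:
    # v4.2 style: no BOM, 1 byte per character (code page 1252 assumed).
    return text.encode("cp1252")

def looks_like_utf16_le(data: bytes) -> bool:
    # Simple check on the first two bytes; it does not rule out a
    # UTF-32 LE BOM (FF FE 00 00).
    return data.startswith(codecs.BOM_UTF16_LE)

sample = ".h"
print(hex_view(unicode_file_bytes(sample)))  # FF FE 2E 00 68 00
print(hex_view(ansi_file_bytes(sample)))     # 2E 68
print(looks_like_utf16_le(unicode_file_bytes(sample)))  # True
print(looks_like_utf16_le(ansi_file_bytes(sample)))     # False
```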
TEAM DEVELOPER STRING FUNCTIONS:
* SalStrGetBufferLength - from TD Help: "Returns the current buffer length of a string" [that is, the number of bytes in the string].
-- In Unicode, each character in a string requires 2 bytes of memory, so string length x 2 = buffer length.
* SalStrLength - from TD Help: "Returns a string's length" [that is, the number of characters, not bytes, in the string].
* SalStrSetBufferLength - from TD Help: "Sets the buffer string length to the parameter value and allocates memory".
-- In v5.1 the buffer length will equal 2x the string length.
If you need to convert between Unicode and ANSI/single-byte strings, use the new TD v5.1 SAL functions:
* SalStrToMultiByte - from TD Help: "Converts a Unicode string to a multibyte [ANSI/ASCII, single byte] string".
-- Microsoft Developer Network (MSDN) documentation for the Win32 API function WideCharToMultiByte
* SalStrToWideChar - from TD Help: "Converts a multibyte [ANSI/ASCII, single byte] string to a Unicode string."
-- Microsoft Developer Network (MSDN) documentation for the Win32 API function MultiByteToWideChar
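The two conversion directions can be sketched with Python's built-in code page codecs. `to_multibyte` and `to_wide_char` are hypothetical names that only loosely mirror SalStrToMultiByte and SalStrToWideChar, and they make no claim about TD's actual parameters. The sketch also shows that converting to a single-byte code page can lose data. Characters that the code page cannot represent are replaced with "?" here, which is a choice made for this example.

```python
def to_multibyte(text: str, codepage: int = 1252) -> bytes:
    """Unicode -> single-byte code page (rough analogue of SalStrToMultiByte).
    Characters the code page cannot represent become '?'."""
    return text.encode(f"cp{codepage}", errors="replace")

def to_wide_char(data: bytes, codepage: int = 1252) -> str:
    """Single-byte code page -> Unicode (rough analogue of SalStrToWideChar)."""
    return data.decode(f"cp{codepage}", errors="replace")

ansi = to_multibyte("Café")
print(ansi)                    # b'Caf\xe9'
print(to_wide_char(ansi))      # Café
print(to_multibyte("Жук"))     # b'???' (Cyrillic is not in code page 1252)
print(to_wide_char(to_multibyte("Жук", 1251), 1251))  # Жук
```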
* Here are some other encoding constants for use with TD's SalStrToMultiByte and SalStrToWideChar functions:
Number: ENC_Latin_EastEuropean = 1250
Number: ENC_Cryllic = 1251
Number: ENC_Latin_WestEuropean = 1252
Number: ENC_Greek = 1253
Number: ENC_Turkish = 1254
Number: ENC_Hebrew = 1255
Number: ENC_Arabic = 1256
Number: ENC_Baltic = 1257
Number: ENC_Vietnamese = 1258
Number: ENC_Thai = 874
More information on these constant values (Windows code page numbers) is available here:
http://en.wikipedia.org/wiki/Code_page
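The values above are Windows code page numbers. As a sketch, they can be mapped onto Python's standard codecs (cp874 and cp1250 through cp1258) to show that the same byte decodes to different characters depending on the code page. The dictionary only restates the list above, with the constant names kept exactly as given. The mapping to Python codec names and the helper `python_codec_for` are assumptions of this example and are not part of TD.

```python
import codecs

# Values restated from the TD constants listed above (names kept as given).
TD_ENCODINGS = {
    "ENC_Latin_EastEuropean": 1250,
    "ENC_Cryllic": 1251,
    "ENC_Latin_WestEuropean": 1252,
    "ENC_Greek": 1253,
    "ENC_Turkish": 1254,
    "ENC_Hebrew": 1255,
    "ENC_Arabic": 1256,
    "ENC_Baltic": 1257,
    "ENC_Vietnamese": 1258,
    "ENC_Thai": 874,
}

def python_codec_for(td_constant: str) -> str:
    # Windows code page N corresponds to Python's "cpN" codec.
    name = f"cp{TD_ENCODINGS[td_constant]}"
    codecs.lookup(name)  # raises LookupError if the codec is unavailable
    return name

for const in TD_ENCODINGS:
    print(const, python_codec_for(const))

# The same byte (hex C1) means a different character in each code page:
for const in ("ENC_Latin_WestEuropean", "ENC_Cryllic", "ENC_Greek"):
    print(const, b"\xc1".decode(python_codec_for(const)))
```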