Unify Team Developer v5.2 and Later TDs String Handling with Unicode
* Original Unify Team Developer v5.1 white paper: TDv5.1 Unicode Paper in PDF Format
* UNICODE.org home page definition of: What is Unicode?
* STRING-HANDLING FUNCTIONS in Team Developer:
-- The fundamental thing to remember with v5.1 and later TDs is that now all strings (variables, fields, TW columns, parameters, constants) are in Unicode, double-byte format.
-- This means that each character in the string requires 2 (two) bytes of memory in Team Developer v5.1 and later.
-- In TD v4.2 and prior versions, SalStrLength() and SalStrGetBufferLength() return matching values, because 1 character = 1 byte.
-- In TD v5.2 and later, a string like sSomeStr = 'ABCDEFG' (7 characters long) gives:
SalStrLength( sSomeStr ) = 7 characters
SalGetBufferLength( sSomeStr ) = 16 bytes [7 characters x 2 bytes/character in Unicode, + 2 bytes for the end-of-string character]
(NOTE: SalStrGetBufferLength() & SalStrSetBufferLength() were deprecated as of TD v5.2. Now for v5.2 and later, use SalGetBufferLength() and SalSetBufferLength() )
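The arithmetic above can be checked outside Team Developer. The Python sketch below models the two counts under two assumptions: strings are stored as UTF-16 with a 2-byte end-of-string character (as the text states), and all characters are in the Basic Multilingual Plane, so one character is one 2-byte unit. `str_length` and `buffer_length` are hypothetical stand-ins for SalStrLength() and SalGetBufferLength(), not TD APIs.

```python
def str_length(s: str) -> int:
    """Stand-in for SalStrLength(): number of characters."""
    return len(s)


def buffer_length(s: str) -> int:
    """Stand-in for SalGetBufferLength(): UTF-16 bytes plus a 2-byte terminator."""
    # 'utf-16-le' encodes without a byte-order mark, 2 bytes per BMP character.
    return len(s.encode("utf-16-le")) + 2


s_some_str = "ABCDEFG"
print(str_length(s_some_str))     # 7
print(buffer_length(s_some_str))  # 16
```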
* TD v5.2 and later FILE FORMAT:
-- There is now a new file format constant, OF_UTF16. You must include this constant as one of the open flags in the SalFileOpen() function if you will be opening or creating a Unicode text file.
-- See Wikipedia on UTF-16 for more information on this format.
Hex view of text files saved in TD v5.1 (and later TDs) vs. older non-Unicode TD v4.2:
** In v4.2 the "." period character (hex 2E) is the first byte, since the file is in ANSI format; there is no 2-byte Unicode (FF FE) byte-order mark header.
Also note that v5.1 and later TDs require *2* bytes (a 'wide' character) for each character: "2E 00" = ".", "68 00" = "h", etc. In TD v4.2 there is only one byte per character, as you can see.
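To see the same byte layout without a hex editor, the Python sketch below writes a short string once as a UTF-16 little-endian file with an FF FE byte-order mark and once as a single-byte ANSI (cp1252) file, then prints the raw bytes. The sample text ".h" and the temporary file name are arbitrary choices; this only illustrates the two formats and is not how SalFileOpen() with OF_UTF16 writes files internally.

```python
import codecs
import os
import tempfile


def file_bytes(text: str, unicode_file: bool) -> bytes:
    """Write text to a temporary file and return the raw bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            if unicode_file:
                # FF FE byte-order mark, then 2 bytes per character (little-endian).
                f.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
            else:
                # ANSI: 1 byte per character, no header.
                f.write(text.encode("cp1252"))
        with open(path, "rb") as f:
            return f.read()


print(file_bytes(".h", unicode_file=True).hex(" "))   # ff fe 2e 00 68 00
print(file_bytes(".h", unicode_file=False).hex(" "))  # 2e 68
```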
TEAM DEVELOPER STRING FUNCTIONS:
* SalGetBufferLength - from TD Help: "Returns the current buffer length of a string" [that is, the number of bytes in the string + 2 additional bytes for the end-of-string character].
-- In Unicode, each character in a string requires 2 bytes of memory, so buffer length = (string length x 2) + 2.
* SalStrLength - from TD Help: "Returns a string's length" [, that is, the number of characters -- not bytes -- in the string].
* SalSetBufferLength - from TD Help: "Sets the buffer string length to the parameter value and allocates memory".
-- Buffer length will equal 2x the string length + 2 bytes for the end-of-string character.
-- To set the buffer length to a specific value, such as 100, call SalSetBufferLength( sString, 100 + 2 ) to leave room for the end-of-string character. When you pass the buffer length to an external function, pass 100.
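The sizing rule can be written as a few small helpers. This is a hypothetical Python sketch of the arithmetic only, assuming 2 bytes per character and a 2-byte end-of-string character as described above; `buffer_to_request`, `max_chars` and `fits` are illustrative names, not SAL functions.

```python
TERMINATOR_BYTES = 2  # assumed size of the UTF-16 end-of-string character


def buffer_to_request(payload_bytes: int) -> int:
    """Value to pass to SalSetBufferLength() for a given payload size."""
    return payload_bytes + TERMINATOR_BYTES


def max_chars(payload_bytes: int) -> int:
    """How many 2-byte characters fit in the payload (terminator excluded)."""
    return payload_bytes // 2


def fits(text: str, payload_bytes: int) -> bool:
    """True if text plus its terminator fits in the requested buffer."""
    needed = len(text.encode("utf-16-le")) + TERMINATOR_BYTES
    return needed <= buffer_to_request(payload_bytes)


print(buffer_to_request(100))  # 102 -> SalSetBufferLength( sString, 100 + 2 )
print(max_chars(100))          # 50
print(fits("A" * 50, 100))     # True
print(fits("A" * 51, 100))     # False
```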
To convert between Unicode and ANSI/single-byte strings, use the new SAL functions in TD v5.1 and later:
* SalStrToMultiByte - from TD Help: "Converts a Unicode string to a multibyte [ANSI/ASCII, single byte] string".
-- Microsoft Dev. Network doc. for the WIN API function WideCharToMultiByte
* SalStrToWideChar - from TD Help: "Converts a multibyte [ANSI/ASCII, single byte] string to a Unicode string."
-- Microsoft Dev. Network doc. for the WIN API function MultiByteToWideChar
* Here are some other encoding constants for use with these conversion functions:
Number: ENC_Latin_EastEuropean = 1250
Number: ENC_Cryllic = 1251
Number: ENC_Latin_WestEuropean = 1252
Number: ENC_Greek = 1253
Number: ENC_Turkish = 1254
Number: ENC_Hebrew = 1255
Number: ENC_Arabic = 1256
Number: ENC_Baltic = 1257
Number: ENC_Vietnamese = 1258
Number: ENC_Thai = 874
More information on these code page values is available in the Wikipedia article on Code Page.
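When prototyping conversions outside TD, the ENC_ values above line up with Python's built-in Windows code page codecs (cp1250 through cp1258, and cp874). The dictionary below is a hypothetical lookup table built from the listed constants; it checks that each codec exists and encodes one Greek sample character.

```python
import codecs

# Values copied from the TD constants listed above (names kept as given).
ENC = {
    "ENC_Latin_EastEuropean": 1250,
    "ENC_Cryllic": 1251,
    "ENC_Latin_WestEuropean": 1252,
    "ENC_Greek": 1253,
    "ENC_Turkish": 1254,
    "ENC_Hebrew": 1255,
    "ENC_Arabic": 1256,
    "ENC_Baltic": 1257,
    "ENC_Vietnamese": 1258,
    "ENC_Thai": 874,
}


def codec_for(constant: str) -> str:
    """Map a TD encoding constant name to a Python codec name, e.g. 'cp1253'."""
    return codecs.lookup(f"cp{ENC[constant]}").name


for name in ENC:
    print(f"{name:24} -> {codec_for(name)}")

# Greek small alpha is a single byte in code page 1253.
print("α".encode(codec_for("ENC_Greek")))  # b'\xe1'
```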
For Team Developer Unicode samples, see my TD home page.