
Unify Team Developer v5.2 and Later TDs String Handling with Unicode

* Original Unify Team Developer v5.1 white paper: TDv5.1 Unicode Paper in PDF Format

* UNICODE.org home page definition of: What is Unicode?

* STRING-HANDLING FUNCTIONS in Team Developer:
  -- The fundamental thing to remember with v5.1 and later TDs is that now all strings (variables, fields, TW columns, parameters, constants) are in Unicode, double-byte format.
  -- This means that each character in the string requires 2 (two) bytes of memory in Team Developer v5.1 and later.

  -- In TD v4.2 and prior versions, SalStrLength() and SalStrGetBufferLength() return the same value, because 1 character = 1 byte.

  -- In TD v5.2 and later a string like sSomeStr = 'ABCDEFG'   [this is 7 characters long]
      returns SalStrLength( sSomeStr) = 7 characters
BUT...
      returns SalGetBufferLength( sSomeStr) = 16 bytes  [because of 2 bytes/character in Unicode + 2 bytes extra for the end-of-string character]

(NOTE: SalStrGetBufferLength()  & SalStrSetBufferLength() were deprecated as of TD v5.2. Now for v5.2 and later, use SalGetBufferLength() and SalSetBufferLength() )
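
The 7-characters-vs-16-bytes arithmetic above can be checked outside of TD. The following is a minimal sketch in plain Python (not SAL); the helper names char_count and buffer_bytes are hypothetical and only mimic what SalStrLength and SalGetBufferLength are described as returning, assuming 2 bytes per character plus a 2-byte end-of-string character.

```python
# Illustration only: plain Python, not SAL. Helper names are hypothetical.

def char_count(s: str) -> int:
    """Number of 2-byte (UTF-16) units, mimicking the SalStrLength result."""
    return len(s.encode("utf-16-le")) // 2


def buffer_bytes(s: str) -> int:
    """UTF-16 bytes plus 2 bytes for the end-of-string character,
    mimicking the SalGetBufferLength result described above."""
    return len(s.encode("utf-16-le")) + 2


some_str = "ABCDEFG"
print(char_count(some_str))    # 7
print(buffer_bytes(some_str))  # 16
```

Under the same assumption, an empty string still has a buffer of 2 bytes (only the end-of-string character).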

* TD v5.2 and later FILE FORMAT:
  
-- There is now a new file format constant, OF_UTF16. You must include this constant in the SalFileOpen function as one of the open flags if you will be opening or creating a Unicode text file.
   -- See Wikipedia on UTF-16 for more information on this format.

Hex view of text files saved in TD v5.1 (and later TDs) vs. older non-Unicode TD v4.2    


The image above shows the beginning of 2 NEWAPP.APT files, one in v5.1 text format and the other in v4.2 text format.

Note:
** In v5.1 and later TDs the "." (the period character, hex 2E) is the third byte in the file, after the first 2 bytes, "FF FE", (which indicate the file is in Unicode "Little Endian" format).

** In v4.2 the "." period character (hex 2E) is the first byte, since the file is in ANSI format; hence there is no Unicode 2-byte (FF FE) header.

Also note that v5.1 and later TDs require *2* bytes (a 'wide' character) for each character: "2E 00" = ".", "68 00" = "h", etc. In TD v4.2 there is only one byte per character, as you can see.
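
The byte layout described in the notes above can be reproduced without a hex viewer. This is a rough sketch in plain Python (not SAL); the sample text ".h" and the helper names are assumptions chosen to match the "2E 00" and "68 00" bytes mentioned above, and cp1252 stands in for "ANSI".

```python
# Illustration only: plain Python, not SAL. Helper names are hypothetical.
import codecs


def utf16_file_bytes(text: str) -> bytes:
    """Bytes of a UTF-16 Little Endian text file: FF FE header, then 2 bytes per character."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def ansi_file_bytes(text: str) -> bytes:
    """Bytes of an ANSI text file (cp1252 assumed): 1 byte per character, no header."""
    return text.encode("cp1252")


def hex_view(data: bytes) -> str:
    """Space-separated uppercase hex, like a hex viewer row."""
    return " ".join(f"{b:02X}" for b in data)


sample = ".h"
print(hex_view(utf16_file_bytes(sample)))  # FF FE 2E 00 68 00
print(hex_view(ansi_file_bytes(sample)))   # 2E 68
```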

 

 

TEAM DEVELOPER STRING FUNCTIONS:
* SalGetBufferLength - from TD Help: "Returns the current buffer length of a string" [, that is, the number of bytes in the string + 2 additional bytes for the end-of-string char.].
   -- In Unicode, each character in a string requires 2 bytes of memory, so (string length x 2) + 2 = buffer length.

* SalStrLength - from TD Help: "Returns a string's length" [, that is, the number of characters -- not bytes -- in the string].

* SalSetBufferLength - from TD Help: "Sets the buffer string length to the parameter value and allocates memory".
   -- Buffer length will equal 2x the string length + 2 bytes for the end-of-string character.
   -- To set a buffer length to a specific value, like 100, you must call SalSetBufferLength( sString, 100 + 2 ). When you pass the buffer length to an external function, pass 100.


If you need to convert Unicode <--> ANSI/single byte and back, use the new TD v5.1 and later SAL functions:
* SalStrToMultiByte - from TD Help: "Converts a Unicode string to a multibyte [ANSI/ASCII, single byte] string".
   -- Microsoft Dev. Network doc. for the WIN API function WideCharToMultiByte

* SalStrToWideChar - from TD Help: "Converts a multibyte [ANSI/ASCII, single byte] string to a Unicode string."
   -- Microsoft Dev. Network doc. for the WIN API function MultiByteToWideChar

* Here are some other encoding constants for use with TD's SalStrToMultiByte and SalStrToWideChar functions:
Number: ENC_Latin_EastEuropean = 1250
Number: ENC_Cryllic            = 1251
Number: ENC_Latin_WestEuropean = 1252
Number: ENC_Greek              = 1253
Number: ENC_Turkish            = 1254
Number: ENC_Hebrew             = 1255
Number: ENC_Arabic             = 1256
Number: ENC_Baltic             = 1257
Number: ENC_Vietnamese         = 1258
Number: ENC_Thai               = 874
More info. on these constant values is available in the Wikipedia article on code pages.
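
To see why the encoding constant matters when converting Unicode <--> single byte, here is a small sketch in plain Python (not SAL). The functions to_multibyte and to_wide are hypothetical stand-ins, not the SalStrToMultiByte / SalStrToWideChar signatures; they rely on Python's built-in cpNNNN codecs, which use the same Windows code page numbers as the constants above.

```python
# Illustration only: plain Python, not SAL. Function names are hypothetical.
ENC_Latin_WestEuropean = 1252
ENC_Greek = 1253


def to_multibyte(text: str, code_page: int) -> bytes:
    """Unicode string -> single-byte string in the given Windows code page."""
    return text.encode(f"cp{code_page}")


def to_wide(data: bytes, code_page: int) -> str:
    """Single-byte string in the given Windows code page -> Unicode string."""
    return data.decode(f"cp{code_page}")


raw = to_multibyte("café", ENC_Latin_WestEuropean)
print(len(raw))                              # 4 (one byte per character)
print(to_wide(raw, ENC_Latin_WestEuropean))  # café

# The same byte maps to a different character under another code page.
print(to_wide(b"\xe9", ENC_Latin_WestEuropean) == to_wide(b"\xe9", ENC_Greek))  # False
```

Converting back with a different code page than the one used for the original conversion will silently produce different characters, so the same constant should be used in both directions.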
 

For Team Developer Unicode Samples on my TD home page, click here
 
