
Character Mapping

Synopsis

This section holds functions and structures that are related to mapping character input codes to glyph indices.

Note that for many scripts the simplistic approach used by FreeType of mapping a single character to a single glyph is not valid or possible! In general, a higher-level library like HarfBuzz or ICU should be used for handling text strings.

FT_CharMap

Defined in FT_FREETYPE_H (freetype/freetype.h).

  typedef struct FT_CharMapRec_*  FT_CharMap;

A handle to a character map (usually abbreviated to ‘charmap’). A charmap is used to translate character codes in a given encoding into glyph indexes for its parent's face. Some font formats may provide several charmaps per font.

Each face object owns zero or more charmaps, but only one of them can be ‘active’, providing the data used by FT_Get_Char_Index or FT_Load_Char.

The charmaps of a face are listed in the face->charmaps array, which holds face->num_charmaps entries; both are fields of FT_FaceRec.

The currently active charmap is available as face->charmap. You should call FT_Set_Charmap to change it.

note

When a new face is created (either through FT_New_Face or FT_Open_Face), the library looks for a Unicode charmap within the list and automatically activates it. If there is no Unicode charmap, FreeType doesn't set an ‘active’ charmap.

also

See FT_CharMapRec for the publicly accessible fields of a given character map.


FT_CharMapRec

Defined in FT_FREETYPE_H (freetype/freetype.h).

  typedef struct  FT_CharMapRec_
  {
    FT_Face      face;
    FT_Encoding  encoding;
    FT_UShort    platform_id;
    FT_UShort    encoding_id;

  } FT_CharMapRec;

The base charmap structure.

fields

face

A handle to the parent face object.

encoding

An FT_Encoding tag identifying the charmap. Use this with FT_Select_Charmap.

platform_id

An ID number describing the platform for the following encoding ID. This comes directly from the TrueType specification and gets emulated for other formats.

encoding_id

A platform-specific encoding number. This also comes from the TrueType specification and gets emulated similarly.


FT_Encoding

Defined in FT_FREETYPE_H (freetype/freetype.h).

  typedef enum  FT_Encoding_
  {
    FT_ENC_TAG( FT_ENCODING_NONE, 0, 0, 0, 0 ),

    FT_ENC_TAG( FT_ENCODING_MS_SYMBOL, 's', 'y', 'm', 'b' ),
    FT_ENC_TAG( FT_ENCODING_UNICODE,   'u', 'n', 'i', 'c' ),

    FT_ENC_TAG( FT_ENCODING_SJIS,    's', 'j', 'i', 's' ),
    FT_ENC_TAG( FT_ENCODING_PRC,     'g', 'b', ' ', ' ' ),
    FT_ENC_TAG( FT_ENCODING_BIG5,    'b', 'i', 'g', '5' ),
    FT_ENC_TAG( FT_ENCODING_WANSUNG, 'w', 'a', 'n', 's' ),
    FT_ENC_TAG( FT_ENCODING_JOHAB,   'j', 'o', 'h', 'a' ),

    /* for backward compatibility */
    FT_ENCODING_GB2312     = FT_ENCODING_PRC,
    FT_ENCODING_MS_SJIS    = FT_ENCODING_SJIS,
    FT_ENCODING_MS_GB2312  = FT_ENCODING_PRC,
    FT_ENCODING_MS_BIG5    = FT_ENCODING_BIG5,
    FT_ENCODING_MS_WANSUNG = FT_ENCODING_WANSUNG,
    FT_ENCODING_MS_JOHAB   = FT_ENCODING_JOHAB,

    FT_ENC_TAG( FT_ENCODING_ADOBE_STANDARD, 'A', 'D', 'O', 'B' ),
    FT_ENC_TAG( FT_ENCODING_ADOBE_EXPERT,   'A', 'D', 'B', 'E' ),
    FT_ENC_TAG( FT_ENCODING_ADOBE_CUSTOM,   'A', 'D', 'B', 'C' ),
    FT_ENC_TAG( FT_ENCODING_ADOBE_LATIN_1,  'l', 'a', 't', '1' ),

    FT_ENC_TAG( FT_ENCODING_OLD_LATIN_2, 'l', 'a', 't', '2' ),

    FT_ENC_TAG( FT_ENCODING_APPLE_ROMAN, 'a', 'r', 'm', 'n' )

  } FT_Encoding;


  /* these constants are deprecated; use the corresponding `FT_Encoding` */
  /* values instead                                                      */
#define ft_encoding_none            FT_ENCODING_NONE
#define ft_encoding_unicode         FT_ENCODING_UNICODE
#define ft_encoding_symbol          FT_ENCODING_MS_SYMBOL
#define ft_encoding_latin_1         FT_ENCODING_ADOBE_LATIN_1
#define ft_encoding_latin_2         FT_ENCODING_OLD_LATIN_2
#define ft_encoding_sjis            FT_ENCODING_SJIS
#define ft_encoding_gb2312          FT_ENCODING_PRC
#define ft_encoding_big5            FT_ENCODING_BIG5
#define ft_encoding_wansung         FT_ENCODING_WANSUNG
#define ft_encoding_johab           FT_ENCODING_JOHAB

#define ft_encoding_adobe_standard  FT_ENCODING_ADOBE_STANDARD
#define ft_encoding_adobe_expert    FT_ENCODING_ADOBE_EXPERT
#define ft_encoding_adobe_custom    FT_ENCODING_ADOBE_CUSTOM
#define ft_encoding_apple_roman     FT_ENCODING_APPLE_ROMAN

An enumeration to specify character sets supported by charmaps. Used in the FT_Select_Charmap API function.

note

Despite the name, this enumeration lists specific character repertoires (i.e., charsets), and not text encoding methods (e.g., UTF-8, UTF-16, etc.).

Other encodings might be defined in the future.

values

FT_ENCODING_NONE

The encoding value 0 is reserved for all formats except BDF, PCF, and Windows FNT; see below for more information.

FT_ENCODING_UNICODE

The Unicode character set. This value covers all versions of the Unicode repertoire, including ASCII and Latin-1. Most fonts include a Unicode charmap, but not all of them.

For example, if you want to access Unicode value U+1F028 (and the font contains it), use value 0x1F028 as the input value for FT_Get_Char_Index.

FT_ENCODING_MS_SYMBOL

Microsoft Symbol encoding, used to encode mathematical symbols and wingdings. For more information, see ‘https://www.microsoft.com/typography/otspec/recom.htm#non-standard-symbol-fonts’, ‘http://www.kostis.net/charsets/symbol.htm’, and ‘http://www.kostis.net/charsets/wingding.htm’.

This encoding uses character codes from the PUA (Private Use Area) in the range U+F020-U+F0FF.
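
A minimal sketch of how an application might form such character codes, assuming it starts from an 8-bit symbol code and that adding 0xF000 is the intended mapping into the range given above. The helper name symbol_code_to_pua and the use of 0 as a ‘no mapping’ result are choices made for this example, not part of FreeType.

```c
/* Hypothetical helper: map an 8-bit symbol-font code (0x20..0xFF) */
/* into the PUA range U+F020..U+F0FF.  Codes below 0x20 have no    */
/* counterpart in that range; 0 is returned for them.              */
unsigned long
symbol_code_to_pua( unsigned char  code )
{
  if ( code < 0x20 )
    return 0;

  return 0xF000UL + code;
}
```

The result can then be passed as the character code to FT_Get_Char_Index once a charmap tagged FT_ENCODING_MS_SYMBOL is active.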

FT_ENCODING_SJIS

Shift JIS encoding for Japanese. More info at ‘https://en.wikipedia.org/wiki/Shift_JIS’. See note on multi-byte encodings below.

FT_ENCODING_PRC

Corresponds to encoding systems mainly for Simplified Chinese as used in People's Republic of China (PRC). The encoding layout is based on GB 2312 and its supersets GBK and GB 18030.

FT_ENCODING_BIG5

Corresponds to an encoding system for Traditional Chinese as used in Taiwan and Hong Kong.

FT_ENCODING_WANSUNG

Corresponds to the Korean encoding system known as Extended Wansung (MS Windows code page 949). For more information see ‘https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit949.txt’.

FT_ENCODING_JOHAB

The Korean standard character set (KS C 5601-1992), which corresponds to MS Windows code page 1361. This character set includes all possible Hangul character combinations.

FT_ENCODING_ADOBE_LATIN_1

Corresponds to a Latin-1 encoding as defined in a Type 1 PostScript font. It is limited to 256 character codes.

FT_ENCODING_ADOBE_STANDARD

Adobe Standard encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes.

FT_ENCODING_ADOBE_EXPERT

Adobe Expert encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes.

FT_ENCODING_ADOBE_CUSTOM

Corresponds to a custom encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes.

FT_ENCODING_APPLE_ROMAN

Apple Roman encoding. Many TrueType and OpenType fonts contain a charmap for this 8-bit encoding, since older versions of Mac OS are able to use it.

FT_ENCODING_OLD_LATIN_2

This value is deprecated and was neither used nor reported by FreeType. Don't use or test for it.

FT_ENCODING_MS_SJIS

Same as FT_ENCODING_SJIS. Deprecated.

FT_ENCODING_MS_GB2312

Same as FT_ENCODING_PRC. Deprecated.

FT_ENCODING_MS_BIG5

Same as FT_ENCODING_BIG5. Deprecated.

FT_ENCODING_MS_WANSUNG

Same as FT_ENCODING_WANSUNG. Deprecated.

FT_ENCODING_MS_JOHAB

Same as FT_ENCODING_JOHAB. Deprecated.

note

When loading a font, FreeType makes a Unicode charmap active if possible (either if the font provides such a charmap, or if FreeType can synthesize one from PostScript glyph name dictionaries; in either case, the charmap is tagged with FT_ENCODING_UNICODE). If such a charmap is synthesized, it is placed at the first position of the charmap array.

All other encodings are considered legacy and tagged only if explicitly defined in the font file. Otherwise, FT_ENCODING_NONE is used.

FT_ENCODING_NONE is set by the BDF and PCF drivers if the charmap is neither Unicode nor ISO-8859-1 (otherwise it is set to FT_ENCODING_UNICODE). Use FT_Get_BDF_Charset_ID to find out which encoding is really present. If, for example, the cs_registry field is ‘KOI8’ and the cs_encoding field is ‘R’, the font is encoded in KOI8-R.

FT_ENCODING_NONE is always set (with a single exception) by the winfonts driver. Use FT_Get_WinFNT_Header and examine the charset field of the FT_WinFNT_HeaderRec structure to find out which encoding is really present. For example, FT_WinFNT_ID_CP1251 (204) means Windows code page 1251 (for Russian).

FT_ENCODING_NONE is set if platform_id is TT_PLATFORM_MACINTOSH and encoding_id is not TT_MAC_ID_ROMAN (otherwise it is set to FT_ENCODING_APPLE_ROMAN).

If platform_id is TT_PLATFORM_MACINTOSH, use the function FT_Get_CMap_Language_ID to query the Mac language ID that may be needed to be able to distinguish Apple encoding variants. See

https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/Readme.txt

to get an idea how to do that. Basically, if the language ID is 0, don't use it, otherwise subtract 1 from the language ID. Then examine encoding_id. If, for example, encoding_id is TT_MAC_ID_ROMAN and the language ID (minus 1) is TT_MAC_LANGID_GREEK, it is the Greek encoding, not Roman. TT_MAC_ID_ARABIC with TT_MAC_LANGID_FARSI means the Farsi variant of the Arabic encoding.
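
The ‘subtract 1 unless it is 0’ rule can be written as a small helper. This is only a sketch: the name mac_cmap_language and the use of -1 as the ‘no language’ result are assumptions of this example. The input is meant to be the value returned by FT_Get_CMap_Language_ID.

```c
/* Hypothetical helper: convert the value returned by            */
/* FT_Get_CMap_Language_ID into a Mac language ID that can be    */
/* compared against the TT_MAC_LANGID_XXX constants.  Returns -1 */
/* if the value is 0, meaning that the language ID should not be */
/* used.                                                         */
long
mac_cmap_language( unsigned long  cmap_language_id )
{
  if ( cmap_language_id == 0 )
    return -1;

  return (long)cmap_language_id - 1;
}
```

A caller would then examine encoding_id together with the returned value, for example comparing the result against TT_MAC_LANGID_GREEK when encoding_id is TT_MAC_ID_ROMAN.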


FT_ENC_TAG

Defined in FT_FREETYPE_H (freetype/freetype.h).

#ifndef FT_ENC_TAG

#define FT_ENC_TAG( value, a, b, c, d )                             \
          value = ( ( FT_STATIC_BYTE_CAST( FT_UInt32, a ) << 24 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, b ) << 16 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, c ) <<  8 ) | \
                      FT_STATIC_BYTE_CAST( FT_UInt32, d )         )

#endif /* FT_ENC_TAG */

This macro converts a four-letter tag into a 32-bit unsigned value and assigns it to the given enumeration constant. It is used to define ‘encoding’ identifiers (see FT_Encoding).

note

Since many 16-bit compilers don't like 32-bit enumerations, you should redefine this macro in case of problems to something like this:

  #define FT_ENC_TAG( value, a, b, c, d )  value

to get a simple enumeration without assigning special numbers.
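
To make the default packing concrete, here is a standalone sketch of the same arithmetic that does not depend on the FreeType headers. The helper names make_tag and tag_to_string are hypothetical; they only mirror the shifts shown in the macro definition.

```c
#include <stdint.h>


/* Pack four bytes into a 32-bit tag, first byte in the most */
/* significant position, as the default FT_ENC_TAG does.     */
uint32_t
make_tag( char  a,
          char  b,
          char  c,
          char  d )
{
  return ( (uint32_t)(unsigned char)a << 24 ) |
         ( (uint32_t)(unsigned char)b << 16 ) |
         ( (uint32_t)(unsigned char)c <<  8 ) |
           (uint32_t)(unsigned char)d;
}


/* Unpack a tag into a NUL-terminated four-character string. */
/* `out' must provide room for 5 bytes; it is also returned. */
char*
tag_to_string( uint32_t  tag,
               char      out[5] )
{
  out[0] = (char)( ( tag >> 24 ) & 0xFF );
  out[1] = (char)( ( tag >> 16 ) & 0xFF );
  out[2] = (char)( ( tag >>  8 ) & 0xFF );
  out[3] = (char)(   tag         & 0xFF );
  out[4] = '\0';

  return out;
}
```

With this packing, the letters ‘u’, ‘n’, ‘i’, ‘c’ yield 0x756E6963. Unpacking the tag this way is convenient when printing the encoding field of a charmap for debugging.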


FT_Select_Charmap

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_Error )
  FT_Select_Charmap( FT_Face      face,
                     FT_Encoding  encoding );

Select a given charmap by its encoding tag (as listed in freetype.h).

inout

face

A handle to the source face object.

input

encoding

The FT_Encoding tag of the charmap to select.

return

FreeType error code. 0 means success.

note

This function returns an error if no charmap in the face corresponds to the encoding queried here.

Because many fonts contain more than a single cmap for Unicode encoding, this function has some special code to select the one that covers Unicode best (‘best’ in the sense that a UCS-4 cmap is preferred to a UCS-2 cmap). It is thus preferable to FT_Set_Charmap in this case.


FT_Set_Charmap

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_Error )
  FT_Set_Charmap( FT_Face     face,
                  FT_CharMap  charmap );

Select a given charmap for character code to glyph index mapping.

inout

face

A handle to the source face object.

input

charmap

A handle to the selected charmap.

return

FreeType error code. 0 means success.

note

This function returns an error if the charmap is not part of the face (i.e., if it is not listed in the face->charmaps table).

It also fails if an OpenType type 14 charmap is selected (which doesn't map character codes to glyph indices at all).


FT_Get_Charmap_Index

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_Int )
  FT_Get_Charmap_Index( FT_CharMap  charmap );

Retrieve the index of a given charmap.

input

charmap

A handle to a charmap.

return

The index into the array of character maps within the face to which charmap belongs. If an error occurs, -1 is returned.


FT_Get_Char_Index

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_UInt )
  FT_Get_Char_Index( FT_Face   face,
                     FT_ULong  charcode );

Return the glyph index of a given character code. This function uses the currently selected charmap to do the mapping.

input

face

A handle to the source face object.

charcode

The character code.

return

The glyph index. 0 means ‘undefined character code’.

note

If you use FreeType to manipulate the contents of font files directly, be aware that the glyph index returned by this function doesn't always correspond to the internal indices used within the file. This is done to ensure that value 0 always corresponds to the ‘missing glyph’. If the first glyph is not named ‘.notdef’, then for Type 1 and Type 42 fonts, ‘.notdef’ will be moved into the glyph ID 0 position, and whatever was there will be moved to the position ‘.notdef’ had. For Type 1 fonts, if there is no ‘.notdef’ glyph at all, then one will be created at index 0 and whatever was there will be moved to the last index – Type 42 fonts are considered invalid under this condition.


FT_Get_First_Char

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_ULong )
  FT_Get_First_Char( FT_Face   face,
                     FT_UInt  *agindex );

Return the first character code in the current charmap of a given face, together with its corresponding glyph index.

input

face

A handle to the source face object.

output

agindex

Glyph index of first character code. 0 if charmap is empty.

return

The charmap's first character code.

note

You should use this function together with FT_Get_Next_Char to parse all character codes available in a given charmap. The code should look like this:

  FT_ULong  charcode;
  FT_UInt   gindex;


  charcode = FT_Get_First_Char( face, &gindex );
  while ( gindex != 0 )
  {
    /* ... do something with the (charcode, gindex) pair ... */

    charcode = FT_Get_Next_Char( face, charcode, &gindex );
  }

Be aware that character codes can have values up to 0xFFFFFFFF; this might happen for non-Unicode or malformed cmaps. However, even with regular Unicode encoding, so-called ‘last resort fonts’ (using SFNT cmap format 13, see function FT_Get_CMap_Format) normally have entries for all Unicode characters up to 0x1FFFFF, which can cause a lot of iterations.

Note that *agindex is set to 0 if the charmap is empty. The result itself can be 0 in two cases: if the charmap is empty or if the value 0 is the first valid character code.


FT_Get_Next_Char

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_ULong )
  FT_Get_Next_Char( FT_Face    face,
                    FT_ULong   char_code,
                    FT_UInt   *agindex );

Return the next character code in the current charmap of a given face following the value char_code, as well as the corresponding glyph index.

input

face

A handle to the source face object.

char_code

The starting character code.

output

agindex

Glyph index of next character code. 0 if there are no more codes in the charmap.

return

The charmap's next character code.

note

You should use this function with FT_Get_First_Char to walk over all character codes available in a given charmap. See the note for that function for a simple code example.

Note that *agindex is set to 0 when there are no more codes in the charmap.


FT_Load_Char

Defined in FT_FREETYPE_H (freetype/freetype.h).

  FT_EXPORT( FT_Error )
  FT_Load_Char( FT_Face   face,
                FT_ULong  char_code,
                FT_Int32  load_flags );

Load a glyph into the glyph slot of a face object, accessed by its character code.

inout

face

A handle to a target face object where the glyph is loaded.

input

char_code

The glyph's character code, according to the current charmap used in the face.

load_flags

A flag indicating what to load for this glyph. The FT_LOAD_XXX constants can be used to control the glyph loading process (e.g., whether the outline should be scaled, whether to load bitmaps or not, whether to hint the outline, etc.).

return

FreeType error code. 0 means success.

note

This function simply calls FT_Get_Char_Index and FT_Load_Glyph.

Many fonts contain glyphs that can't be loaded by this function since their glyph indices are not listed in any of the font's charmaps.

If no active cmap is set up (i.e., face->charmap is zero), the call to FT_Get_Char_Index is omitted, and the function behaves identically to FT_Load_Glyph.