Selecting a font for server side images based on available characters

There are many scripts available for rendering image headers or buttons on the server so that the user sees the intended font. Typically these take at least two parameters: the text to be rendered and the font to render it in.

Problems can occur (especially on multilingual sites) when the text to be rendered contains characters that the first-choice font does not provide, as often happens with Japanese or Russian text. Here is one solution:

For each font we want to use, we need to extract the cmap (character map) table. This defines which character codes the font can render. The following Perl script (courtesy of David Chan) takes a TrueType file and prints every character the font provides to standard output as UTF-8, ready to be redirected into a text file:

#!/usr/bin/perl
 
use strict;
#use warnings; # Font::TTF::Font spews warnings, so we can't enable this
use Font::TTF::Font;
 
die "Usage: $0 file.ttf\n" unless 1 == @ARGV;
my $ttfFile = $ARGV[0];
my $f = Font::TTF::Font->open($ttfFile) or die "Cannot open $ttfFile: $!";
$f->tables_do(sub { $_[0]->read });
my @tables = @{$f->{cmap}{'Tables'}};
my %codepoints;
for my $table (@tables) {
    for my $codepoint (keys %{$table->{val}}) {
        my $glyphNo = $table->{val}{$codepoint};
        next if $glyphNo == 0;
        # 0 = unknown glyph. XXX should U+FFFD map to this? Don't care anyway
        $codepoints{$codepoint}++;
    }
}
 
binmode STDOUT, ":utf8";
printf "%s", chr($_) for sort {$a <=> $b} keys %codepoints;

This script is used like this:

 perl cmap.pl font.ttf > font.cmap.txt

Once you have generated cmap text files for all the fonts you want to use, you can modify the font selection part of your image generation script, for example:

<?php
/* the mb_* functions below must treat strings as UTF-8 */
mb_internal_encoding('UTF-8');
 
/* returns true if all the characters in $text are available in the $cmap */
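function in_cmap($cmap, $text) {
	$cmap = @file_get_contents($cmap);
	/* a missing or unreadable cmap file means the font cannot be used */
	if($cmap === false)
		return false;
	$length = mb_strlen($text);
	for($i = 0; $i < $length; $i++) {
		$char = mb_substr($text, $i, 1);
		if(mb_strpos($cmap, $char) === false)
			return false;
	}
	return true;
}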
 
/* returns the path of the first font in $font_list that can render all of $text, or arialuni.ttf if none can */
function get_compatible_font($font_list, $text, $path = 'fonts') {
	foreach($font_list as $font)
		if(in_cmap("{$path}/{$font}.cmap.txt", $text))
			return "{$path}/{$font}.ttf";
	return "{$path}/arialuni.ttf";
}

This script assumes there is a folder that contains ttf/cmap pairs named font.ttf and font.cmap.txt. It also uses arialuni (Arial Unicode) as the ultimate fall-back font. Other Unicode fonts could be used instead.
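
When a font is unexpectedly skipped, it can help to see which characters were missing from its cmap file. The following is only a sketch of one way to check this: missing_chars is a hypothetical helper that is not part of the scripts above, and it assumes that both the cmap contents and the text are valid UTF-8 strings. The sample character list in the usage lines is made up for illustration.

```php
<?php
mb_internal_encoding('UTF-8');

/* Hypothetical helper (an assumption, not part of the original scripts):
   returns the distinct characters of $text that do not occur in
   $cmap_chars, the contents of a font.cmap.txt file. */
function missing_chars($cmap_chars, $text) {
	/* split the cmap contents into UTF-8 characters and index them for fast lookup */
	$available = array_flip(preg_split('//u', $cmap_chars, -1, PREG_SPLIT_NO_EMPTY));
	$missing = array();
	foreach(preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char)
		if(!isset($available[$char]) && !in_array($char, $missing, true))
			$missing[] = $char;
	return $missing;
}

/* usage: a made-up Latin-only character list checked against mixed text */
$latin_only = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ';
echo implode(' ', missing_chars($latin_only, 'Hello мир')), "\n";
```

In a real script the first argument would come from file_get_contents on one of the font.cmap.txt files. An empty result means the font covers the whole text.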

All scripts are Public Domain.
