FreeType & GSoC

The FreeType project wants to become part of Google Summer of Code. Here is our ideas list.

Improve fuzzing for FreeType

There are at least two fuzzers that constantly test FreeType. One of them is OSS-Fuzz, and the GSoC tasks presented here aim to increase the efficiency of this fuzzer bot.

Split the existing fuzz target into many

Right now we have src/tools/ftfuzzer/ftfuzzer.cc that contains

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t* data,
                           size_t size)
{
  ParseWhateverFontFileIGet(data, size);
  return 0;
}

Instead of this monolithic approach we should split fuzzing into separate files (fuzz targets) for every font format, for example

src/tools/ftfuzzer/cff_fuzz.cc
src/tools/ftfuzzer/cff2_fuzz.cc
src/tools/ftfuzzer/cid_fuzz.cc

and every such file will have

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t* data,
                           size_t size)
{
  ParseOnlyMyFormatAndRejectAnythingElseQuickly(data, size);
  return 0;
}

Ideally, the build rule for cff_fuzz will not link anything that CFF does not need.
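To illustrate the ‘reject anything else quickly’ step, here is a minimal sketch of what a format-specific target could look like. It only covers CFF fonts wrapped in an OpenType container, which start with the sfnt version tag ‘OTTO’; bare CFF data would need a different check. The names LooksLikeOpenTypeCFF and ParseCffFont are hypothetical and not part of FreeType; the latter is a stub standing in for the real CFF-only parsing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if the buffer starts with the sfnt version tag 'OTTO',
// which marks an OpenType font with CFF outlines.
static bool LooksLikeOpenTypeCFF(const uint8_t* data, size_t size)
{
  static const uint8_t kTag[4] = { 'O', 'T', 'T', 'O' };
  return size >= sizeof(kTag) && std::memcmp(data, kTag, sizeof(kTag)) == 0;
}

// Hypothetical stand-in for the real CFF-only parsing code.
// It merely counts how many inputs got past the format check.
static int g_parsed_inputs = 0;

static void ParseCffFont(const uint8_t* /* data */, size_t /* size */)
{
  ++g_parsed_inputs;
}

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t* data,
                           size_t size)
{
  if (!LooksLikeOpenTypeCFF(data, size))
    return 0;  // not our format: give up before doing any real work

  ParseCffFont(data, size);
  return 0;
}
```

The check costs a four-byte comparison, so inputs that mutate away from the target format are dropped almost for free, and the fuzzer spends its time on inputs that reach the CFF code.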

Such a split will make fuzzing more efficient for many reasons.

  • Genetic mutations will not spend time crossing over files of different formats (e.g., trying to add BDF genes to a Type 1 font).
  • Data-flow guided mutations will not try to transform, say, a CID font file into a PCF font file.
  • Some of the fuzzer's internal algorithms that are linear in the code size will run faster.
  • Slow inputs that currently make fuzzing inefficient will cause only some of the targets to suffer, not all of them.

The changes will need to be reflected in the OSS-Fuzz repository.

Prepare a public corpus of inputs

The quality of the ‘seed corpus’ is the key to fuzzing efficiency. We should set up a repository (e.g., on GitHub) that would hold

  • small but representative sample font files for every relevant font format (only with permissive licenses!), and
  • fuzzed mutations of the above (this part will need to be periodically updated as fuzzing finds more inputs).

This corpus will be used in two ways, namely

  • to seed the fuzzing process, and
  • as a regression suite (see below).

Extend the FreeType testing process to use the above corpus

The public corpus will allow us to use the fuzz targets as a regression test suite. We'll need to set up continuous integration testing (not fuzzing) that runs the fuzz targets on the corpus. One way to achieve this is to have a GitHub mirror of FreeType and set up Travis (or any other CI service integrated with GitHub).
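Running a fuzz target on a fixed corpus needs no fuzzing engine at all: a small driver can read each corpus file and pass its bytes to the target once. The following is a rough sketch of such a driver under that assumption. RunCorpus and FuzzTarget are hypothetical names, and a real setup would also want to build the target with sanitizers so that memory errors abort the CI run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Signature shared by all libFuzzer-style fuzz targets.
typedef int (*FuzzTarget)(const uint8_t* data, size_t size);

// Feeds every listed corpus file to the target exactly once.
// Returns the number of files that could not be opened.
static size_t RunCorpus(FuzzTarget target,
                        const std::vector<std::string>& paths)
{
  assert(target != nullptr);

  size_t unreadable = 0;
  for (const std::string& path : paths)
  {
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
      ++unreadable;
      continue;
    }

    // Read the whole file into memory, as the fuzzer would present it.
    std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
    target(bytes.data(), bytes.size());
  }
  return unreadable;
}
```

A CI job could call RunCorpus with LLVMFuzzerTestOneInput and the list of files in the corpus repository, failing the build if the process crashes or if any file is unreadable.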

Analyze code coverage

Once the fuzz targets are split, the public corpus is prepared, and the OSS-Fuzz integration is updated, we'll need to analyze the code coverage provided by OSS-Fuzz to see what code is left untested.

Then either the fuzz targets or the corpus (or both) will need to be extended to cover that code. The ideal end state is to have 100% line coverage (currently, we have ~67% for the existing fuzz target).

Prepare fuzzing dictionaries for the font formats where relevant

In some cases, a simple dictionary (a list of tokens used by the file format) may have a dramatic effect on fuzzing.
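libFuzzer reads dictionaries as text files with one quoted token per line, where backslashes and double quotes are escaped and arbitrary bytes can be written as \xNN. As a small sketch of how such a file could be generated from a token list, the hypothetical helper DictEntry below formats a single token; it is an illustration, not an existing FreeType or libFuzzer function.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats one token as a dictionary line: a double-quoted string in which
// backslash and double quote are escaped and non-printable bytes are
// written as \xNN.
static std::string DictEntry(const std::string& token)
{
  assert(!token.empty());  // an empty token is of no use to the fuzzer

  std::string line = "\"";
  for (unsigned char c : token)
  {
    if (c == '\\' || c == '"')
    {
      line += '\\';
      line += static_cast<char>(c);
    }
    else if (c >= 0x20 && c < 0x7F)
    {
      line += static_cast<char>(c);
    }
    else
    {
      char buf[5];  // backslash, 'x', two hex digits, terminating NUL
      std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
      line += buf;
    }
  }
  line += '"';
  return line;
}
```

For BDF, for instance, the token list could contain keywords such as STARTFONT, STARTCHAR, ENCODING, BITMAP, and ENDCHAR. The resulting file is passed to libFuzzer with the -dict= option.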

Difficulty: medium. Requirements: C, C++, Unix build tools, experience with scripting. Potential mentors: Kostya Serebryany (Google), Werner Lemberg (FreeType).

Develop a test framework for checking FreeType's rendering output

Right now, the rendering results of FreeType's current development version are not systematically compared to a baseline version. This is problematic, since rendering regressions that show up only as subtle differences are very easily missed.

The idea is to select a representative set of reference fonts from font corpora (which already exist mainly for fuzzing). The fonts are used to produce glyph images for various sizes and rendering modes (anti-aliased, B/W, native hinting, auto-hinting, etc.). FreeType can already produce MD5 checksums of glyph images as part of its debugging output; these values should be compared against a baseline version of rendering results. If there are differences, HTML pages should be generated that contain comparison images of the baseline's and the current development version's rendering result, ideally indicating how large the differences between the images are by using some yet to be defined measure.
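The comparison step could be sketched roughly as follows. The code assumes that the checksums of both runs have already been collected into tables keyed by some glyph identifier (the key format is a made-up example). Since the difference measure is not yet defined, MeanAbsDiff is just one possible candidate, namely the mean absolute difference of 8-bit gray values. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Maps a glyph key (e.g. "font.ttf/12px/mono/gid42", a made-up format)
// to the checksum of its rendered image.
typedef std::map<std::string, std::string> ChecksumTable;

// Returns the keys whose checksum differs from the baseline or that are
// missing from the current run, in sorted key order.
static std::vector<std::string> ChangedGlyphs(const ChecksumTable& baseline,
                                              const ChecksumTable& current)
{
  std::vector<std::string> changed;
  for (const auto& entry : baseline)
  {
    auto it = current.find(entry.first);
    if (it == current.end() || it->second != entry.second)
      changed.push_back(entry.first);
  }
  return changed;
}

// One possible difference measure (an assumption, not a project decision):
// the mean absolute difference of two equally sized 8-bit gray images.
// The result lies between 0 (identical) and 255.
static double MeanAbsDiff(const std::vector<uint8_t>& a,
                          const std::vector<uint8_t>& b)
{
  assert(a.size() == b.size());
  if (a.empty())
    return 0.0;

  unsigned long sum = 0;
  for (size_t i = 0; i < a.size(); ++i)
    sum += static_cast<unsigned long>(std::abs(int(a[i]) - int(b[i])));
  return double(sum) / double(a.size());
}
```

Only the glyphs returned by ChangedGlyphs would need to be re-rendered as images for the HTML report, which keeps the expensive pixel comparison limited to actual differences.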

Difficulty: medium. Requirements: C, Unix build tools. Potential mentors: Werner Lemberg, Alexei Podtelezhnikov, Toshiya Suzuki (FreeType).

Improve the ‘ftinspect’ demo program

Right now, FreeType comes with a suite of small graphic tools to test the library, most notably ‘ftview’ and ‘ftgrid’. The graphics library they use, while more or less working, is very archaic and lacks the comforts that modern GUIs provide.

To improve this, a new demo program called ‘ftinspect’ was started, based on the Qt GUI toolkit. However, the development is currently stalled, mainly for lack of time.

The idea is to finish ftinspect, handling all aspects of the other demo programs. Currently, it only provides the functionality of ‘ftgrid’.

If the student prefers, the Qt toolkit could be replaced with GTK.

Difficulty: medium. Requirements: C, C++, Qt, Unix build tools. Potential mentor: Werner Lemberg (FreeType).

Do you have more ideas? Please write to our mailing list so that we can discuss your suggestions and possibly add them to the list!

Last update: 6-Feb-2017