Approaches to accessible mathematical text

The following are notes on various approaches to the problem of producing mathematical text online, in a way which is accessible to folk with one or other visual disability. The motivating use-case is that of getting maths-heavy lecture notes online, and the discussion is addressed to people with at least some familiarity with LaTeX.

Unfortunately, I don't believe that there is a single best answer here. None of the three options below is unequivocally the best, and none is obviously bad.

The three approaches below are: LaTeX to EPUB; Markdown to HTML and EPUB; and LaTeX to accessible PDF.

LaTeX to EPUB

An EPUB file is a ‘website in a box’: it's a collection of HTML files, with some standard added metadata, bundled into a zip file, and named something.epub rather than something.zip. The key question is: how do you go from the LaTeX you (by hypothesis) already have, to the HTML?

The fact that EPUB is HTML is important, because HTML is easy to make accessible: with CSS styling it's possible to change the appearance of the document (font size, style, and colour, page layout, and so on); the document structure is manifest, so it gives assistive technologies something to work with; and it is friendly to screen readers, for users with significant visual problems.

Hasn't this been done before?

There are multiple tools for turning LaTeX into HTML. In the past, I've used at least LaTeX2HTML, tth, TeX4ht, and LaTeXML. All of these have their virtues, which I'm not going to discuss, but it's the last one that I've been most impressed by, in terms of its fundamental design, the comprehensiveness of its coverage, and the possibilities for extending it when it runs out of steam.

Plus, it can produce EPUB without effort.

LaTeXML

LaTeXML is a spin-off from the DLMF (i.e., Abramowitz and Stegun: TNG). It parses the TeX at a fairly low level, which means that it is capable of being extended by a range of LaTeX packages (just like ordinary LaTeX). For those cases where a package does something LaTeXML can't cope with, it can be extended relatively easily. That extension is done in Perl, and is not for the faint-hearted, but I mention it to note that the system has scope to deal with even rather challenging source documents.

Significantly, LaTeXML reads LaTeX maths and converts it to MathML, as opposed to converting it to images. MathML is now well-supported in browsers, and systems like MathJax can help render it in those remaining cases where it's not supported.

The end result doesn't look as good as the PDF version – typographically, it's respectable but not great. Also, it's possible you'll have to make some layout adjustments, since the text will potentially be displayed in an unpredictably narrower ‘page’ than in the paper version. But that's OK, since I for one expect to distribute the EPUB as an alternative to the PDF version of the notes, rather than as a replacement.

The current version, as of September 2021, is v0.8.5. It's under active development.

Using it

The LaTeXML web pages give some installation instructions. Depending on your preferred platform, that step will range from easy to a pain in the neck.

Using LaTeXML is potentially as simple as

% latexmlc --pmml --split --splitat chapter --destination notes.epub notes.tex

on an unedited version of your source .tex file. Although the tool is generally well documented, for this driver program you'll (currently) need to consult the output of latexmlc -h.

Environment variables such as TEXINPUTS and BIBINPUTS work as usual. You may need to include --includestyles if you have private packages that you use; or add your own CSS styling with --css; and with --svg you can even include {picture} environments.

You can see the result of this command on a thinned-down version of my A2 lecture notes, and in a screenshot below. Those parentheses aren't quite the right size and the linebreak after the section heading is poor, but it's readable, and all of the cross-references work (to me, it looks better in a web browser). The ‘See Exercises...’ text is formatted using the same LaTeX \newcommand as for the PDF version of this document.

EPUB screenshot

The actual version of that EPUB that I will be distributing this coming year is produced by a slightly different process (for reasons...). There's a fair amount of scope for tweaking the process but, when the tool works, it seems to me to work robustly.

If you run into difficulties, then it might be because you're using LaTeX packages that perplex the tool. If those packages are related to layout, then you might as well skip them. I find it convenient to have two main LaTeX files, one for ‘normal’ PDF output and one for LaTeXML, both of which \input the common body of the text, but the latter of which loads only a minimal set of packages.
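As a minimal sketch of that two-driver arrangement: the file names notes-pdf.tex, notes-epub.tex and notes-body.tex are hypothetical, and geometry and fancyhdr stand in for whichever layout-related packages your own PDF build uses. The block shows two separate files, marked by the comment banners.

```latex
% ---- notes-pdf.tex: driver for the 'normal' PDF build (hypothetical name) ----
\documentclass{book}
\usepackage{amsmath}    % needed by the content itself
\usepackage{geometry}   % layout-only: wanted for the PDF
\usepackage{fancyhdr}   % layout-only: wanted for the PDF
\begin{document}
\input{notes-body}      % the common body of the text, in notes-body.tex
\end{document}

% ---- notes-epub.tex: driver for the LaTeXML build (hypothetical name) ----
\documentclass{book}
\usepackage{amsmath}    % keep only what the content itself needs
\begin{document}
\input{notes-body}      % the same body, unchanged
\end{document}
```

With this layout, the PDF would be built from notes-pdf.tex as usual, and notes-epub.tex would be the file handed to latexmlc in place of notes.tex.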

Success and failure

I've experimented with a small range of documents, both mine and other people’s. Documents which are structurally similar to an article or a book seem to do well (and it copes with a surprisingly large range of extra packages); documents which are not, produce erratic results. Beamer presentations don't seem to be handled successfully, even in article mode, though I haven't dug into whether this is for superficial or deep reasons.

The EPUB that LaTeXML produces (as of v0.8.5, in September 2021) has a few technical glitches, but nothing that seems likely to disturb a real-world reader.

There is a range of EPUB readers. I'm familiar with the one bundled with macOS, called Books, and I've found Calibre to work OK on macOS, but I haven't been systematic in surveying alternatives; I don't have any recommendations for other platforms, but I'm aware there are multiple alternatives to choose from.

A few details, for completeness

I'll also mention that I'm fairly uninhibited about extending LaTeX with my own macros and packages. That meant I'd given myself a little work to do (though less than I expected) to adapt those packages to LaTeXML, along with some use of the --css and --includestyles options to latexmlc.
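One possible way of handling a macro that needs to differ between the two builds (not necessarily the route taken for these notes) is the \iflatexml conditional provided by the latexml.sty package that is distributed with LaTeXML. The sketch below assumes that latexml.sty is somewhere on TeX's search path, and the macro \seeexercises is hypothetical, invented for illustration.

```latex
\documentclass{article}
\usepackage{latexml}   % assumed to be findable by TeX; distributed with LaTeXML

% \seeexercises is a hypothetical macro, for illustration only
\iflatexml
  % LaTeXML build: plain inline text, leaving the styling to CSS
  \newcommand{\seeexercises}[1]{\emph{See Exercises #1}}
\else
  % PDF build: free to use a layout device such as a margin note
  \newcommand{\seeexercises}[1]{\marginpar{\footnotesize See Exercises #1}}
\fi

\begin{document}
Some text.\seeexercises{1.1--1.3}
\end{document}
```

The point of the design is that the body text calls \seeexercises in the same way in both builds, and only the definition changes.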

If you happen to be interested in exploring further (and you needn't), then you might want to play with the W3C's EPUBCheck validator, and, for extra credit, with the DAISY Consortium's accessibility checker. The latter tool is concerned with Web Content Accessibility Guidelines (see the WCAG group, and the associated standard). The gold standard here is to make a claim of WCAG2 AA conformance. That's what LaTeXML is aiming for, but I get the impression you can be quite a way off that and still be doing respectably well in accessibility terms.

Markdown to HTML and EPUB

Same destination, different start-point.

Markdown is a way of formatting relatively simple texts with unobtrusive markup. The idea of Markdown is that text is annotated with unobtrusive markup, of the sort you might see in a plain text email anyway.

The previous paragraph started off as:

[Markdown](https://en.wikipedia.org/wiki/Markdown) is a way of
formatting relatively simple texts with unobtrusive markup.  The idea
of Markdown is that text is annotated with _unobtrusive_ markup, of
the sort you might see in a plain `text` [email][rfc] **anyway**.

[rfc]: https://www.ietf.org/rfc/rfc822.txt

That's a large fraction of what you need to know about Markdown syntax. It's used all over the place, and is a good solution to an important problem.

But the fearless forces of progress will never be held back by mere success!

Although it's simple at its heart, and in its original conception, Markdown now comes in numerous variants, not all of them mutually compatible but including, crucially, some implementations which can process LaTeX maths. In particular, the pandoc converter can process a substantially extended Markdown syntax – specifically including $maths$ – and turn it into HTML or EPUB. If you put the above paragraph into a file try-pandoc.txt, and ran

% pandoc -o try-pandoc.html try-pandoc.txt

you'd get an HTML version of the text.

If you have a structurally simple document, this is great. My experience converting a more complicated document – specifically one with a fair density of internal cross references and some custom formatting – is that it rapidly became rather messy, and I lost sight of the simplicity of Markdown.

RMarkdown – not just the command line

Where this approach really scores is if you use RMarkdown as a front end. This lets you do two important things:

In Markdown terms, however, RMarkdown is ‘merely’ a front end to pandoc, so the pandoc manual is the authority on the precise Markdown variant that's supported.

Depending on your starting point, it might also be easier to install, and it doesn't require familiarity with the command line (I admit, though, that my experiments here have been with pandoc, and I haven't actually played with the RStudio editor).

LaTeX to accessible PDF

The third possibility is to generate PDFs from LaTeX, but make the PDFs ‘accessible’, in a way that gives respectable scores from the Blackboard Ally tool that my institution uses.

‘The problem with [Quantitative Easing] is it works in practice, but it doesn’t work in theory.’

[according to Ben Bernanke, US Federal Reserve Chairman during the 2008–09 financial crisis.]

In principle, this is doomed to failure (for the reasons why, see below). But in fact it works much better in practice than in principle: significantly better than one might expect, and indeed works better than it has any right to!

You might get away with including

\usepackage[tagpdf]{axessibility}

in your LaTeX preamble, and you might additionally have to include

\RequirePackage{pdfmanagement-testphase}
\DeclareDocumentMetadata{}

before the \documentclass line. If it compiles without error, then it has probably worked (you shouldn't see any visible difference in your document), and you should get a better score on Blackboard Ally or similar.

But:

Why the reservations?

Well, since you asked...

The problem with PDF, from this point of view, is that it is a page description language: its job is to put dots on paper or on the screen. It's fundamentally a visual format. Usually, it will do that job using pre-prepared libraries of dots (this is known as that exotic thing, a ‘font’), but there ends up being a great deal of freedom in what a PDF writer actually does. The pdftex engine doesn't, for example, include space characters in the output, simply positioning successive words with suitably-sized gaps between them. This is why sometimes, when you cut and paste some text from a PDF, it appears as a continuous block of unspaced text (this is what a screen-reader would see, too).

Similarly, a PDF writer doesn't necessarily include text within the PDF file in the same order it appears on the printed page; this would confuse any screen reader. I think this is unlikely to happen for most LaTeX text, but it illustrates the depth of the problem.

PDF writers usually do their work in such a way that a screen-reader can unscramble the egg. They do this part-heuristically and generally quite successfully, but this becomes challenging for maths, or images. What also complicates things is that all text is simply text, whether it appears in a heading, the body text, or the page number, and a screen-reader will read out all of those things as equivalent. The way that problem is addressed is by ‘tagging’.

A ‘tag’, in PDF terms, is a label attached to a block of content, which marks it as, for example, paragraph text, or a level-n heading, or a page footer. This is done in such a way that a screen reader (or similar) can provide a table of contents for the document, so that the human reader can skip through it, or at least distinguish headings from body. Tags also let a screen reader work through the content in reading order rather than file order (where they differ), and avoid reading out the page-number and footers at all.

The core of the work done by the axessibility package, and the tagpdf package it depends on, is to add basic tagging to at least the sectioning macros in your document – \section and so on – and copy the LaTeX code for equations into the document in a way a screen reader can see. This is a big enough deal that it appears, by itself, to be what significantly increases the PDF's score in Blackboard Ally-type tools.

But...

But this whole process is fragile, since there is a fundamental mismatch between the requirements of the PDF format and what LaTeX is currently able to do. The LaTeX kernel is being changed to support tags better: see the announcement and summary, and a detailed discussion of the problem (p.26) by Ulrike Fischer, the author of tagpdf. The results are due to appear in a couple of years from now, but the axessibility mechanism will remain on the edge of breakage at least until then.

A more fundamental problem is that I don't really know if what this package does is useful. It does get tags into PDFs, and that does increase Blackboard Ally scores, but I'm not sure how useful a block of read-aloud LaTeX code is to a student, and others have commented that the results of this process are not particularly useful in practice. I'm trying to find out more details about the extent to which this suspicion is true or false, and I'll aim to update this post with what I find out.

Discussion

My strongest suggestion remains: use LaTeXML. This is the solution I'm going to use this coming semester, distributing EPUB alongside my PDF notes (certainly not instead of the PDF). Despite its apparently clear success here, I'm puzzled that I don't see EPUBs mentioned more often in these contexts. It's true that LaTeXML requires a bit of effort to set up, but with that done EPUB seems robust, and the results look OK to me.

My big problem is that I don't really know just what counts as ‘OK’ for someone who actually depends on this technology. That's what I need to find out about.

Norman, 2021