Producing screen-friendly output from LaTeX

I want to produce screen-friendly output from LaTeX, with an auxiliary but important goal: that the screen-friendly output be ‘accessible’. This turns out not to be trivial. But I've managed to get a solution that I, at least, am happy with.

I asked a question on StackExchange, which includes some links to various alternative starting points. None of these worked quite as smoothly as I'd hoped.

The following notes – something of a brain-dump, but I hope useful to someone – are correct as of early 2021. They're broadly positive. There are a few things that don't work as well as one might hope, but things should improve, on that front, in time.

In the end, I did three things here.

• I adjusted the class file I developed for the relevant documents, to create a B6-sized document, which works when viewed on-screen as PDF. This worked pretty completely, and produced something I'm content with, at the cost of less work than I anticipated. However, it is generally not ‘accessible’, in the sense of being friendly to, eg, text-to-speech readers, and other tools to support folk with print disabilities (supporting whom was a part of my motivation here).
• I've used LaTeXML to produce EPUB documents. Amongst other virtues, these tick a number of ‘accessibility’ boxes. This mostly works, and the problems are relatively minor bugs, and aesthetic points which one could busy oneself tweaking more or less indefinitely.
• I've learned a little more about the general accessibility of LaTeX documents (short version: much harder than you'd expect at present, but potentially due to improve).

Screen PDF

I adjusted the class file I'm using for these documents to produce an ‘ebook’ variant of the output. The key changes were:

• Single-sided B6 paper looks like the right pre-set size for this (I leant heavily on the {memoir} class, and used its option b6paper, but the {geometry} package would doubtless be able to handle this, too).
• Font: I used 10/12pt STIX2.
• Lots of extra white space where it was at all reasonable. Specifically, I redefined \section and \subsection (using \@startsection) to add a \newpage before each. I added extra vertical stretchability and negative vertical penalties everywhere at all reasonable.
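As a rough sketch of the changes listed above (not the actual class file), a preamble along the following lines would give a similar starting point. The skip values are copied from the standard article class, and loading STIX2 through the stix2 package under pdfLaTeX is an assumption; the real class may well do both differently.

```latex
% Sketch only: a preamble approximating the 'ebook' variant.
\documentclass[b6paper,10pt,oneside]{memoir}
\usepackage[T1]{fontenc}
\usepackage{stix2}  % assumption: STIX2 selected via the stix2 package

\makeatletter
% Start every section on a fresh page, then defer to \@startsection.
% The skip values are those of the standard article class (an assumption).
\renewcommand{\section}{%
  \newpage
  \@startsection{section}{1}{\z@}%
    {-3.5ex \@plus -1ex \@minus -.2ex}%
    {2.3ex \@plus .2ex}%
    {\normalfont\Large\bfseries}}
\makeatother
```

The same pattern would apply to \subsection, with level 2 and its own skips and font.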

In {memoir} terms, the layout I used was:

% the numbers here seem to work OK on an iPad,
% but there's no deep principle behind them.
\settypeblocksize{34\baselineskip}{25pc}{*}
\setlrmargins{*}{*}{1}
\setulmargins{*}{*}{1.5}
\checkandfixthelayout

and I made sure the ToC was useful with \setcounter{tocdepth}{4}.

I made the page footer helpful for navigation: it shows the current section number, with a link to the ToC, and feedback on how far into the document this page is. Expressed again in {memoir} terms, this is:

\makeoddfoot{<mypagestyle>}
{\hyperref[foo@toc]{\rightmark}}
{}
{\thepage/\pageref{foo@lastpage}}

(that obviously depends on a \label{foo@toc} on the ToC page, and a \label{foo@lastpage} on the last page of the document).
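A fragment showing one way the two labels might be placed (a sketch, assuming hyperref is loaded so that \phantomsection provides a link target; the label names are the placeholder ones used above):

```latex
% Sketch only: where the two labels used by the footer might go.
\begin{document}
\phantomsection\label{foo@toc}  % link target for the ToC
\tableofcontents*

% ... document body ...

\label{foo@lastpage}  % may point at an earlier page if floats are still pending
\end{document}
```

As with any \pageref, the page total in the footer only settles after a second LaTeX run.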

I expected I'd have to work quite hard to get something decent here. But along with a small number of other class-specific layout tweaks, that was all I felt I needed to do to produce an adequately respectable screen-readable PDF (though as noted above not particularly accessible).

Via LaTeXML to EPUB

LaTeXML is the most successful of the LaTeX-to-HTML converters I've used, and seemed admirably robust, when used with the structurally simple but maths-heavy document I was working with. I managed to convert my LaTeX sources to EPUB in two related ways.

LaTeXML's own EPUB generation pipeline is 90%+ successful. It still has a few bugs, and as of right now it produces EPUB that works but doesn't pass the conformance checker I used (W3C, see below). I was able to fix up the results without major difficulty, though, and those bugs are currently reported in the LaTeXML developers’ queue, so with time, and possibly with help from this community, they should disappear.

What I in fact ran with was using LaTeXML's intermediate XML format (from which it generates HTML and related outputs), and developing my own XSLT stylesheets to convert this to XHTML and to generate the associated EPUB metadata. This is a flexible technique, but it's obviously more work than something canned (I happen to have experience of this general route, so this was both relatively easy and quite enjoyable to do). I mention this, not because I'm necessarily recommending it, but in order to draw attention to that well-designed intermediate format, awareness of which might be similarly useful to others with special end-result needs.

Relevant resources

Spec:

Validators and good practice:

LaTeX and accessibility

PDF output from LaTeX scores very poorly on the ‘accessibility’ checkers I used; I got scores below 10/100 with some documents. It's not completely clear how that checker was scoring things, but a large part of the poor score appears to be attributable to the outputs not being ‘tagged PDF’. There's a 2017 discussion about this on stackoverflow (A guide on how to produce accessible PDF files?).

There does not appear to be a simple answer to this question.

• One answer to the stackoverflow question points to a Google Code project, which of course has disappeared, but sounds like it might be related to a set of templates for ‘SIGCHI’ (which starts off saying ‘This repository was...’), which points in turn to ACM article templates which may or may not be useful. Unfortunately, the only mention of accessibility on the templates page is an exhortation to produce figure descriptions, which doesn't appear to be the principal problem.

• Another answer in the stackoverflow discussion points to the accsupp package, which adds alt-texts, and the pdfcomment package, which does... something else (along with bringing in a fearsome set of package dependencies). One of the comments points to the axessibility package, which is concerned with formulae. Finally, there's a page at CTAN covering access to various PDF features, none of which is obviously structural tags.

• Although the accessibility package exists, its github page has a wide variety of disclaimers on it, going as far as saying ‘I’d like to discourage people from using the package any more’. I wonder, however, if there's an 80-20 solution here, in the sense that if only the structural tags are available, then do we get a much higher score? The issues list for that package suggests that its author is interested in finding development money to get someone to work on this (issues last updated mid-2020).

• There is a thing called PDF/UA, and a TUG talk about it.

• Most significantly, there is a package tagpdf by Ulrike Fischer (CTAN and github), which is billed as being ‘to experiment with tagging with pdflatex and lualatex’ (last contributions appear to be late 2019). It starts off by saying: ‘This package is not meant for normal document production.’ and notes that it requires a current expl3 version of LaTeX3. It suggests that the various accessibility packages in LaTeX are fundamentally flawed, to the extent that they rely on monkeypatching LaTeX rather than using a PDF API provided by the LaTeX3 kernel.
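For a sense of what the alt-text style of support looks like, here is a minimal sketch using the accsupp package mentioned in the list above. The sentence and the replacement text are invented for illustration; note that this attaches text to a single piece of content and adds no structural tags.

```latex
% Sketch only: attaching replacement text to a symbol with accsupp.
\documentclass{article}
\usepackage{accsupp}
\begin{document}
The ratio is
\BeginAccSupp{ActualText={approximately equal to}}%
$\approx$%
\EndAccSupp{}
3.14.
\end{document}
```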

The future

The tagpdf documentation notes that:

I nevertheless think that the lua mode is the future and the only one that will be usable for larger documents. pdf is a page orientated format and so the ability of luatex to manipulate pages and nodes after the TeX-processing is really useful here

That is, there is significant structure in PDF tags that makes sense only after the TeX page-breaking algorithm has done its work, at a stage where (La)TeX no longer has any purchase. Thus the general problem is very hard; but even so, there might be some minimal structural-only tagging that it's possible to add (see pp.22 and 23 of the document, which you'd never want to type), and which might make a significant difference to an accessibility checker, if it's possible to automate at all.
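To give a flavour of why you'd never want to type this by hand, the manual markup looks roughly like the following sketch. The command and key names are as I understand them from the tagpdf documentation of the time, and should be treated as assumptions to check against the installed version, since the package is explicitly experimental and its interface may change.

```latex
% Sketch only: hand-written structure tags with tagpdf.
% Command and key names are assumptions based on the documentation
% of the time; verify them against the installed version.
\documentclass{article}
\usepackage{tagpdf}
\tagpdfsetup{activate-all}
\begin{document}
\tagstructbegin{tag=Document}
  \tagstructbegin{tag=P}
    \tagmcbegin{tag=P}
    A single tagged paragraph.
    \tagmcend
  \tagstructend
\tagstructend
\end{document}
```

Every paragraph, heading and list item would need its own pair of structure and marked-content commands, which is why automation is the interesting question.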

The LaTeX3 project is interested in Tagged PDF, and one of the current goals there is to ‘provide functionality to automatically produce structured PDF, without the need for user intervention or post-processing’ (see 20:24 of a TUG 2020 talk by Frank Mittelbach). The implication of the rest of that talk is that this tagging is effectively infeasible with current LaTeX. Interestingly, they appear to have some modest funding from Adobe to do this.

Norman, 2021 May 16