Producing screen-friendly output from LaTeX


I want to produce screen-friendly output from LaTeX, with an auxiliary but important goal: that the screen-friendly output should be ‘accessible’. This turns out not to be trivial. But I've managed to get a solution that I, at least, am happy with.

I asked a question on StackExchange, which includes some links to various alternative starting points. None of these worked quite as smoothly as I'd hoped.

The following notes – something of a brain-dump, but I hope useful to someone – are correct as of early 2021. They're broadly positive. There are a few things that don't work as well as one might hope, but things should improve, on that front, in time.

In the end, I did three things here.

Screen PDF

I adjusted the class file I'm using for these documents to produce an ‘ebook’ variant of the output. The key changes were:

In {memoir} terms, the layout I used was:

% the numbers here seem to work OK on an iPad,
% but there's no deep principle behind them.
\settypeblocksize{34\baselineskip}{25pc}{*}
\setlrmargins{*}{*}{1}
\setulmargins{*}{*}{1.5}
\setheadfoot{\baselineskip}{2\baselineskip}
\setheaderspaces{*}{*}{*}
\checkandfixthelayout

and I made sure the ToC was useful with \setcounter{tocdepth}{4}.
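As a rough sketch of how an ‘ebook’ variant like this might be wired into a class file: the class name myclass, the option name ebook and the switch \ifmyclass@ebook are all made up for illustration, and this is one plausible arrangement under those assumptions rather than the actual class file.

```latex
% myclass.cls -- hypothetical class name, for illustration only
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{myclass}[2021/05/16 sketch of an ebook variant]

% A switch which records whether the (hypothetical) 'ebook' option was given
\newif\ifmyclass@ebook
\myclass@ebookfalse
\DeclareOption{ebook}{\myclass@ebooktrue}

% Hand any other options on to memoir
\DeclareOption*{\PassOptionsToClass{\CurrentOption}{memoir}}
\ProcessOptions\relax
\LoadClass{memoir}

\ifmyclass@ebook
  % The layout from the notes above
  \settypeblocksize{34\baselineskip}{25pc}{*}
  \setlrmargins{*}{*}{1}
  \setulmargins{*}{*}{1.5}
  \setheadfoot{\baselineskip}{2\baselineskip}
  \setheaderspaces{*}{*}{*}
  \checkandfixthelayout
  % A deep ToC, to help navigation on screen
  \setcounter{tocdepth}{4}
\fi
```

With something like this in place, \documentclass[ebook]{myclass} would select the screen layout, and leaving the option off would give the ordinary print layout.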

I made the page footer helpful about navigation, showing the current section number in the footer, with a link to the ToC, and feedback on how far into the document this page is. Expressed again in {memoir} terms, this is:

\makeoddfoot{<mypagestyle>}
  {\hyperref[foo@toc]{\rightmark}}
  {}
  {\thepage/\pageref{foo@lastpage}}

(that obviously depends on a \label{foo@toc} on the ToC page, and a \label{foo@lastpage} on the last page of the document).
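To make the moving parts concrete, here is a minimal sketch of a complete document using that footer. The page style name screen is an assumption, as is the use of \makepsmarks and \createmark to put the numbered section title into \rightmark, and of \phantomsection to give the ToC label a link target. It shows one way of satisfying the dependencies just described, not the only one.

```latex
\documentclass{memoir}
\usepackage{hyperref}

% 'screen' is a made-up page style name
\makepagestyle{screen}
% Put the numbered section title into \rightmark
\makepsmarks{screen}{%
  \createmark{section}{right}{shownumber}{}{\ }}
% Memoir is two-sided by default, so set odd and even footers alike
\makeoddfoot{screen}
  {\hyperref[foo@toc]{\rightmark}}
  {}
  {\thepage/\pageref{foo@lastpage}}
\makeevenfoot{screen}
  {\hyperref[foo@toc]{\rightmark}}
  {}
  {\thepage/\pageref{foo@lastpage}}
\pagestyle{screen}

\begin{document}

\tableofcontents*
% \phantomsection gives hyperref an anchor here for the label to point at
\phantomsection\label{foo@toc}

\chapter{First chapter}
\section{A section}
Some text.

% The final page, assuming nothing (such as a late float) is placed after it
\label{foo@lastpage}
\end{document}
```

As with any use of \pageref, this needs at least two LaTeX runs before the page total in the footer settles down.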

I expected I'd have to work quite hard to get something decent here. But along with a small number of other class-specific layout tweaks, that was all I felt I needed to do to produce an adequately respectable screen-readable PDF (though, as discussed below, not a particularly accessible one).

Via LaTeXML to EPUB

LaTeXML is the most successful of the LaTeX-to-HTML converters I've used, and it seemed admirably robust when used with the structurally simple but maths-heavy document I was working with. I managed to convert my LaTeX sources to EPUB in two related ways.

Using LaTeXML's EPUB generation pipeline is 90%+ successful. It still has a few bugs: as of right now it produces EPUB that works, but which doesn't pass the conformance checker I used (W3C, see below). I was able to fix up the results without major difficulty, though. Those bugs are currently reported in the LaTeXML developers’ queue, so with time, and possibly with help from this community, they should disappear before too long.

What I in fact ran with was LaTeXML's intermediate XML format (from which it generates HTML and related outputs): I developed my own XSLT stylesheets to convert this to XHTML, and to generate the associated EPUB metadata. This is a flexible technique, but it's obviously more work than something canned (I happen to have experience of this general route, so this was both relatively easy and quite enjoyable to do). I mention this, not because I'm necessarily recommending it, but in order to draw attention to that well-designed intermediate format, awareness of which might be similarly useful to others with special end-result needs.

Relevant resources

Spec:

Validators and good practice:

LaTeX and accessibility

PDF output from LaTeX scores very poorly on the ‘accessibility’ checkers I used; I got scores below 10/100 with some documents. It's not completely clear how that checker was scoring things, but it appears that a large part of the poor score is attributable to the outputs not being ‘tagged PDF’. There's a 2017 discussion about this on Stack Overflow (A guide on how to produce accessible PDF files?).

There does not appear to be a simple answer to this question.

The future

The tagpdf documentation notes that:

I nevertheless think that the lua mode is the future and the only one that will be usable for larger documents. pdf is a page orientated format and so the ability of luatex to manipulate pages and nodes after the TeX-processing is really useful here

That is, there is significant structure in PDF tags that makes sense only after the TeX page-breaking algorithm has done its work, at a stage where (La)TeX no longer has any purchase. Thus the general problem is very hard. Even so, there might be some minimal structural-only tagging that it's possible to add (see pp.22 and 23 of the tagpdf documentation for markup of a sort you'd never want to type by hand), and which might make a significant difference to an accessibility checker, if it's possible to automate at all.
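For a flavour of what hand-written structural tagging looks like, here is a minimal sketch using tagpdf's commands as I understand them from its documentation of around this time (\tagpdfsetup with the activate-all and uncompress keys, \tagstructbegin, \tagstructend, \tagmcbegin and \tagmcend). The package has been changing quickly, so treat the key names as assumptions to check against the current documentation. The sketch also assumes that no automatic paragraph tagging is active.

```latex
\documentclass{article}
\usepackage{tagpdf}
% activate-all switches on the tagging code;
% uncompress leaves the PDF readable, for inspection
\tagpdfsetup{activate-all,uncompress}

\begin{document}

% The root of the structure tree
\tagstructbegin{tag=Document}

  % One structure element per paragraph...
  \tagstructbegin{tag=P}
    % ...and the content on the page marked as belonging to it
    \tagmcbegin{tag=P}
    A first, short paragraph.
    \tagmcend
  \tagstructend

  \tagstructbegin{tag=P}
    \tagmcbegin{tag=P}
    A second, equally short paragraph.
    \tagmcend
  \tagstructend

\tagstructend

\end{document}
```

Even for two one-line paragraphs, the markup outweighs the text, and a paragraph which breaks across pages needs further care; that is the sense in which this would have to be automated to be practical.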

The LaTeX3 project is interested in Tagged PDF, and one of the current goals there is to ‘provide functionality to automatically produce structured PDF, without the need for user intervention or post-processing’ (see 20:24 of a TUG 2020 talk by Frank Mittelbach). The implication of the rest of that talk is that this tagging is effectively infeasible with current LaTeX. Interestingly, they appear to have some modest funding from Adobe to do this.

Norman, 2021 May 16