Quote:
Originally Posted by sofasurfer
I'd take slight exception to your implicit assertion that HTML is a suitable standard for long-term access and archival
I don't think it is a suitable standard either, but not because it's browser-specific. After all, any programming language worth its salt already has a good HTML parser (my personal favourite is the elegant BeautifulSoup library for Python, which tolerates malformed markup).
The real problem with HTML is that it's designed for human-readable documents, not for data. Dumping structured data into tables or definition lists with defined id and class attributes is just too sloppy-joe for a clear, consistent, and maintainable API.
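To make that concrete, here is a rough sketch of pulling one record out of a definition list versus loading the same record from JSON. The record, the `id`/`class` names, and the `DefinitionListScraper` class are all made up for illustration, and it uses the standard library's `html.parser` rather than BeautifulSoup only so that it runs without extra installs.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data: the same record as HTML and as JSON.
HTML_DOC = """
<dl id="book">
  <dt class="field">title</dt><dd class="value">Dune</dd>
  <dt class="field">year</dt><dd class="value">1965</dd>
</dl>
"""
JSON_DOC = '{"title": "Dune", "year": 1965}'


class DefinitionListScraper(HTMLParser):
    """Collect dt/dd pairs; only works while the page keeps this layout."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._tag = None  # "dt" or "dd" while inside one of those elements
        self._key = None  # text of the most recent dt

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "dt":
            self._key = text
        elif self._tag == "dd" and self._key is not None:
            self.record[self._key] = text
            self._key = None


scraper = DefinitionListScraper()
scraper.feed(HTML_DOC)
scraper.close()

from_html = scraper.record        # every value comes back as a string
from_json = json.loads(JSON_DOC)  # numbers stay numbers

print(from_html)
print(from_json)
```

The HTML route needs a purpose-built scraper and still loses the type of `year`; the JSON route is one call. If the page's markup changes, the scraper silently breaks, which is the maintainability problem in a nutshell.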
I personally dislike XML mainly because it's so needlessly verbose (and also, to be honest, because it's abused so horribly and so often), which is why I prefer something like YAML or JSON. XML does have nice built-in transformations via XSLT; but again, any decent programming language can easily parse structured data into human-readable presentation formats.
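As a small illustration of both points, the sketch below serializes one made-up record as JSON and as element-per-field XML, then renders the parsed JSON into a human-readable HTML definition list. The record and the `to_definition_list` helper are hypothetical, and the XML layout is only one of several ways the record could be encoded.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample record.
record = {"title": "Dune", "year": 1965}

# The same record as JSON...
as_json = json.dumps(record)

# ...and as XML with one child element per field.
root = ET.Element("book")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")


def to_definition_list(data):
    """Render a flat mapping as an HTML definition list for human readers."""
    rows = "".join(
        f"<dt>{escape(str(k))}</dt><dd>{escape(str(v))}</dd>"
        for k, v in data.items()
    )
    return f"<dl>{rows}</dl>"


# Presentation is generated from the data, not the other way round.
as_html = to_definition_list(json.loads(as_json))

print(len(as_json), as_json)
print(len(as_xml), as_xml)
print(as_html)
```

Even for two fields the XML form is noticeably longer, since every field name appears twice. The presentation step is a few lines of ordinary code, so nothing is lost by keeping the data itself in a leaner format.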
Quote:
Originally Posted by sofasurfer
IIRC, didn't PDF become an open standard not so long ago?
You're right: it was formally opened in January 2008 and published as an ISO standard (ISO 32000-1) later that year. [Aside: There's actually a JavaScript library for dynamically generating PDFs that looks quite promising.]
However, there are two big problems with PDF as a data standard:
- PDF is specifically designed for printed documents, so it's even less suited to structured data than HTML. The set of available objects is oriented toward 2-D presentation on a page, not toward data linking.
- The standard is a bit baroque (mainly to provide the ability to position elements on the page and to include both vector and raster graphics).
In any case, I certainly can't imagine any credible argument in favour of using PDF to facilitate programmatic access to structured data rather than a format that's actually designed and intended for this purpose.