Beyond FITS - IAU General Assembly August 12 2000 Ed Shaya (Astronomical Data Center)
The adoption of XML (eXtensible Markup Language) by the World Wide Web Consortium (W3C) and the commercial software community has revolutionized data management and interchange. This new vehicle for data transfer is a crystallization of the experience and lessons of several fields over several years, including professional publishing in SGML, data and forms handling on the web, database query and distribution, and browser development. Its features include, first and foremost, widespread adoption and standardization of the specification, the parsers, and the style presentation languages. Another feature is self-description: a document links to its document type definition, which allows its structure to be validated. A document can also link to the applications needed to view it, or to additional files that need to be included automatically.
Best practice in XML requires a clear separation between information content and display style. A document contains the data and the informational tags that help a reader understand and locate the data, and it links to a separate document that provides display information, which can depend on the properties of the medium.
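As a rough sketch of this separation, the Python fragment below keeps the content in one small XML document and applies two different "styles" to it. The element names, attribute names, and rendering functions are all invented for illustration. In practice the display rules would live in a separate stylesheet document rather than in Python functions, so this only mimics the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment: element and attribute names are invented
# for illustration and do not come from any ADC document type.
DOCUMENT = """\
<catalog title="Bright stars">
  <star name="Sirius" vmag="-1.46"/>
  <star name="Vega" vmag="0.03"/>
</catalog>
"""

def load_rows(xml_text):
    """Extract the title and (name, magnitude) pairs; no display decisions here."""
    root = ET.fromstring(xml_text)
    rows = [(s.get("name"), float(s.get("vmag"))) for s in root.findall("star")]
    return root.get("title"), rows

def render_text(title, rows):
    """Stand-in for a style aimed at a plain-text terminal."""
    lines = [title, "-" * len(title)]
    lines += [f"{name:<10}{vmag:>6.2f}" for name, vmag in rows]
    return "\n".join(lines)

def render_html(title, rows):
    """Stand-in for a style aimed at a browser, built from the same content."""
    body = "".join(f"<tr><td>{n}</td><td>{v:.2f}</td></tr>" for n, v in rows)
    return f"<h1>{title}</h1><table>{body}</table>"

title, rows = load_rows(DOCUMENT)
print(render_text(title, rows))
print(render_html(title, rows))
```

The content is parsed once, and each medium gets its own presentation without the data document changing.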
At the ADC we have been developing an XML language for astronomical data centers. Recently we started experimenting with an eXtensible Data Format (XDF) that can encompass a wide class of scientific data. The goal is to examine the features needed for a common interchange format between the various branches of science. XDF encompasses complex hierarchical data structures, n-dimensional arrays merged with coordinate information, tables of any dimension merged with their field metadata, searchable and editable metadata, and extensibility to new features.
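To make "arrays merged with coordinate information" concrete, here is a minimal Python sketch. It does not use the actual XDF schema: the tag names, attribute names, and the row-major ordering of values are assumptions chosen only to show data values travelling together with the axes that locate them.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical layout, NOT the actual XDF schema: tag and attribute names
# are invented to show data values carried together with their axes.
DOCUMENT = """\
<array name="flux" units="Jy">
  <axis name="wavelength" units="nm">400 500 600</axis>
  <axis name="time" units="s">0 10</axis>
  <values>1.0 1.1 2.0 2.1 3.0 3.1</values>
</array>
"""

def read_array(xml_text):
    """Return axis descriptions and a dict mapping coordinate tuples to values."""
    root = ET.fromstring(xml_text)
    axes = [
        (ax.get("name"), ax.get("units"), [float(t) for t in ax.text.split()])
        for ax in root.findall("axis")
    ]
    values = [float(t) for t in root.find("values").text.split()]
    # Every combination of axis ticks is one cell of the n-dimensional array.
    coords = list(product(*(ticks for _, _, ticks in axes)))
    if len(coords) != len(values):
        raise ValueError("value count does not match the axis lengths")
    # Assumed ordering: the last axis varies fastest (row-major).
    return axes, dict(zip(coords, values))

axes, cube = read_array(DOCUMENT)
print([name for name, _, _ in axes])
print(cube[(500.0, 10.0)])
```

Because the axes are part of the same document, a reader can look up a value by physical coordinates instead of by bare array index, and the same function works for any number of axes.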
The rapid advance of computer science and technology is a serious obstacle to the success of any data format. What was optimal for data storage at the last IAU General Assembly three years ago has been made obsolete by rapid developments in the web, browsers, XML, Java, etc. We need to face the hard reality that what I say today will be obsolete by the next meeting of the General Assembly. One cannot freeze development and ignore technological advances, although that seems like the easy way, because that way leads to being sidestepped and forgotten as enterprising people create their own data formats that do take advantage of the latest and greatest technology. Rather, we need to work together to constantly update and, if needed, migrate to new formats, just as we have been doing all along when new storage media appear.
I. Requirements for Data Formats of the Future
Basic Requirements
Understood for 100 years.
Data Interchange
Data clearly embedded in coordinate space.
Encompass many fields of science.
Accommodate an object-oriented approach.
Structured Data
Carry functions and variables?
Links, Variables, and Defaults
II. Advantages of XML
III. XDF
IV. Future of Data Formats
V. The Tower of Babel
Project PI: Ed Shaya
XML Staff: Jim Blackwell, Jim Gass, Brian Holmes & Brian Thomas
NASA Official: Cynthia Y. Cheung