A quantity-based XML-based query layer for the VO: the NOAO NX system implementing a catalog model


Brian Thomas, Edward Shaya, Zenping Huang

University of Maryland

Rafael Hirart

NOAO


Abstract


We present a working prototype query system which demonstrates how VO data-models may be used as the underlying basis for a VO-wide query. This system, NX ("New XML query"), overlies and extends XQuery, the W3C standard for querying XML documents ([1]; we refer to the extended version as "XQuery+"). The NX system is designed to allow queries on data described in terms of VO-wide data-models which are built up from the VO Quantity [2].


We have extrapolated a "VO Catalog" model for the purposes of this demo, and can use this model to query data which are stored in one or more SQL databases using essentially nothing more than XQuery+. Our demo is implemented on top of a test database holding SuperMacho variable star catalog data from NOAO [3]. In this poster we describe the contents of the NX system, how it may be deployed, and future development efforts. All NX software is GPL licensed and freely available for use and/or modification.


Introduction/Statement of Problem:


The use of SQL or SQL-like mechanisms for description and retrieval of information from VO repositories is inherently limited to table-like structures which have little attached meta-data. Rather, this meta-data must be stored in separate structures (tables), and its association with the data must be inferred.


XML in tandem with XQuery offers the opportunity for the VO to make queries which return objects containing richly described data. By describing data as XML documents, complex meta-data may be stored in-situ, easily associated with the data to which they pertain. Furthermore, the use of searchable XML allows us to use the Quantity, a scientific data container, to hold the data. Unfortunately, XQuery lacks many of the sophisticated numerical/string search capabilities of SQL queries. In this work, we demonstrate how these capabilities may be implemented as a middle layer, which we call NX, and the additional advantages which are thereby obtained.

NX System Architecture:




Figure 1: NX layer in system architecture. The NX layer translates application-layer requests into the local schema, accesses the requested data, and returns them. Red boxed numbers indicate the sequence of NX layer activity in satisfying a query. Grey lettering indicates the interface language (and data-model, in square brackets) which is used at each step. Explanation of steps:


  1. The application makes a query in the form of XQuery+ using the external data-model [eDM].

  2. The NX layer XQuery+ Parser translates the query into vanilla XQuery and interrogates the XML database. An XML document, which conforms to the internal (iDM) schema, is returned and packaged with the instructions needed by the XMLAssembler.

  3. The XMLAssembler is passed the XML document, which it begins to parse and (re-)assemble as the external data-model. As it encounters instructions placed within the document by the XQuery+ Parser, it follows them, then removes them.

  4. Some instructions direct the XMLAssembler to query external resources, such as an SQL database. It packages the returned values within XML nodes.

  5. The assembled XML document, in the 'public' external format, is returned to the application.
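
The assembly steps above (the XMLAssembler following embedded instructions, querying SQL, and returning the external document) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the `sqlvalues` instruction element, its `table` and `column` attributes, the `values`/`value` output elements, and the use of an in-memory SQLite database in place of a PostgreSQL server. None of these names are taken from the NX schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical internal-model (iDM) document. The <sqlvalues> element stands in
# for an instruction left by the XQuery+ Parser; its name and attributes are
# assumptions, not the actual NX schema.
IDM_DOC = """
<catalog>
  <field name="period_r">
    <sqlvalues table="stars" column="period_r"/>
  </field>
</catalog>
"""


def assemble(idm_xml, conn):
    """Replace each <sqlvalues> instruction with explicit <values> read from SQL."""
    root = ET.fromstring(idm_xml)
    # Snapshot the elements first so the tree can be modified safely.
    for parent in list(root.iter()):
        for instr in list(parent.findall("sqlvalues")):
            table, column = instr.get("table"), instr.get("column")
            # In this sketch the identifiers come from a trusted internal
            # document; real code would have to validate them.
            rows = conn.execute(
                f'SELECT "{column}" FROM "{table}" ORDER BY rowid'
            ).fetchall()
            values = ET.Element("values")
            for (v,) in rows:
                ET.SubElement(values, "value").text = str(v)
            # Follow the instruction, then remove it from the document.
            idx = list(parent).index(instr)
            parent.remove(instr)
            parent.insert(idx, values)
    return root


def demo():
    """Build a tiny stand-in database and assemble the external document."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stars (period_r REAL)")
    conn.executemany("INSERT INTO stars VALUES (?)", [(1.5,), (0.25,)])
    return assemble(IDM_DOC, conn)
```

Calling `ET.tostring(demo(), encoding="unicode")` would serialize the assembled document, in which the instruction element no longer appears and the values are held explicitly.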

NX Query Interface:


The query interface is the combination of the XQuery+ language and a data-model, expressed in XML, whose nodes inherit from or aggregate VO:Quantity.


XQuery+ Language:


XQuery+ is the same language as XQuery, but adds a new function “data_select” which may be added to the “return” portion of an XQuery statement, and has the format:


data_select { variable-expression
    [ WHERE limit-expression ]
    [ ORDER BY xpath-expression ]
    [ LIMIT BY xpath-expression ]
    [ OFFSET start ]
}


where:


variable-expression may be either an XQuery variable (e.g. “$field”) or an xpath-expression.


limit-expression is the same as an SQL condition clause, but with the column name replaced by an xpath-expression.


xpath-expression is an XPath 2.0 expression [6].
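
To make the clause structure concrete, the following Python sketch splits a data_select expression into its parts. It is only an illustration: the single-regular-expression strategy, the function name `parse_data_select`, and the example paths are assumptions, and the sketch does not handle braces or clause keywords occurring inside the expressions themselves.

```python
import re

# Clause keywords follow the data_select format given above; parsing them with
# one case-insensitive regular expression is an assumption of this sketch.
_PATTERN = re.compile(
    r"""data_select\s*\{\s*
        (?P<variable>.+?)
        (?:\s+WHERE\s+(?P<where>.+?))?
        (?:\s+ORDER\s+BY\s+(?P<order_by>.+?))?
        (?:\s+LIMIT\s+BY\s+(?P<limit_by>.+?))?
        (?:\s+OFFSET\s+(?P<offset>\d+))?
        \s*\}""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def parse_data_select(text):
    """Return a dict holding only the clauses present in the expression."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a data_select expression")
    clauses = {k: v for k, v in match.groupdict().items() if v is not None}
    if "offset" in clauses:
        clauses["offset"] = int(clauses["offset"])
    return clauses


# Hypothetical expression; the paths are illustrative only.
EXAMPLE = (
    'data_select { $field WHERE ../field/@name = "period_r" '
    'ORDER BY ../field[@ucd="pos.eq.ra"] OFFSET 10 }'
)
```

`parse_data_select(EXAMPLE)` yields the variable, the WHERE expression, the ORDER BY expression and an integer offset; optional clauses that are absent are simply left out of the result.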




Data Model:


We show UML diagrams for brevity. Each class more or less represents an element in the XML schema. Aggregation represents the relationship between element nodes, with the parent node corresponding to the containing class. Interfaces represent a possible choice of node at that point in the object tree. Thus, this model allows one of two types of “Values” container to be held within a Quantity. Inheritance represents a restriction relationship between nodes. Color in these diagrams represents the namespace of the class/node. The demo NOAO namespace classes are layered on top of the VO namespace classes.



Figure 1. Complex Meta-data within the top NOAO:Holding class.




Figure 2. Selected classes from the demo Data model. The “external”, public model [eDM in figure 1] differs from the private, “internal” model [iDM in figure 1] only in that “values” appear to be explicitly held by the document. In reality, they are held within one or more SQL databases, and these are described by the internal model “values” shown in figure 3. Color denotes the namespace of the class/interface.



Figure 3. Values interface. Explicit values appear in the external data-model [eDM]; SQLvalues appear in the internal data-model [iDM].
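
The relationship between the two kinds of “Values” container can be sketched in Python as follows. This is an illustrative assumption rather than the NX schema: the class and attribute names (`ExplicitValues`, `SQLValues`, `table`, `column`, `to_external`) are hypothetical, and the `fetch` callback stands in for whatever mechanism reads the SQL column.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class ExplicitValues:
    """Values held directly in the document (external model, eDM)."""
    items: List[float]


@dataclass
class SQLValues:
    """Pointer to values held in an SQL table column (internal model, iDM)."""
    table: str
    column: str


@dataclass
class Quantity:
    """Simplified stand-in for a Quantity holding one kind of Values."""
    name: str
    ucd: str
    values: Union[ExplicitValues, SQLValues]


def to_external(q: Quantity, fetch: Callable[[str, str], List[float]]) -> Quantity:
    """Return the external form of q, resolving SQL-backed values via fetch."""
    if isinstance(q.values, SQLValues):
        items = fetch(q.values.table, q.values.column)
        return Quantity(q.name, q.ucd, ExplicitValues(items))
    return q
```

With this shape, the internal and external forms differ only in which `Values` variant they hold, which mirrors the distinction drawn between iDM and eDM.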

Query Demonstration:


The query demonstration is built on top of a subset of the SuperMacho variable star catalog data [3] and is available online at:


http://archive.astro.umd.edu:8080/nx/blue.jsp


We have used version 1.0b of the eXist [4] XML database to hold the XML data, and version 7.4.2 of PostgreSQL [5] to hold the SQL tables. We have tested several configurations in which NOAO:field data were held in SQL table columns on different machines, and NOAO:catalog structures whose member field structures pulled together data from separate SQL tables.


The XML and SQL schemas and the XML serialization used in this demo may be found at:

http://archive.astro.umd.edu/nx/





Figure 4. Example online query application. The above query requests fields in the SuperMacho variable star catalog which have the name “period_r” and returns the values in those objects, ordered by sibling fields which match the UCD string “pos.eq.ra”.



Figure 5. Result of query in figure 4.

NX Query capabilities:


Table 1. NX implementation of Selected Important SQL Capabilities

Added SQL Capability                                  XQuery+ Supports?
                                                      Yes     No
SELECT all values in object                           X
SELECT/WHERE
    based on value in object                          X
    based on value in sibling objects of same type    X
ORDER BY                                              X
LIMIT                                                 X
OFFSET                                                X
DISTINCT*                                             X       X
CREATE VIEW                                                   X
SCHEMA                                                        X


* Available from the XQuery language, but not within the data_select function. Note that comparable JOIN, UNION and INTERSECTION abilities are also already supplied by XQuery.
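
As an illustration of how the supported capabilities in Table 1 could be pushed down to an SQL database, the Python sketch below builds a parameterized statement from WHERE, ORDER BY, LIMIT and OFFSET inputs. It is a sketch under stated assumptions, not necessarily how NX implements them: the function `build_sql`, its argument shapes, the operator whitelist and the use of SQLite are all hypothetical.

```python
import sqlite3

_ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "!="}


def _ident(name):
    """Quote a simple identifier, rejecting anything else."""
    if not name.isidentifier():
        raise ValueError(f"bad identifier: {name!r}")
    return f'"{name}"'


def build_sql(table, column, where=None, order_by=None, limit=None, offset=None):
    """Return (sql, params) selecting one column.

    where is an optional (column, operator, value) triple; order_by is an
    optional column name standing in for a sibling field.
    """
    sql = f"SELECT {_ident(column)} FROM {_ident(table)}"
    params = []
    if where is not None:
        col, op, value = where
        if op not in _ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        sql += f" WHERE {_ident(col)} {op} ?"
        params.append(value)
    if order_by is not None:
        sql += f" ORDER BY {_ident(order_by)}"
    if limit is not None or offset is not None:
        # In SQLite a negative LIMIT means "no limit"; other databases may
        # need different syntax here.
        sql += " LIMIT ? OFFSET ?"
        params += [-1 if limit is None else limit, offset or 0]
    return sql, params


def demo_connection():
    """In-memory stand-in for a catalog table (made-up values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stars (period_r REAL, ra REAL)")
    conn.executemany(
        "INSERT INTO stars VALUES (?, ?)",
        [(1.5, 30.0), (0.25, 10.0), (2.0, 20.0)],
    )
    return conn
```

Values are always bound as parameters and identifiers are checked before interpolation, so the statement text never contains data taken from the query.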

Discussion:

We have shown that legacy SQL databases may be easily wrapped in a searchable XML-based layer. The advantages of this system over SQL or SQL-like access are:

Table 2: Advantages over SQL or SQL-like access


* Can store complex meta-data alongside values. There is no need to be limited to a simple “table” hierarchy, or to complex schemes for storage of associated meta-data in separate tables.

* Virtual “tables” (objects) are easily supported, as we can map existing, separate SQL database tables, as desired, into a publicly “unified” structure.

* Returned information has scientific values wrapped within the VO:Quantity -- thus we have a dependable, scientific structure for our data for use in web applications.

It is important to note that the XQuery+ language is NOT designed for a human scientist to use directly. It IS designed for machine-to-machine discourse, and should underpin an application which a human may use to generate the query. VOQL, a human-readable/usable language, could be layered on TOP of the NX layer, using VO-wide data-models and ontologies to represent the data. Thoughts on how this may be achieved appear in the poster by Shaya et al. at this conference.


Possible improvements that we are considering for the NX middleware layer include:

Software Resources

NX Query Demonstration:

http://archive.astro.umd.edu:8080/nx/blue.jsp

NX software and this poster paper:

http://archive.astro.umd.edu/nx/



References


[1] W3C XQuery:

http://www.w3.org/XML/Query/


[2] VO Quantity:

http://ivoa.net/twiki/bin/view/IVOA/IVOADMQuantityWP


[3] SuperMacho Variable Star Catalog:

http://store.anu.edu.au:3001/cgi-bin/varstar.pl


[4] eXist XML database homepage:

http://exist-db.org/


[5] PostgreSQL database homepage:

http://postgresql.org/


[6] W3C XPath:

http://www.w3.org/TR/xpath/