What interoperability do we want to achieve and how might we know when we have achieved it? I think the VO should be able to do the following tricks:
"Understanding the results" implies that there exist
These criteria in turn imply a language for expressing queries that is richer than SQL or a GET CGI-call.
Dealing with non-standard query-parameters -- i.e. those that don't satisfy the above criteria -- implies a language for describing these parameters that can be passed through a system to a UI and there generate a form for a human user to fill in.
The unified column descriptions devised at CDS are a set of machine-readable, standard names for quantities that can be columns of results, and, presumably, parameters in queries. The UCD namespace is rich, and is extensible through being hierarchical. This is exactly what we need.
However, not all the names in UCD have obvious, unique semantics. An example is magnitudes and colours. PHOT_INT-MAG_V is integrated or isophotal V-magnitude, clearly, but which kind of V? PHOT_JHN_V is definitely Johnson V-magnitude, but how does this interact with the PHOT_INT-MAG series? PHOT_JHN_V_I is "Johnson" V-I colour; but which I band is implied? For automatic handling, all these aspects need to be made explicit.
We need a sub-set of UCD that is to be understood by VO software, and for that sub-set we need to define the semantics. We may find that this standard sub-set has to include some new names that UCD does not yet define.
UCD does not define the units of measurement. A UCD label for a quantity needs to be qualified with a statement of the units.
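A minimal sketch of what "qualified with a statement of the units" could look like in software. The class name, field names and unit strings are assumptions made for illustration; only the UCD label PHOT_JHN_V comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDescription:
    """A column label that is only meaningful with its unit attached."""
    name: str  # human-readable column name (hypothetical)
    ucd: str   # UCD label, e.g. "PHOT_JHN_V"
    unit: str  # unit string; the spellings "mag" and "mmag" are assumptions


def same_quantity(a: ColumnDescription, b: ColumnDescription) -> bool:
    """Two columns are directly comparable only if UCD and unit both match."""
    return a.ucd == b.ucd and a.unit == b.unit


v_mag = ColumnDescription(name="Vmag", ucd="PHOT_JHN_V", unit="mag")
v_mmag = ColumnDescription(name="V", ucd="PHOT_JHN_V", unit="mmag")
print(same_quantity(v_mag, v_mmag))  # False: same UCD, different units
```

Two columns that share a UCD but differ in units would need a conversion step before they could be compared or merged.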
AstroRes is an XML vocabulary for describing and encapsulating tables of results. It defines a descriptive header for a table in a way analogous to a FITS header for a FITS table. The actual data of the table can be included either as a compact ASCII representation in CSV format or as individual elements of XML. Alternatively, the AstroRes header could be stored and transmitted separately from the data.
AstroRes descriptions can identify columns by UCD, can attach human-readable labels and descriptions, and can specify units.
AstroRes is used in Vizier (it is one of a number of possible output-formats, and about one third of all requests ask for AstroRes), by OASIS, and at HEASARC.
AstroRes seems to be trying to do exactly what we need according to the criteria at the start of this paper. The fact that it is widely used in getting data from Vizier implies that it works. Whether it works well enough in the context of a full VO is not known yet.
The encoding of metadata in AstroRes is good; the encoding of the actual data is weaker. How well can it handle vast tables? (E.g. could it handle sensibly the transfer of an entire survey from one site to another?) Does it need a binary representation for data? What tools exist to parse and use the format?
AstroRes is likely to be superseded by VOtable (see below).
VOtable (no reference available yet) is an expansion of AstroRes. The main changes concern the data component:
The initial paper describing VOtable suggests that it could be used to define queries as well as results. This could be done by annotating the XML header and sending it with no accompanying data-component. However, the existing VOtable vocabulary does not seem to me to allow proper expression of queries; some expansion would be needed.
VOtable might become a joint standard for AstroGrid, AVO and NVO. However, the current paperwork is not precise enough to allow adoption as a formal standard.
The Astronomical Server URL (ASU) from CDS is a proposed standard for the CGI interface to a data-service. It defines an exact syntax for
ASU seems to satisfy the criteria for allowing any archive to accept a query in standard language. However, there are problems:
The variety of syntax is a historical accident. ASU is the union of some syntaxes used for a number of archives.
It seems to me that ASU, in its current form, isn't complete or flexible enough to help the VO very much. If data centres choose to implement ASU interfaces, then it makes interoperability a little easier, but the gain isn't enough to make it worth imposing ASU on all data-centres.
At present, ASU is a loose convention, not a formal standard. If we wanted to make it a standard, then we would have to refine it. At present, if a data-service claims conformance to "ASU", then a client has no way of knowing which parts of the syntax it supports and hence what queries might be acceptable. To resolve this, ASU needs to be broken down into uniquely-named profiles for each of which there is only one syntax. Any given data-service can then state which of the profiles it supports and client software can use this to format queries reliably. Ideally, the global VO should pick exactly one profile as the favoured standard and lobby for all archives to move to this standard.
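The profile idea can be sketched as a simple negotiation between client and service. The profile names below are hypothetical: ASU defines no such profiles at present, and the function is an illustration, not part of any existing ASU software.

```python
from typing import Iterable, Optional

# Hypothetical profile names, listed in the client's order of preference.
CLIENT_PREFERENCE = ["asu-profile-a", "asu-profile-b"]


def choose_profile(client_preference: Iterable[str],
                   service_profiles: Iterable[str]) -> Optional[str]:
    """Return the first profile the client prefers that the service declares.

    Returns None when there is no profile in common, in which case the
    client cannot format a query that the service is known to accept.
    """
    declared = set(service_profiles)
    for profile in client_preference:
        if profile in declared:
            return profile
    return None


# A service that declares support for only one (hypothetical) profile.
print(choose_profile(CLIENT_PREFERENCE, ["asu-profile-b"]))
```

If the global VO settled on one favoured profile, the preference list would shrink to a single entry and the negotiation would become a simple conformance check.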
GLU, the Generateur de Liens Uniformes from CDS, serves two purposes. Firstly, it provides indirection for URLs such that the physical URL of a data-service can change while the public URL stays the same. Secondly, it can alter the syntax of a URL that is a call to a CGI interface, thus rearranging a query to suit the conventions of the data service.
The first feature is useful, but outside the scope of this paper. The second feature might satisfy the need for all data-services to accept queries on standard parameters. That is, the GLU server might be arranged to accept all standard query-parameters and to map them to the syntax of the individual data-services. I have too little experience with GLU to know if it is powerful and flexible enough for this job.
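To make the second feature concrete, here is a sketch of the kind of mapping that would be needed. It does not use GLU or reproduce its actual mechanism; the service name, URL and parameter names are all invented for illustration.

```python
from urllib.parse import urlencode

# Hypothetical rules: for each data-service, the CGI base URL and the
# local spelling of each standard query-parameter.
SERVICE_RULES = {
    "archive-x": {
        "base_url": "http://archive-x.example/cgi-bin/query",
        "params": {"ra": "RA_DEG", "dec": "DEC_DEG", "radius": "SR"},
    },
}


def rewrite_query(service: str, standard_params: dict) -> str:
    """Translate standard parameter names into one service's CGI syntax."""
    rule = SERVICE_RULES[service]
    unknown = set(standard_params) - set(rule["params"])
    if unknown:
        # The service has no equivalent for these parameters.
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    local = {rule["params"][key]: value
             for key, value in standard_params.items()}
    return rule["base_url"] + "?" + urlencode(local)


print(rewrite_query("archive-x", {"ra": 10.5, "dec": -3.0}))
```

Renaming parameters is the easy case. A real mapping would also have to convert units and value formats, which is where the question of GLU's flexibility arises.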
Mocha, Middleware based On a Code sHipping Architecture from the University of Maryland, is a software system that implements a different way of defining standard quantities in queries and results. Instead of using a de jure definition in a language like XML, it defines quantities by their implementation as Java objects. An object can represent a specific query-parameter if it implements a general interface for query parameters. The actual code for these objects is serialized and passed between programmes in the system. Hence, the queries and results become self-defining at a deep level.
I have not experimented with Mocha; I do not know how well it works. I mention it only as an example of a radically-different approach.
This is a list, probably rather incomplete, of things that might be search constraints on a data grid. It came out of a brain-storming session between N. Walton and G. Rixon.
I believe that we should define an XML vocabulary to describe queries. It must serve two purposes: to express queries to the grid and to data-services, and to express possible queries to the user interface in order to get human intervention with the difficult cases.
The queries in the hypothetical language must be transformable into SQL at the data centres. For ease of use, they should also be transformable to ASU in the grid.
The highest priorities as I see them are enabling queries according to brightness and colour of objects -- i.e. according to the spectral characteristics -- and making it easy to do good overplots of catalogues on images. These were the things most wanted in the portal experiment carried out at CASU in June to October 2001.
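A sketch of the transformation into SQL, under heavy assumptions. The element and attribute names (query, select, constraint, ucd, op, value) are invented here, since the vocabulary does not yet exist, and the mapping from UCD to local column names stands in for whatever each data centre would maintain.

```python
import xml.etree.ElementTree as ET

# A query in an invented XML vocabulary; only the UCD labels come from UCD.
QUERY_XML = """<query>
  <select ucd="PHOT_JHN_V"/>
  <select ucd="PHOT_JHN_V_I"/>
  <constraint ucd="PHOT_JHN_V" op="lt" value="18.5"/>
</query>"""

# Hypothetical mapping from UCD to one data centre's column names.
COLUMNS = {"PHOT_JHN_V": "vmag", "PHOT_JHN_V_I": "v_i"}

OPERATORS = {"lt": "<", "le": "<=", "gt": ">", "ge": ">=", "eq": "="}


def to_sql(xml_text, column_for_ucd, table):
    """Return (sql, params) with '?' placeholders for the constraint values."""
    root = ET.fromstring(xml_text)
    columns = [column_for_ucd[e.get("ucd")] for e in root.findall("select")]
    clauses, params = [], []
    for e in root.findall("constraint"):
        column = column_for_ucd[e.get("ucd")]
        clauses.append(f"{column} {OPERATORS[e.get('op')]} ?")
        params.append(float(e.get("value")))
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = to_sql(QUERY_XML, COLUMNS, "catalogue")
print(sql, params)
```

The query names quantities by UCD only, so the same document could be sent to several data centres, each applying its own column mapping.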
I think the quantities we most need to standardize are to do with spectrophotometry. I use "spectrophotometry" to cover SED measurements with both imaging and spectrographic instruments, and I believe that we need common representations for data from both classes of instrument.
We need a standard description of wavelength coverage. This will probably become part of the static description of data resources.
We need a way of transforming all spectrophotometry onto a common scale. Transforming everything to flux density is the obvious first step.
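As an illustration of that first step, a magnitude m converts to flux density as F = F0 * 10^(-0.4 m), where F0 is the zero-point flux density of the photometric system and band. The sketch below assumes the zero point is supplied by the caller; the AB value of about 3631 Jy is given only as a familiar example, and the function name is hypothetical.

```python
import math

# Zero point of the AB magnitude system, approximately 3631 Jy.
# Other systems need a separate zero point for each band.
AB_ZERO_POINT_JY = 3631.0


def magnitude_to_flux_density(magnitude: float, zero_point_jy: float) -> float:
    """Convert a magnitude to flux density in Jy, given the zero point."""
    return zero_point_jy * 10.0 ** (-0.4 * magnitude)


# 2.5 magnitudes fainter than the zero point is a factor of ten in flux.
flux = magnitude_to_flux_density(2.5, AB_ZERO_POINT_JY)
print(math.isclose(flux, 363.1))
```

The conversion is only as good as the zero point, which is why the type of spectrophotometric data has to be recorded with the measurement.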
We need a standard description of the type of spectrophotometric data. Data that are true Sloan photometry, say, need to be labelled as such so that they can be correctly transformed. Data that are derived measures (e.g. Sloan magnitudes synthesized from measured spectra) also need to be distinguished.
Celestial position is fairly well covered by existing convention. We need to add sufficient standard quantities to support overplotting of arbitrary data on images, as in Aladin, Gaia etc.
Where possible, the overplots should be drawn as ellipses. This means that we need standard quantities to give the position angle, ellipticity, and ellipse width. The width (or half-width or whatever) might not be a physical parameter of the object, but might be derived from magnitude.
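A sketch of how the three quantities would be turned into an outline for plotting. It assumes that the width is given as the semi-major axis, that ellipticity is defined as 1 - b/a, and that position angle is measured from north through east; these conventions are exactly what a standard would have to fix. The function name is hypothetical.

```python
import math


def ellipse_outline(semi_major, ellipticity, position_angle_deg, n_points=64):
    """Return (d_east, d_north) offsets tracing an ellipse around a source.

    Offsets are in the same units as semi_major, on a flat tangent plane.
    """
    if not 0.0 <= ellipticity < 1.0:
        raise ValueError("ellipticity must be in [0, 1)")
    semi_minor = semi_major * (1.0 - ellipticity)
    pa = math.radians(position_angle_deg)
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        along = semi_major * math.cos(t)   # component along the major axis
        across = semi_minor * math.sin(t)  # component along the minor axis
        d_east = along * math.sin(pa) + across * math.cos(pa)
        d_north = along * math.cos(pa) - across * math.sin(pa)
        points.append((d_east, d_north))
    return points


# With position angle 0 the major axis points north.
outline = ellipse_outline(2.0, 0.5, 0.0, n_points=4)
print(outline[0])
```

A plotting tool would add these offsets to the catalogue position, allowing for the cos(declination) factor in right ascension.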