Common Structure of Statistical Information (CoSSI)
Documentation and DTDs
- Publication DTD, version 1.6 (ZIP) - Includes publications, statistical tables and matrices, statistical metadata and classifications
- Publication DTD, version 1.0 (ZIP)
- DTDs for statistical tables and matrices are also available on the CoSSI-PXML page
- Documentation of the CoSSI model is based on DTD version 0.9: CoSSI Definition Descriptions v. 09.2003 - Basic CoSSI description with models (PDF)
Other papers on CoSSI
- Publishing Metadata with Data - XML Based Dissemination Process of Statistical Information (CoSSI) (PDF)
- Dissemination of Statistical Data and Metadata - Process Based on Common Structure of Statistical Information (CoSSI) (PDF)
- Structuring Statistical Information - Basic idea of how to organize statistical data and its basic formats (PDF)
- Alternative approach to metadata - Some points on the theoretical background of metadata, and models for metadata (PDF)
- Final Demonstration Report on Taxation Metadata in Secondary Data Collection - How to connect taxation metadata to numeric taxation data and use the two together (PDF)
- Conceptual Modelling of Administrative Register Information and XML - Taxation Metadata as an Example (PDF)
A Short Introduction
We have gathered into the CoSSI model the content definition descriptions that we have drawn up for statistical information. In compiling the definitions, we have relied as much as possible on existing solutions, both general ones and those specific to statistical information and its processing.
The basic definitions presented in this report cover the basic ways of organising statistical information, including its matrix and table structures, and the specifications of the metadata required to describe it. They also cover the data needed to identify and describe files. Identifying and describing files is essential in both the production and the dissemination of statistical information. Information used to describe and manage the production processes of statistical information, as well as application-specific processing information, falls outside the scope of these basic definitions.
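To make the distinction between a matrix and its table presentation more concrete, here is a minimal Python sketch. It assumes, for illustration only, that a matrix stores each value once, indexed by one category from each classification dimension, and that a table is a two-dimensional layout derived from that matrix. The class and field names (`Dimension`, `StatMatrix`, `cell`, `as_table`) are hypothetical and are not taken from the CoSSI DTD.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: class and field names are illustrative assumptions,
# not element names from the CoSSI DTD.
@dataclass
class Dimension:
    """A classification variable (e.g. region or year) and its categories."""
    name: str
    categories: list[str]


@dataclass
class StatMatrix:
    """Values stored once, each indexed by one category per dimension."""
    title: str
    unit: str  # minimal metadata describing the values
    dimensions: list[Dimension]
    values: dict[tuple[str, ...], float] = field(default_factory=dict)

    def cell(self, *categories: str) -> float:
        """Look up a single value by its categories, in dimension order."""
        return self.values[categories]

    def as_table(self) -> list[list[str]]:
        """Lay out a two-dimensional matrix as a table: first dimension as rows."""
        if len(self.dimensions) != 2:
            raise ValueError("this sketch only lays out two-dimensional matrices")
        rows, cols = self.dimensions
        table = [[f"{self.title} ({self.unit})"] + cols.categories]
        for r in rows.categories:
            table.append([r] + [str(self.values[(r, c)]) for c in cols.categories])
        return table


# Usage with placeholder values (not real statistics):
matrix = StatMatrix(
    title="Example",
    unit="count",
    dimensions=[Dimension("region", ["A", "B"]), Dimension("year", ["2006", "2007"])],
    values={("A", "2006"): 1.0, ("A", "2007"): 2.0, ("B", "2006"): 3.0, ("B", "2007"): 4.0},
)
for row in matrix.as_table():
    print(row)
```

In this sketch the table is generated from the matrix on demand rather than stored separately, so the same values and their metadata can serve several presentations.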
Initially, when SGML (Standard Generalised Markup Language, ISO 8879:1986) was confirmed as a standard, this implied organising statistical information so that structured technologies could be used in its production and dissemination. Work on this began at Statistics Finland in 1993. However, the emphasis of the work soon shifted to the theoretical aspects of the technologies, while less attention was paid to their application. The focus was on statistical information itself: what, in the final analysis, it is, and what we actually mean when we talk about statistical information.
In practice, we came across many different ways of organising and describing statistical information. The patterns of thought behind them reflected highly divergent, even conflicting, views of the essence of statistical information. What they had in common was a spendthrift attitude to statistical information: they squandered information at the different phases of production and dissemination. This observation led to the conclusion that a uniform and unifying structural definition of statistical information, one that conserves and spares information, was necessary. Furthermore, it would have to be drawn up so that it could be used in both the production and the dissemination of statistics.
The structural definitions have been gathered into a single report to stimulate debate about them and their possible uses. Initially, this discussion should be conducted without allowing the current application development tools of statistical organisations to limit how statistical information is defined or structured.
This report consists of the logical models of the structural definitions and their content definition descriptions. The logical models are also presented as tree structures, which combine a simple description technique with a sufficiently illustrative view of the information structures being described. The tree structures have been implemented using the notation defined by XML (Extensible Markup Language 1.0, W3C Recommendation, 10 February 1998). The content definition descriptions are verbal transcriptions of the tree structures. Condensed versions of the models are presented in the Appendix of this report as XML-compliant DTD (Document Type Definition) descriptions.
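The relationship between a logical model, its XML tree structure and a DTD content model can be illustrated with a small Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`table`, `title`, `unit`, `cells`, `cell`) and the `key` attribute are hypothetical, not CoSSI's own. Because the standard library does not validate against DTDs, the function `conforms` is our own simplified stand-in: it checks only plain sequences and zero-or-more repetition (`*`), roughly as declarations such as `<!ELEMENT table (title, unit, cells)>` would.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model in the spirit of the DTD declarations
#   <!ELEMENT table (title, unit, cells)>
#   <!ELEMENT cells (cell*)>
# Element names are illustrative assumptions, not taken from CoSSI.
# A trailing "*" means the child may occur zero or more times.
CONTENT_MODEL = {
    "table": ["title", "unit", "cells"],
    "cells": ["cell*"],
}


def build_table(title: str, unit: str, cells: dict[str, float]) -> ET.Element:
    """Build the XML tree for one (hypothetical) statistical table."""
    table = ET.Element("table")
    ET.SubElement(table, "title").text = title
    ET.SubElement(table, "unit").text = unit
    cells_el = ET.SubElement(table, "cells")
    for key, value in cells.items():
        ET.SubElement(cells_el, "cell", {"key": key}).text = str(value)
    return table


def conforms(element: ET.Element, model: dict = CONTENT_MODEL) -> bool:
    """Check children against the simplified content model, recursively.

    Elements with no entry in the model are not checked themselves,
    but their children still are.
    """
    children = list(element)
    expected = model.get(element.tag)
    if expected is not None:
        i = 0
        for token in expected:
            name = token.rstrip("*")
            if token.endswith("*"):
                # Greedily consume any number of matching children.
                while i < len(children) and children[i].tag == name:
                    i += 1
            elif i < len(children) and children[i].tag == name:
                i += 1
            else:
                return False  # a required child is missing or out of order
        if i != len(children):
            return False  # unexpected extra children
    return all(conforms(child, model) for child in children)


table = build_table("Example", "count", {"A.2006": 1, "A.2007": 2})
print(ET.tostring(table, encoding="unicode"))
print(conforms(table))
```

Greedy matching is adequate for simple, unambiguous models like this one; choice (`|`), optional (`?`) and nested groups are not handled, so real validation against a DTD would call for a validating parser.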
The models and their descriptions have been grouped into modular entities that can also be used as independent structural models. For each model, we state to what extent, and in what ways, solutions developed for other than statistical purposes have been used.
The development work connected with the structuring of statistical information was initially done by Statistics Finland's Methodological Unit and, later on, in co-operation with the dissemination sub-project of the agency's production model project. Over the years, several employees of Republica Oy and Citec Oy have also participated in the work, for which we wish to express our gratitude.
Heikki Rouhuvirta & Harri Lehtinen
Last updated 1.8.2008