|
"CITATIONS IN THE DIGITAL REALM"

ABSTRACT

In the digital realm the scientific journal is gradually being transformed and is becoming less paper-bound, which has implications for the bibliographic control of the journal as well as its constituent parts: the articles. This, in turn, has implications for how citations should be made, since a citation requires a defined entity to link to. Not only is the concept of the scientific journal changing; the article is also subject to conceptual changes. One is the change towards greater modularity, meaning that the article as a whole will be broken into smaller parts; the other is that the boundary of the article will expand to include additions in the form of sound and video clips, as well as other forms of information. These changes will affect how citations should be designed. Identification of the entities to be cited is one of the difficulties of citations in the digital realm. Several identifier systems are competing to become the established standard. Citation practice should strive to be in line with standards that have gained market acceptance.

In his classic work Science in History [1], J D Bernal describes the
cumulative tradition of science:

2. Containers of ideas and scientific contributions.

"[The scientific article] is the object around which the whole fabric of writing, publishing and reading is centered. Scientific articles are the prime representation forms for scientific information; they are closed entities, easily portable and well structured as a result of a century long tradition of scientific publishing." [4]

The scientific journal is then part of an institutionalized communication system, and the specific functions of the printed journal are:
In this communication system libraries play a central role as a "store-and-forward" node.

3. Citations as links

4. Electronic containers of scientific contributions.

The functional justifications for the printed scientific journal are being questioned: the physical packaging and the physical access on the shelves are clearly not needed for electronic information; quality control can be carried out without print, and so can subject classification. Payment schemes will have to be revised to be reasonable in an electronic environment. Scientific contributions ("articles") in electronic form do not have to be, and should not be, bound by the print tradition. So, on leaving the traditional domain of print, one challenge for the scientific community is to define the new (electronic) units of scientific contribution.

4.1. Descriptions of electronic entities.

There is also a need for new dimensions of description: intellectual property rights (IPR) and the technical dimensions of the entities, which together determine the requirements for licenses, hardware, and software for reading and viewing. In parallel with efforts to bring (traditionally library) cataloging rules, such as AACR2, into the electronic era, there are other initiatives to define schemes for describing electronic entities. The term "cataloging" has been replaced by "metadata" description [5]. One of these schemes is the "Dublin Core", which is sponsored and promoted by OCLC and which has achieved considerable acceptance.

To make linking possible it is necessary to have something to point to; therefore the identification of entities is crucial. Preferably there should be a unique identifier for every entity.

4.2. Identification of electronic entities.

"As a stopgap measure to address some of the problems with the persistence of URLs, about two years ago OCLC deployed a system called the PURL (Persistent URL).
Basically, PURLs are HTTP URLs where the usual hostname has been replaced with the host 'PURL.ORG' and the filename is an identifier for the 'real' content being referenced. The PURL.ORG host will be maintained for the long term by OCLC under that name." [7]

Other approaches to establishing a scheme for assigning identifiers to electronic entities focus on the logical content rather than the physical location: there is the Uniform Resource Name (URN), which is being developed by the Internet Engineering Task Force (IETF). There are also a number of initiatives from the publishing and media industries [2]: the Serial Item and Contribution Identifier (SICI); the Book Item and Component Identifier (BICI); the Publisher Item Identifier (PII); the Common Information System (CIS); and the International Standard Work Code (ISWC). A scheme that has received attention and is beginning to be adopted is the Digital Object Identifier (DOI), which is both an identifier and a routing system [2].

4.3. Implosion and explosion of the electronic scientific article.

The aspect of granularity is discussed by Kircz [4], who takes as a starting point the traditional article, which is meant to be a "stand-alone" contribution with content imported from cited sources; in the digital realm such imports are not necessary and can be replaced by hyperlinks. An article might then start with a few links instead of, as is done in this article, a "stage-setting" quote. To a degree scientific writing is repetitious, and with increased modularity in the contributions we might get a different anatomy of articles. In the digital realm the focus can be more on new content and less on the composition and design of a whole treatise. (For teaching and study this raises other questions.)

The explosion of the article leads to a scattering of contents due to "multimediality." The convergence of digital technology makes it natural to combine different types of information: text, moving images, and sound.
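The two ideas discussed so far, identifiers that name content rather than location and articles that consist of separately addressable parts, can be illustrated together in a minimal sketch. It is not taken from any of the identifier schemes named above; the class names, the "example:" identifier syntax, and the host names are all assumptions made for illustration. The sketch only shows the principle of indirection: a citation stores a stable identifier, and a routing table maps that identifier to wherever the part currently resides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """One separately citable part of a contribution (hypothetical model)."""
    identifier: str  # location-independent name, invented for this sketch
    media_type: str  # for example "text", "video", or "sound"
    location: str    # current address, which may change over time


class Resolver:
    """Routes persistent identifiers to current locations (illustrative only)."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, component: Component) -> None:
        self._locations[component.identifier] = component.location

    def move(self, identifier: str, new_location: str) -> None:
        # Only the routing table changes; citations keep using the identifier.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


def build_example() -> Resolver:
    """Register the parts of one hypothetical multimedia article."""
    parts: List[Component] = [
        Component("example:article-1/text", "text", "http://host-a.example/a1.html"),
        Component("example:article-1/clip", "video", "http://host-a.example/a1.mpg"),
    ]
    resolver = Resolver()
    for part in parts:
        resolver.register(part)
    return resolver
```

A citation to the video clip would record only "example:article-1/clip". If the clip is later moved to another host, a single call to move() updates the routing table and the citation itself remains valid. The weak point, under this assumption, is the resolver: the link is only as durable as the organization that maintains the table.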
The move towards electronic documents has been stronger in the STM fields (science, technology, and medicine), but multimediality might well become even stronger in the social sciences, since social phenomena can, in general, be better described using moving images and sound. Multimedia documents that are "born digital" naturally appear as parts that are linked together, since the parts are usually produced by different tools, or are collected from different sources or data capture devices.

5. Electronic links

The two changes in the anatomy of the scientific article (granularity and multimediality) make it necessary, however, to consider the identification question from a much broader viewpoint.

" - - - there are many things which the publishing industry can profitably learn from the unique identification schemes which the international music industry is adopting and much to be gained from working to develop at the least a compatibility of approach to the same or similar problems. - - - Among these may be ways of tackling the problem of uncertain granularity. To what level of detail does content have to be identified? The ISBN identifies the whole book; the SICI identifies the journal issue and, appropriately extended, the individual article within the issue. This may be enough for some uses but is clearly inadequate for others. If we are to be able to identify all rights owners in a particular piece of content, that may require a far finer degree of granularity of identification, to the level of the individual illustration or quotation from another source. Similarly, if information is to be traded with customers at a level of granularity finer than the 'chapter' or the 'article', then publishers may have compelling marketing reasons for being able properly to identify and to keep track of what is being traded." [2]

The need for a broader approach is also pointed out by Lindquist [6], who brings in the need for convergence with the archival sector and with records management.

6. The major problem areas

By its nature electronic information is ethereal, but it does need a physical take-off from electronic equipment, and is bound by the laws of physics. To be interpretable, electronic information is also bound to the electronic systems from which it originates, and to the systems with which these in turn can interact. This is true in the case of online interaction as well as in the case where the electronic information is carried on a medium such as magnetic tape or CD.

The threat to longevity in the online case is primarily one of impermanence of organization (on the macro level it is the government or business enterprise; on the micro level it is the computer-based systems). For information on tapes and CDs the major threat is technical obsolescence: reading and viewing systems (software and hardware) will change in incompatible ways. The deterioration of the physical carrier is also a threat, but is comparatively easy to deal with.

In addition to the technical difficulties with obsolescence there is also the problem with IPR (intellectual property rights). A preservation policy of migrating the information to new systems will require the making of (digital) copies; permission to do so cannot be taken for granted.

The problem of authenticity is not unique to the digital realm: forgeries are legion in the print world, including made-up scientific evidence. With digital information it is more difficult to discover forgeries, since copies and originals are indistinguishable. Indeed, one of the benefits of digital information is that it can be adapted to particular users and customized to individual tastes and needs. So digital information has this inherent "weakness" that has to be controlled for in some way.

7. Alternative scenarios

We can also give reviewers and judges a much stronger role and let them vouch for the soundness of the scientific contribution, based on the interpretations and judgements in their heads. Such an approach would come to resemble an oral tradition of science, and would mean a reversal toward myth and authoritarianism. Shaman-like judges of science will hinder, or at least delay, new knowledge; paradigm shifts will be almost impossible.

Print archives as a way out are impractical and expensive; considering the volume of information, they are not really feasible as a solution. So we have to work on developing a reliable structure and organization for citations in the digital realm.

8. Towards best practices

The network of secure servers will take some time to come about; until then the best practices for citations in the digital realm must rely on self-sufficiency in the scientific contributions ("articles") and redundancy in the links. Self-sufficiency in the contributions can be achieved by extended quotes (as illustrated in this article), or by deposit of hard copies with an established caretaker, for example a university archive or library. Redundancy in the linking of works can be achieved, for example, by adding access points to the individual scholar and his organization, both electronic and physical (the postal address). Some examples are given at the end of this article. In this way alternative access routes are provided for the links to ideas, works, and individual scientists; connections are made between the electronic world and the world of atoms, molecules, and living things.

References:

[1] Bernal, J.D., Science in History [vol. 1], Cambridge,
MIT Press, 1971 [originally published 1954].

Quotes:

From [6] Lindquist:

"The current practice of describing electronic publications in a print-oriented description model such as the cataloguing rules will not be adequate for very long. There is a need for 'EECR' - electronic entities cataloguing rules. A constructive initiative is the one by the (United States) National Institute of Standards and Technology to develop a 'federal information processing standard (FIPS) for a data standard for record description records', announced in the Federal Register 28 February 1995. This initiative also points to another necessary development: that of unifying descriptions in the library and in the archive world; provenance will, for example, be of increasing importance for library material (this issue is sometimes addressed in terms of 'meta-information'). In a world of increasing co-operation and exchange it is not effective to develop local cataloguing rules, but to participate in international work on this. In the transition period it is necessary to find a balance between the efforts spent on describing the traditional form and the electronic forms. To catalogue electronic works is more expensive than for traditional ones, and yet is more important since an undescribed electronic publication is more difficult to handle (and can easily be unusable due to lack of description)." [quote captured 1998-09-17] |