Scientific Freedom, Responsibility & Law Program
 

"CITATIONS IN THE DIGITAL REALM"
Mats G. Lindquist
Lund University

ABSTRACT
Starting from the view that science is a cumulative activity in which published works make up the stock of scientific ideas, this paper identifies the scientific article as the primary carrier of scientific results and treats citations (or references) as the elements that structure the body of knowledge by linking works to each other.

In the digital realm the scientific journal is gradually being transformed and becoming less paper-bound, which has implications for the bibliographic control of the journal as well as of its constituent parts, the articles. This, in turn, has implications for how citations should be made, since a citation requires a defined entity to link to.

Not only is the concept of the scientific journal changing; the article is also subject to conceptual changes: one is the change towards greater modularity, meaning that the wholeness of the article will be broken into smaller parts; the other is that the boundary of the article will expand and include additions in the form of sound and video clips, as well as other forms of information. These changes will affect how citations should be designed.

Identification of the entities to be cited is one of the difficulties of citations in the digital realm. Several identifier systems are competing to become the established standard. Citation practice should strive to be in line with standards which have gained market acceptance.
 
The other major problem areas are longevity (or permanence) and authenticity. A "network of secure servers" is called for, but it is not easily achieved. In the meantime citation practice should address the problem of longevity by adding redundancy in the linking, and address the problem of authenticity by adding (critical) content in the form of direct quotations.

1. Science as a cumulative process.
Scientific work is a continuous activity; the time span and the spatial reach of science transcend those of the individual researcher. In addition, the nature of scientific inquiry is one of relating and positioning ideas, insights, and data.

In his classic work Science in History [1], J D Bernal describes the cumulative tradition of science:
"The methods of the scientist would be of little avail if he had not at his disposal an immense stock of previous knowledge and experience. None of it probably is quite correct, but it is sufficiently so for the active scientist to have advanced points of departure for the work of the future. Science is an ever-growing body of knowledge built of sequences of the reflections and ideas, but even more of the experience and actions, of a great stream of thinkers and workers." (p. 43)

2. Containers of ideas and scientific contributions.
The stock of scientific knowledge and experience is to a large part contained in books, journals, and other types of scholarly writing. In the fields of science, technology, and medicine (STM), monographs have become much less important for scientific communication, and the journal article has become the main channel:

"[The scientific article] is the object around which the whole fabric of writing, publishing and reading is centered. Scientific articles are the prime representation forms for scientific information; they are closed entities, easily portable and well structured as a result of a century long tradition of scientific publishing." [4]

The scientific journal is then part of an institutionalized communication system, and the specific functions of the printed journal are:

  • to be a physical distribution package for articles,
  • to be a physical access point in storage, i.e. the libraries' shelves,
  • to be a quality indicator (of varying level depending on reputation of the journal),
  • to provide a formalized means of collecting payments through subscription fees,
  • to give an overall indication of the subject matter of articles.

In this communication system libraries play a central role as a "store-and-forward" node.

3. Citations as links
Traditionally, libraries have seen journals as fragmented books (monographs); "The Volume" was, and is, the basic unit for library thinking and operations. With the growing importance of the scientific article as the fundamental unit for communicating scientific contributions, it became necessary to treat articles individually and to provide pointers to them on a deeper analytical level (in library terminology). To accomplish this it is not necessary to include the full bibliographic description of the journal, so various schemes for writing references were developed, partly inspired by the cataloging rules. Citations link ideas by linking the containers, i.e. the articles or works. Citations can be used for tracing particular ideas or facts, e.g. as part of a review or evaluation. They can also be used as an information retrieval method when searching for associated ideas by citation searching or by analyzing bibliographic couplings.

4. Electronic containers of scientific contributions.
Scientific communication is increasingly being conducted by electronic means. Thanks to the existence of a global, easily accessible telecommunications network, individual scholars can exchange ideas and read each other's work electronically. The role of the paper-bound article is diminishing; electronic publishing is growing fast.

The functional justifications for the printed scientific journal are being questioned: physical packaging and physical access on the shelves are clearly not needed for electronic information; quality control can be exercised without print, and so can subject classification. Payment schemes will have to be revised to be reasonable in an electronic environment.

Scientific contributions ("articles") in electronic form do not have to be, and should not be, bound by the print tradition. So when leaving the traditional domain of print one challenge for the scientific community is to define the new (electronic) units of scientific contributions.

4.1. Descriptions of electronic entities.
The traditional cataloging rules are not adequate for describing electronic entities (see, for example, [8]). Concepts like "issue", "published", and "pages" will change their meaning when print becomes electronic. Much scientific electronic publishing still consists of electronic versions of print, which obscures this fact, but it will become apparent with time.

There is also a need for new dimensions of description: IPR, Intellectual Property Rights, and technical dimensions of the entities, together giving requirements for licenses, hardware, and software for reading and viewing.

In parallel with efforts to bring (traditionally library) cataloging rules, such as AACR2, into the electronic era there are other initiatives to define schemes to describe electronic entities. The term "cataloging" has been replaced by "metadata" description [5]. One of these schemes is the "Dublin Core" which is sponsored and promoted by OCLC, and which has achieved considerable acceptance.

To make linking possible it is necessary to have something to point to; therefore the identification of entities is crucial. Preferably there should be a unique identifier for every entity.

4.2. Identification of electronic entities.
The enormous growth, and dominance, of the Internet and in particular the World Wide Web has made the web-address, the URL, almost synonymous with an identifier of an entity (or document). However, the URL is not acceptable as a reference point since it is not a stable one (for reasons that will not be discussed here). One attempt to bring stability to the URL approach has been introduced by OCLC:

"As a stopgap measure to address some of the problems with the persistence of URLs, about two years ago OCLC deployed a system called the PURL (Persistent URL). Basically, PURLs are HTTP URLs where the usual hostname has been replaced with the host 'PURL.ORG' and the filename is an identifier for the 'real' content being referenced. The PURL.ORG host will be maintained for the long term by OCLC under that name." [7]

Other approaches to establish a scheme to assign identifiers to electronic entities focus on the logical content, and not the physical location: there is the Uniform Resource Name, URN, which is being developed by the Internet Engineering Task Force, IETF.
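
The indirection behind the PURL, and more generally behind any identifier that names content rather than location, can be sketched as a lookup table kept by a resolver. The following is a minimal illustration in Python, not a description of how the PURL service is implemented: the table, the function names, the identifier "example/citations", and the two content hosts are all assumptions made for the example. Only the idea that the resolver host replaces the content host comes from the description quoted above.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical resolver table mapping a persistent path to the current
# location of the content. The entry is invented for illustration.
resolver_table: Dict[str, str] = {
    "/example/citations": "http://host-a.example.org/papers/citations.html",
}


def make_persistent_url(identifier: str, resolver_host: str = "purl.org") -> str:
    """Build an HTTP URL whose host is the resolver, not the content host."""
    return "http://" + resolver_host + "/" + identifier.lstrip("/")


def resolve(persistent_url: str, table: Dict[str, str]) -> Optional[str]:
    """Return the current location recorded for a persistent URL, if any."""
    return table.get(urlsplit(persistent_url).path)


def relocate(identifier: str, new_location: str, table: Dict[str, str]) -> None:
    """Record that the content has moved; the persistent URL is unchanged."""
    table["/" + identifier.lstrip("/")] = new_location


# The citing article stores only the persistent URL.
cited = make_persistent_url("example/citations")
before = resolve(cited, resolver_table)

# The content moves to another (hypothetical) host; the citation is untouched.
relocate("example/citations",
         "http://host-b.example.org/archive/citations.html",
         resolver_table)
after = resolve(cited, resolver_table)
```

The citation keeps working after the move because only the resolver's table changes. The scheme is therefore only as durable as the organization that maintains the resolver, which is the longevity problem taken up in section 6.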

There are also a number of initiatives from the publishing and media industries [2]: The Serial Item and Contribution Identifier (SICI); The Book Item and Component Identifier (BICI); The Publisher Item Identifier (PII); The Common Information System (CIS) and the International Standard Work Code (ISWC).

One scheme that has received attention and is beginning to be adopted is the Digital Object Identifier (DOI), which is both an identifier and a routing system [2].

4.3. Implosion and explosion of the electronic scientific article.
The concept of the scientific article in the digital realm is subject to two fundamental changes: One is implosion, a burst inward, and can be discussed in terms of the granularity of scientific contributions; the other is explosion and can be discussed in terms of scattering.

The aspect of granularity is discussed by Kircz [4] who takes as a starting point the traditional article which is meant to be a "stand-alone" contribution, with content imported from cited sources; in the digital realm such imports are not necessary and can be replaced by hyper-links. An article might then start with a few links instead of, as is done in this article, a "stage-setting" quote. To a degree scientific writing is repetitious, and with an increased modularity in the contributions we might get a different anatomy of articles. In the digital realm focus can be more on new content and less on the composition and design of a whole treatise. (For teaching and study this raises other questions.)

The explosion of the article leads to scattering of contents due to "multimediality." The convergence of digital technology makes it natural to combine different types of information: text, moving images, and sound. The move towards electronic documents has been stronger in the STM fields (science, technology, and medicine), but multimediality might well become even stronger in the social sciences, since social phenomena can, in general, be better described using moving images and sound.

Multimedia documents that are "born digital" naturally appear as parts that are linked together, since the parts are usually produced by different tools, or are collected from different sources or data capture devices.

5. Electronic links
With the emergence of electronic documents there have been adaptations of the "rules and formats" of citations. As with cataloging rules, the citation "manuals" are modifications of their counterparts for traditional (print) media. There is an ISO standard, ISO 690-2, "Information and documentation - Bibliographic references - Part 2: Electronic documents or parts thereof" (http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm), and the established APA and MLA formats have also been adapted to electronic documents (see, for example, http://www.researchpaper.com/writing_center/34.html and http://www.researchpaper.com/writing_center/33.html) [all three URLs verified 1998-08-31].

The two changes in the anatomy of the scientific article (granularity and multimediality) make it, however, necessary to consider the identification question from a much broader viewpoint.

" - - - there are many things which the publishing industry can profitably learn from the unique identification schemes which the international music industry is adopting and much to be gained from working to develop at the least a compatibility of approach to the same or similar problems. - - -

Among these may be ways of tackling the problem of uncertain granularity. To what level of detail does content have to be identified?

The ISBN identifies the whole book; the SICI identifies the journal issue and, appropriately extended, the individual article within the issue. This may be enough for some uses but is clearly inadequate for others. If we are to be able to identify all rights owners in a particular piece of content, that may require a far finer degree of granularity of identification, to the level of the individual illustration or quotation from another source. Similarly, if information is to be traded with customers at a level of granularity finer than the 'chapter' or the 'article', then publishers may have compelling marketing reasons for being able properly to identify and to keep track of what is being traded. " [2]

The need for a broader approach is also pointed out by Lindquist [6], who brings in the aspect of, and the need for, convergence with the archival sector and with records management.

6. The major problem areas
The major problem areas for citations in the digital realm are both philosophical and physical. The philosophical problem is in defining the unit carrying scientific contributions; this is answering the question "What to cite" or "What to point to." The physical problems are those of longevity and authenticity.

By its nature electronic information is ethereal, but it does need a physical take-off from electronic equipment, and is bound by the laws of physics. To be interpretable, electronic information is also bound to the electronic systems from which it originates, and to the systems with which these in turn can interact. This is true in the case of online interaction as well as in the case where the electronic information is carried on a medium such as magnetic tape or CD.

The threat to longevity in the online case is primarily one of impermanence of organization (on the macro level it is the government or business enterprise; on the micro level it is the computer-based systems). For information on tape and CDs the major threat is technical obsolescence: that reading and viewing systems (software and hardware) will change in incompatible ways. The deterioration of the physical carrier is also a threat, but is comparatively easier to deal with.

In addition to the technical difficulties with obsolescence there is also the problem with IPR (intellectual property rights). A preservation policy of migration of the information to new systems will require the making of (digital) copies; permission to do so cannot be taken for granted.

The problem of authenticity is not unique to the digital realm: forgeries are legion in the print world, including fabricated scientific evidence. With digital information it is more difficult to discover forgeries since copies and originals are indistinguishable. Indeed, the same malleability is one of the benefits of digital information: it can be adapted to particular users and customized to individual tastes and needs. So digital information has this inherent "weakness" that has to be controlled for in some way.

7. Alternative scenarios
If we cannot trust the historical corpus of electronically recorded scientific knowledge we can use print as archival back-up and rely on traditional systems with libraries and archives to provide the "originals".

We can also give reviewers and judges a much stronger role and let them vouch for the soundness of a scientific contribution, based on the interpretations and judgements in their heads. Such an approach would come close to an oral tradition of science, and would mean a reversal toward myth and authoritarianism. Shaman-like judges of science will hinder, or at least delay, new knowledge; paradigm shifts will be almost impossible. Print archives as a way out are impractical and expensive; given the volume of information they are not really feasible as a solution. So we have to work on developing a reliable structure and organization for citations in the digital realm.

8. Towards best practices
Clearly what is needed is a network of "secure servers" where electronic information can reside and be guaranteed to be unadulterated when accessed and extracted. Such a network will have to be set up in a large organizational context, maybe on a national level because the commitment must come from an authority with an expected life span of centuries. The implications for the organization and financing of such an endeavor are only just beginning to be discussed seriously. The roles of national authorities, universities, libraries, archives, and other parties are gradually being examined and considered.

The network of secure servers will take some time to come about; until then the best practices for citations in the digital realm must rely on self-sufficiency in the scientific contributions ("articles") and redundancy in the links.

Self-sufficiency in the contributions can be achieved by extended quotes (as illustrated in this article), or by depositing hard copies with an established caretaker, for example a university archive or library.
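
An extended quote also gives a later reader something to check a retrieved copy against. The sketch below is one possible way to do that in Python, offered as an illustration rather than as an established procedure: the function names and the two "retrieved" texts are assumptions, and the captured passage is taken from the quote from [6] reproduced at the end of this article. The comparison is a plain substring match after whitespace and case normalization, so it can only detect changes inside the quoted passage itself.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case, so that re-flowed
    layout alone does not count as a difference."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_still_present(captured_quote: str, retrieved_text: str) -> bool:
    """True if the quote captured at citation time occurs in the text
    retrieved later."""
    return normalize(captured_quote) in normalize(retrieved_text)


# Passage captured when the citation was made.
captured = ("The current practice of describing electronic publications in a "
            "print-oriented description model such as the cataloguing rules "
            "will not be adequate for very long.")

# Two hypothetical later retrievals: one re-flowed, one silently altered.
reflowed = "Preface.\n" + captured.replace(" in a ", "\nin a ") + "\nMore text."
altered = captured.replace("will not be", "will be")

unchanged_ok = quote_still_present(captured, reflowed)  # layout change only
altered_ok = quote_still_present(captured, altered)     # wording was changed
```

A mismatch does not say which copy is authentic. It only tells the reader that the work now found at the cited location no longer contains what the citing author saw.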

Redundancy in the linking of works can be achieved, for example, by adding access points to the individual scholar and his organization, both electronic and physical (the postal address). Some examples are given at the end of this article. In this way alternative routes are provided for the links to ideas, works, and individual scientists; connections are made between the electronic world and the world of atoms, molecules, and living things.
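
Redundant access points of this kind can be represented as an ordered list attached to each citation, tried from the most specific to the most general. The following Python sketch is an assumption about how such a record might look; the class and function names are invented, and the reachability check is passed in as a function so that no network access is needed to run it. The addresses are those listed for [5] under "Additional locating aids".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Locator:
    kind: str     # e.g. "document", "work unit", "employer"
    address: str  # URL, e-mail address, or postal address


@dataclass
class Citation:
    label: str
    locators: List[Locator] = field(default_factory=list)  # most specific first


def first_usable(citation: Citation,
                 is_usable: Callable[[Locator], bool]) -> Optional[Locator]:
    """Return the most specific locator that still works, or None."""
    for locator in citation.locators:
        if is_usable(locator):
            return locator
    return None


koch = Citation(
    label="[5] Koch",
    locators=[
        Locator("document", "http://www.lub.lu.se/tk/metadata/metadata-general.html"),
        Locator("work unit", "http://www.lub.lu.se/netlab/index.html"),
        Locator("employer", "http://www.lu.se"),
    ],
)

# Simulate the document URL having gone stale: only the other locators work.
fallback = first_usable(koch, lambda loc: loc.kind != "document")
```

When the document address fails, the reader falls back to the work unit and then to the employer, which matches the order in which organizations tend to outlive individual web pages.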

References:

[1] Bernal, J.D., Science in History [vol. 1], Cambridge, MIT Press, 1971 [originally published 1954].
[2] Green, Brian and Bide, Mark, "Unique Identifiers: a brief introduction", <URL: http://www.bic.org.uk/bic/uniquid> [accessed 1998-09-18]
[3] Harmsze, Frédérique-Anne P. and Kircz, Joost G., "Form and content in the electronic age", paper presented at the IEEE-ADL '98 Advances in Digital Libraries Conference, Santa Barbara, CA, USA, 22-25 April 1998.
[4] Kircz, Joost G., "Modularity: the next form of scientific information presentation?", Journal of Documentation, vol. 54, no. 2 (March 1998), pp. 210-235.
[5] Koch, Traugott, "Description of the form and content of resources: Metadata" <URL: http://www.lub.lu.se/tk/metadata/metadata-general.html> [accessed 1998-09-18]
[6] Lindquist, Mats G., "Digital library work: meeting user needs", <URL: http://tiepac.portlandpress.co.uk/books/online/tiepac/session5/ch1.htm> [accessed 1998-09-12]
In: Ian Butterworth (Ed.), The Impact of Electronic Publishing on the Academic Community, London: Portland Press 1997, <URL: http://tiepac.portlandpress.co.uk/tiepac.htm> [accessed 1998-09-12]
[7] Lynch, Clifford, "Identifiers and Their Role In Networked Information Applications", <URL: http://www.arl.org/newsltr/194/identifier.html> [accessed 1998-09-19]
[8] Shadle, Steven C., "A square peg in a round hole: Applying AACR2 to electronic journals", The Serials Librarian, vol. 33, no. 1-2 (1998), pp. 147-166.

Quotes:

From [6] Lindquist:
{Bibliographic control}
"’Find a model of description and organization that makes logical and physical access efficient and cost effective.’ (The term `bibliographic control' is not semantically correct in the digital order, but will do in the transition period.)

The current practice of describing electronic publications in a print-oriented description model such as the cataloguing rules will not be adequate for very long. There is a need for `EECR' - electronic entities cataloguing rules. A constructive initiative is the one by the (United States) National Institute of Standards and Technology to develop a ‘federal information processing standard (FIPS) for a data standard for record description records’, announced in the Federal Register 28 February 1995. This initiative also points to another necessary development: that of unifying descriptions in the library and in the archive world; provenance will, for example, be of increasing importance for library material (this issue is sometimes addressed in terms of `meta-information').

In a world of increasing co-operation and exchange it is not effective to develop local cataloguing rules, but to participate in international work on this. In the transition period it is necessary to find a balance between the efforts spent on describing the traditional form and the electronic forms. To catalogue electronic works is more expensive than for traditional ones, and yet is more important since an undescribed electronic publication is more difficult to handle (and can easily be unusable due to lack of description)."

[quote captured 1998-09-17]

Additional locating aids:

For [6] Lindquist:
Employer's web-page: http://www.lu.se
Work unit's web-page: http://www.ub2.lu.se
Private e-mail: mglindquist@hotmail.com
For [5] Koch:
Employer's web-page: http://www.lu.se
Work unit's web-page: http://www.lub.lu.se/netlab/index.html
For [7] Lynch:
Work e-mail: clifford@cni.org
For [8] Shadle:
Work e-mail: shadle@u.washington.edu