MAINTAINING THE INTEGRITY OF ELECTRONIC
PUBLICATIONS: POTENTIAL PROBLEMS
AND POSSIBLE SOLUTIONS II
Janice L. Fleming
Cadmus Journal Services
PART 1 OF 2
PROPER LINKING PRACTICES TO IDENTIFY:
Without some central themes, guidelines, or standards, publishers and
editors will create their own linking practices. It would help to
establish guidelines that publishers can adopt, so that the range of
ad hoc practices narrows. The sooner these linking practices converge,
the easier the information will become to navigate, and the less time
and money will be needed in the future for correcting links. The goal
is information that is easy for the user to navigate and less costly
to maintain. Here are some of the possible position papers and
guidelines that come to mind.
- Statement conveying an expectation that content providers register
and use DOI
- Statement conveying an expectation that authors cite using DOI
- Guidelines for linking to non-DOI data (databases, video, audio)
- Guidelines for forward linking (references, author's newer articles)
- Guidelines for re-locating or archiving old data and what to do
about existing links
- Guidelines or expectations regarding linking to restricted
(password-protected) data sources
- Guidelines on what constitutes a citeable data source and expectations
concerning the use of publisher-sponsored data depositories
LINKING QUESTIONS:
IS THERE TO BE ONLY ONE SITE THAT PROVIDES THE DEFINITIVE, CITEABLE
VERSION OF EACH ARTICLE?
What the research community wants and needs is broad, unlimited access
to research information. This suggests an article should not have to
reside exclusively in one site. Examples today are journals that
are online both at society Web sites and at clustered sites like Ovid
Technologies (a replicate copy of the article coded using Ovid's SGML
DTD resides on the Ovid database). Placing articles in multiple e-journal
clusters will allow for the broadest reach. Many of these clusters
in the future may focus on helping researchers in specific disciplines
locate relevant information. And researchers in cross-disciplinary
fields will benefit most if the information they need resides in the
discipline-specific e-journal cluster they use. Other types of
e-journal clusters are being offered by the aggregator services that
libraries use. So, if there are multiple copies of an article located
in various Web sites, then which version does the DOI registry point
to? Just one? All of them? How does the linking work in
these instances? What if the user has legitimate subscription
access to the information housed in one cluster, but not the others?
How well can we serve the researcher's information access needs in terms
of linking options? And how does an author (or publisher) know what
to cite as the proper reference linkage if there's no DOI provided? (It
may be quite some time before all publishers are using DOI as a matter
of course.) Does it matter?
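One way to think about the multiple-copy question is sketched below. This is only an illustration under stated assumptions, not a description of how the DOI registry works: the registry contents, the DOI "10.9999/example.1", the site names, the URLs, and the resolve function are all hypothetical. The sketch assumes one DOI can map to several copies, one of which is marked as the definitive version, and that the resolver knows which sites the user subscribes to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One hosted copy of an article (all names here are hypothetical)."""
    site: str          # e.g. a society site or an e-journal cluster
    url: str
    definitive: bool   # True for the publisher's original version


# Hypothetical registry: one DOI mapped to every known copy.
REGISTRY = {
    "10.9999/example.1": [
        Copy("society", "https://society.example/articles/1", True),
        Copy("cluster-a", "https://cluster-a.example/doc/1", False),
    ],
}


def resolve(doi, subscriptions):
    """Return the best copy for this user, or None if the DOI is unknown.

    Copies on sites the user subscribes to are preferred, and among
    those the definitive copy wins. If the user has no access anywhere,
    the definitive copy is returned so the link still leads somewhere.
    """
    copies = REGISTRY.get(doi)
    if not copies:
        return None
    accessible = [c for c in copies if c.site in subscriptions]
    candidates = accessible or copies
    # False sorts before True, so the definitive copy comes first.
    return sorted(candidates, key=lambda c: not c.definitive)[0]
```

A user who subscribes only to "cluster-a" would be sent to the replicate copy there, while a user with no subscriptions would be sent to the definitive copy at the society site. Whether a registry should make this choice at all is exactly the open question raised above.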
WHAT WILL AN ARTICLE LOOK LIKE IN 5 YEARS? HOW WILL IT HAVE EVOLVED?
Is it a kernel with layers of added information around it? Or
is it an empty cell that is simply filled with a variety of interrelated
and accumulating files? Here are some of the things we might
find in an article "bucket."
- Descriptive information about each author, their role in the research,
their photograph, their list of publications
- A video showing procedures, or experimental apparatus, or demonstrations,
or animated modeling, etc.
- An audio version of the article, perhaps recited by the author with
additional commentary
- Reviewer remarks
- A community dialogue/debate about the article after its initial
appearance
- Metadata and profiling information for users
- Equipment list and supplier information
- Forward linkages
There are probably many more dimensions that can be added. What's
relevant at this time is to consider the linking that might be desired.
What additional guidelines concerning DOI registration and linking practices
should be considered in view of this future?
PART 2 OF 2
VERSION CONTROL PRACTICES TO IDENTIFY
As with linking practices, there are version control practices that,
if used widely by publishers, may help the entire research community
better understand the source and nature of the information they encounter.
Is it peer reviewed or not? Is it a corrected version? Is
it the definitive, unaltered version? Here are some of the guidelines
that might help steer the publishing community toward a common path
of practices.
- Guidelines concerning version identity and labeling
- Guidelines for the use of encryption techniques (digital signatures,
electronic watermarking)
- Guidelines concerning corrections
VERSION CONTROL QUESTIONS
WHAT IS THE EVOLUTION OF AN E-ARTICLE AND WHAT VERSIONS NEED
TO BE CAPTURED AND LABELED?
The evolutionary track of an e-article can vary, but here is a possible
basic track. From this, we can identify the points at which an
e-article needs a unique label so that readers will not grow confused
or propagate incorrect/incomplete data. Where should the label
be located - near the title and author information? At the beginning
of the abstract? It should be readily evident and perhaps become
part of the header information (the title, authors, and abstract) that
is so widely used to locate and screen information for further reading.
The evolutionary track of an e-article:
- Preprint version
- Peer review version(s) - there will often be more than one
- Accepted, unedited version
- Accepted, copyedited version - first "published" version
- Replicate site version (a copy of the "original", for example, the
Ovid SGML version would represent a copy of the publisher's SGML or
PDF versions)
- Corrected version(s)
- Extended information version (forward citations or other information
added)
There has been some lively controversy about preprint versions and
how they affect subsequent journal publication opportunities.
Nonetheless, preprint servers are an active part of the information
distribution system in certain scientific disciplines. The preprint,
therefore, should constitute a version that needs some kind of identifier
to ensure it is not confused with subsequent, peer-reviewed versions.
While under active peer review, manuscripts may undergo a number of
version changes. These are usually labeled in practice today,
so continuing a similar pattern in a digital review process would be
natural. The accepted manuscript that has not yet been copyedited is
probably another version to distinguish. In some instances,
unedited, pre-publication abstracts are being disseminated electronically,
so labeling this version may be helpful indeed. The accepted, copyedited
version is what many feel constitutes the "real" published version.
Certainly any version ID system should clearly label this "definitive,
original" published version. With multiple electronic delivery services,
an e-article today may reside in multiple Web sites or databases.
Should there be a labeling system that distinguishes these replicate
copies from the definitive, original source version? Corrected versions
surely need to be identified, but should they also be highlighted in
some fashion? What about material that has been withdrawn - is
it simply gone, or is its prior presence acknowledged? Is it citeable?
What happens to the DOI?
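The version labels discussed above could be generated in a uniform way, as in the following minimal sketch. The stage names follow the evolutionary track listed earlier. The label wording, the bracket format, and the Stage, version_label, and header names are assumptions made for illustration, not an established labeling standard.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the e-article track; the label text is an assumption."""
    PREPRINT = "Preprint"
    PEER_REVIEW = "Peer review version"
    ACCEPTED_UNEDITED = "Accepted, unedited"
    PUBLISHED = "Accepted, copyedited (definitive)"
    REPLICATE = "Replicate copy"
    CORRECTED = "Corrected version"
    EXTENDED = "Extended information version"


# Stages that can occur more than once and so carry a revision number.
NUMBERED = {Stage.PEER_REVIEW, Stage.CORRECTED}


def version_label(stage, revision=1, source=None):
    """Build a short bracketed label such as "[Corrected version 2]"."""
    label = stage.value
    if stage in NUMBERED:
        label += f" {revision}"
    if stage is Stage.REPLICATE and source:
        # Record which original a replicate copy was made from.
        label += f" of {source}"
    return f"[{label}]"


def header(title, authors, stage, **kwargs):
    """Place the label ahead of the title so it travels with the header."""
    return f"{version_label(stage, **kwargs)} {title}\n{', '.join(authors)}"
```

For example, header("Sample Title", ["A. Author"], Stage.PEER_REVIEW, revision=2) places "[Peer review version 2]" ahead of the title, where it stays visible whenever the header information is used to locate and screen articles.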
WHAT DO WE NEED DIGITAL SIGNATURES AND WATERMARKING TO DO FOR US?
There are several information management needs that can probably be
met by the use of digital signatures.
A - Ensuring a document has not been altered
B - Differentiating multiple versions from each other
C - Controlling document access when copied (such as for forwarded copies
where the recipient does not have authorization/subscription to use
the document)
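Needs A and B can be illustrated with a short sketch. Note the substitution: a true digital signature uses a public/private key pair, whereas this example uses a keyed digest (HMAC with SHA-256) from the Python standard library as a simpler stand-in. The sign and is_unaltered function names and the sample key are hypothetical. Need C (access control) is not covered by this sketch.

```python
import hashlib
import hmac


def sign(document: bytes, key: bytes) -> str:
    """Return a keyed digest of the document (stand-in for a signature)."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def is_unaltered(document: bytes, key: bytes, tag: str) -> bool:
    """Need A: check that the document still matches its original tag."""
    return hmac.compare_digest(sign(document, key), tag)


# Hypothetical publisher key and two versions of the same article.
key = b"publisher-secret"
original = b"Accepted, copyedited text of the article."
corrected = b"Accepted, copyedited text of the article, corrected."

tag_original = sign(original, key)
tag_corrected = sign(corrected, key)
```

Here is_unaltered(original, key, tag_original) succeeds, while any change to the text makes the check fail. The two versions also produce different tags, so a tag can serve to tell versions apart (need B).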
Watermarking can be an effective source identifier for hard-copy printouts.
Watermarks can be added to PDFs, for example.