
For more information, please contact Mark Frankel.

 
Scientific Freedom, Responsibility & Law Program
 

MAINTAINING THE INTEGRITY OF ELECTRONIC PUBLICATIONS: POTENTIAL PROBLEMS AND POSSIBLE SOLUTIONS II
Janice L. Fleming
Cadmus Journal Services

PART 1 OF 2
PROPER LINKING PRACTICES TO IDENTIFY:
Without central themes, guidelines, or standards, publishers and editors will each create their own linking practices.  It would help to establish guidelines that publishers can adopt, so that the range of ad hoc practices narrows.  The sooner linking practices converge, the easier the information will be to navigate, and the less time and money will be needed later to correct links.  The goal is worthwhile: information that is easy for the user to navigate and less costly to maintain. Here are some possible position papers and guidelines that come to mind.

  1. Statement conveying an expectation that content providers register and use DOI
  2. Statement conveying an expectation that authors cite using DOI
  3. Guidelines for linking to non-DOI data (databases, video, audio)
  4. Guidelines for forward linking (references, author's newer articles)
  5. Guidelines for re-locating or archiving old data and what to do about existing links
  6. Guidelines or expectations regarding linking to restricted (password-protected) data sources
  7. Guidelines on what constitutes a citeable data source and expectations concerning the use of publisher-sponsored data depositories

LINKING QUESTIONS:
IS THERE TO BE ONLY ONE SITE THAT PROVIDES THE DEFINITIVE, CITEABLE VERSION OF EACH ARTICLE?
What the research community wants and needs is broad, unlimited access to research information. This suggests that an article should not have to reside exclusively at one site.  Examples today are journals that are online both at society Web sites and at clustered sites such as Ovid Technologies (a replicate copy of the article, coded using Ovid's SGML DTD, resides in the Ovid database). Having articles in multiple e-journal clusters allows the broadest reach.  In the future, many of these clusters may focus on helping researchers in specific disciplines locate relevant information, and researchers in cross-disciplinary fields will benefit most if the information they need resides in the discipline-specific e-journal cluster they use.  Other types of e-journal clusters are offered by the aggregator services that libraries use.

So, if multiple copies of an article are located at various Web sites, which version does the DOI registry point to?  Just one? All of them?  How does the linking work in these instances?  What if the user has legitimate subscription access to the information housed in one cluster, but not the others?  How well can we serve the researcher's information access needs in terms of linking options? And how does an author (or publisher) know what to cite as the proper reference linkage if no DOI is provided?  (It may be quite some time before all publishers use DOI as a matter of course.)  Does it matter?
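
One way to picture the multiple-copy question is a registry that maps a single DOI to every known copy and lets the reader's subscriptions decide which copy a link opens. The following is only a sketch under stated assumptions: the `Copy` record, the `REGISTRY` table, the `resolve` function, the DOI `10.9999/example.1`, and the site names and URLs are all hypothetical, and no existing DOI service is claimed to work this way.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Copy:
    site: str          # hypothetical name of the hosting site or cluster
    url: str           # hypothetical location of this copy
    is_original: bool  # True for the publisher's source copy


# Hypothetical registry: one DOI mapped to every known copy of the article.
REGISTRY: Dict[str, List[Copy]] = {
    "10.9999/example.1": [
        Copy("publisher", "https://publisher.example/art1", True),
        Copy("cluster-a", "https://cluster-a.example/art1", False),
    ],
}


def resolve(doi: str, subscriptions: Set[str]) -> Optional[Copy]:
    """Return a copy the reader is entitled to open, preferring the original."""
    copies = REGISTRY.get(doi, [])
    accessible = [c for c in copies if c.site in subscriptions]
    # False sorts before True, so copies with is_original=True come first.
    accessible.sort(key=lambda c: not c.is_original)
    return accessible[0] if accessible else None
```

Under this assumption, a reader who subscribes only to `cluster-a` is sent to the replicate copy, a reader with both subscriptions is sent to the original, and a reader with neither gets no link at all, which is exactly the case the questions above leave open.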

WHAT DOES AN ARTICLE LOOK LIKE IN 5 YEARS?  HOW HAS IT EVOLVED?
Is it a kernel with layers of added information around it?  Or is it an empty cell that is simply filled with a variety of interrelated and accumulating files?  Here are some of the things we might find in an article "bucket."

  • Descriptive information about each author, their role in the research, their photograph, their list of publications
  • A video showing procedures, or experimental apparatus, or demonstrations, or animated modeling, etc.
  • An audio version of the article, perhaps recited by the author with additional commentary
  • Reviewer remarks
  • A community dialogue/debate about the article after its initial appearance
  • Metadata and profiling information for users
  • Equipment list and supplier information
  • Forward linkages

There are probably many more dimensions that could be added.  What is relevant now is the linking that might be desired: what additional guidelines concerning DOI registration and linking practices should be considered in view of how the article may evolve?
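
To make the linking burden of such a "bucket" concrete, here is a minimal sketch that treats the article text as a kernel and each added item as a separately linked component. The class names `Component` and `ArticleBucket`, the component kinds, and the URLs are illustrative assumptions, not a proposed standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    kind: str  # e.g. "video", "audio", "reviewer-remarks" (illustrative labels)
    url: str   # hypothetical location of the component


@dataclass
class ArticleBucket:
    doi: str
    kernel_url: str  # the article text itself
    components: List[Component] = field(default_factory=list)

    def add(self, kind: str, url: str) -> None:
        """Attach one more item to the bucket."""
        self.components.append(Component(kind, url))

    def links(self) -> Dict[str, List[str]]:
        """Group every link that must be maintained, keyed by component kind."""
        grouped: Dict[str, List[str]] = {"kernel": [self.kernel_url]}
        for c in self.components:
            grouped.setdefault(c.kind, []).append(c.url)
        return grouped
```

Each call to `add` creates one more link that someone must register, maintain, and eventually relocate or archive, which is why the linking guidelines matter more as the bucket grows.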
 
PART 2 OF 2
VERSION CONTROL PRACTICES TO IDENTIFY:
As with linking practices, there are version control practices that, if widely used by publishers, may help the entire research community better understand the source and nature of the information they encounter.  Is it peer reviewed or not?  Is it a corrected version?  Is it the definitive, unaltered version?  Here are some of the guidelines that might help steer the publishing community toward a common path of practices.

  1. Guidelines concerning version identity and labeling
  2. Guidelines for the use of encryption techniques (digital signatures, electronic watermarking)
  3. Guidelines concerning corrections

VERSION CONTROL QUESTIONS:
WHAT IS THE EVOLUTION OF AN E-ARTICLE AND WHAT VERSIONS NEED TO BE CAPTURED AND LABELED?
The evolutionary track of an e-article can vary, but here is a possible basic track.  From this, we can identify the points at which an e-article needs a unique label so that readers will not grow confused or propagate incorrect/incomplete data.  Where should the label be located - near the title and author information?  At the beginning of the abstract?  It should be readily evident and perhaps become part of the header information (the title, authors, and abstract) that is so widely used to locate and screen information for further reading.

The evolutionary track of an e-article:

  1. Preprint version
  2. Peer review version(s): there will often be more than one
  3. Accepted, unedited version
  4. Accepted, copyedited version: the first "published" version
  5. Replicate site version (a copy of the "original", for example, the Ovid SGML version would represent a copy of the publisher's SGML or PDF versions)
  6. Corrected version(s)
  7. Extended information version (forward citations or other information added)

There has been some lively controversy about preprint versions and how they affect subsequent journal publication opportunities.  Nonetheless, preprint servers are an active part of the information distribution system in certain scientific disciplines.  The preprint, therefore, should constitute a version that needs some kind of identifier to ensure it is not confused with subsequent, peer-reviewed versions.

While under active peer review, manuscripts may undergo a number of version changes.  These are usually labeled in practice today, so continuing a similar pattern in a digital review process would be natural.

The accepted manuscript that has not yet been copyedited is probably another version to distinguish.  In some instances unedited, pre-publication abstracts are being disseminated electronically, so labeling this version may indeed be helpful.

The accepted, copyedited version is what many feel constitutes the "real" published version.  Certainly any version ID system should clearly label this "definitive, original" published version.

With multiple electronic delivery services, an e-article today may reside in multiple Web sites or databases.  Should there be a labeling system that distinguishes these replicate copies from the definitive, original source version?

Corrected versions surely need to be identified, but should they also be highlighted in some fashion?  What about material that has been withdrawn - is it simply gone, or is its prior presence acknowledged? Is it citeable? What happens to the DOI?
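
A version label of the kind discussed above could be generated mechanically from the article's stage on the evolutionary track. The sketch below is illustrative only: the `Stage` names mirror the seven-step list, while the label wording, the revision counter, and the rule that everything from the accepted version onward counts as "peer reviewed" are assumptions rather than an agreed convention.

```python
from enum import IntEnum


class Stage(IntEnum):
    PREPRINT = 1
    PEER_REVIEW = 2
    ACCEPTED_UNEDITED = 3
    ACCEPTED_COPYEDITED = 4  # the first "published" version
    REPLICATE = 5
    CORRECTED = 6
    EXTENDED = 7


def version_label(stage: Stage, revision: int = 1) -> str:
    """Build a short label suitable for the header (title, authors, abstract)."""
    name = stage.name.replace("_", " ").title()
    # Assumption: a manuscript still under review is not yet "peer reviewed".
    reviewed = stage >= Stage.ACCEPTED_UNEDITED
    status = "peer reviewed" if reviewed else "not peer reviewed"
    return f"[{name}, rev. {revision}; {status}]"
```

For example, `version_label(Stage.PREPRINT)` yields `[Preprint, rev. 1; not peer reviewed]`. Withdrawn material is deliberately left out, since the text above treats its handling as an open question.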

WHAT DO WE NEED DIGITAL SIGNATURES AND WATERMARKING TO DO FOR US?
There are several information management needs that can probably be met by the use of digital signatures.

A - Ensuring a document has not been altered
B - Differentiating multiple versions from each other
C - Controlling document access when copied (such as for forwarded copies where the recipient does not have authorization/subscription to use the document)
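
Needs A and B can be illustrated with a cryptographic digest and a keyed tag. This is a sketch with a deliberate substitution: it uses a SHA-256 fingerprint and an HMAC from Python's standard library, which depends on a shared secret key, in place of a true public-key digital signature, which would require a cryptography library outside the standard library. The function names and the sample key are hypothetical, and need C (access control) is not addressed.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """SHA-256 digest: differs for any two versions whose bytes differ (need B)."""
    return hashlib.sha256(document).hexdigest()


def sign(document: bytes, key: bytes) -> str:
    """Keyed tag over the document; only holders of the key can produce it."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify(document: bytes, key: bytes, tag: str) -> bool:
    """True only if the document is byte-for-byte what was signed (need A)."""
    return hmac.compare_digest(sign(document, key), tag)
```

A publisher could attach `sign(document, key)` to the definitive version, and any later change to the text, even a single character, would then cause `verify` to return `False`.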

Watermarking can be an effective source identifier for hard-copy printouts.  Watermarks can be added to PDFs, for example.