e-Print
The Electronic Archive
How physicists are using electronic preprint libraries as a model of how we will all publish and access knowledge resources in the future
Pull quotes from articles by Paul Ginsparg
World Wide Information Systems are here
The question is no longer whether research literature will migrate to fully electronic dissemination, but rather how quickly this transition will take place.
"Report of the APS Task Force on Electronic Information Systems" (Bull. Am. Phys. Soc. 36 (1991) 1119)
The dominant mode of dissemination will be via a single
electronic library, or database, which will be the heart of
a "worldwide information system." The characteristics of
this DB...
High-quality, low-bandwidth, and standardized platform-independent output
formats.
Currently the Los Alamos electronic preprint service receives 18,000 submissions per year.
There is no peer review, and to date there have been no abuses. Progressive
journals have begun to accept the archive identifier
as the electronic submission itself, and conduct their editor/referee interactions
as well by means of the version retrieved from the archive.
Indexing and maintenance are required to keep these living
research archives from becoming data cemeteries. (Don't store
what you need in a legacy data landfill.)
The paper journal system is an artificially partitioned
database. The unified global raw database of electronic preprints offers
dramatic improvements in flexibility, more efficient
two-way transmission, quicker access to related research through hyperlinking,
semi-automated indexing, and better protection for authors. There need
to be new types of meta-level indexing and new intellectual
overlays to augment the filtering role of the traditional peer
review system.
The preferred electronic format a century from now won't
be any of the formats we use today. The current model of funding
publishing companies won't last.
Abstract
I describe a set of automated archives for electronic communication
of research information that have been operational in many fields of physics,
and some related and unrelated disciplines, starting from 1991. These archives
now serve over 35,000 users worldwide from over 70 countries, and process
more than 70,000 electronic transactions per day. In some fields of physics,
they have already supplanted traditional research journals as conveyers
of both topical and archival research information. Many of the lessons
learned from these systems should carry over to other fields of scholarly
publication, i.e. those wherein authors are writing not for direct financial
remuneration in the form of royalties, but rather primarily to communicate
information (for the advancement of knowledge, with attendant benefits
to their careers and professional reputations). These archives have in
addition proven equally indispensable to researchers in less developed
countries.
A major lesson we learn is that the current model
of funding publishing companies through research libraries (in turn funded
by overhead on research grants) is unlikely to survive in the electronic
realm. It is premised on a paper medium that
was difficult to produce, difficult to distribute, difficult to archive,
and difficult to duplicate -- a medium that hence required numerous local
redistribution points in the form of research libraries.
The electronic medium shares none of these features and thus naturally
facilitates large-scale disintermediation, with the resulting communication
of research information both more efficient and more cost-effective. A
correctly configured fully electronic scholarly journal can be operated
at a fraction of the cost of a conventional print journal, and could for
example be fully supported by author subsidy (page charges or related mechanism,
as already paid to some journals), ideally allowing for free network distribution
and maximal benefit both to authors and readers.
The electronic medium should not be constrained by any former print
incarnation and, in particular, easily implemented
quality appraisal mechanisms in the electronic realm will be dramatically
superior to the binary (i.e. one-time, all-or-nothing) procedure employed
by the print medium, which in turn frequently conveys inadequate
signal. Moreover, authors and their funding institutions will be empowered
to insist upon retaining the right to distribute electronic research documents
and attachments in the format produced by the authors. Authoring
tools already allow a highly sophisticated end-user format, including automatic
network linkages, and will continue to improve.
The essential question at this point is not *whether* the scientific
research literature will migrate to fully electronic dissemination, but
rather *how quickly* this transition will take place now that all
of the requisite tools are on-line.
Secondary open questions include determining
- the most effective means of cost recovery for the disseminators of this information,
- what agencies will be responsible for ensuring the long-term archival integrity, indexing, and cross-compatibility of the various research databases, and
- how peer review will be organized for those disciplines that depend on the value-added it can in principle provide.
FAIR USE: I reserve the right to distribute this electronic document in
any way I so desire. It is publicly posted to the internet on my server,
and anyone is free to establish a link to it from a subsidiary server (but
not to copy it for public posting on a remote server, since that could
lead to an undesirable proliferation of obsolete versions). It should not
be reprinted for inclusion in any publication for sale without my explicit
permission.
Finally, I describe some of the major improvements, enhancements in functionality,
and other expansions projected over the next few years for the existing
archives.
e-Print Opportunities
In October 1994, the APS
(American Physical Society) hosted an "e-Print
archive workshop" at Los Alamos National Laboratory, in part to facilitate
its own entry into the electronic arena. Since then, the server network
based at Los Alamos has experienced
continued dramatic growth in both its breadth of coverage and worldwide
usage, and the arrival of NSF funding in the spring of '95 has also brought
an interdisciplinary advisory board, full-time programming support
and significant improvements in functionality.
The archives now process many millions of electronic transactions of
all sorts per month, and the submission rate has doubled since Oct. '94
to an anticipated 18,000 new submissions during calendar year '96. The
physics community is rapidly moving to realize the vision for the future
expressed in the "Report of the APS Task Force on Electronic Information
Systems" (Bull. Am. Phys. Soc. 36 (1991) 1119): "The dominant mode
[of dissemination] will be via a single electronic physics library, or
Physics Database, which will be the heart of a worldwide Physics Information
System."
Much of the growth over the past
two years has been in areas of physics outside of the original core constituency
of high energy physics. For example, the condensed
matter archive (cond-mat) has had its submission rate double during this
period to over 200 submissions per month, and sends daily "abstracts received"
listings to over 3000 registered e-mail subscribers. The astrophysics (astro-ph)
archive has similarly doubled its submission rate to roughly 200 per month
and also sends its daily notifications to over 3000 subscribers. The continued
stability of the database has moreover led to increased archival usage
in all subject areas covered: the vast majority of requests are for papers
more than a month old, and over a third of the requests are for papers
more than a year old.
The archives coordinated from Los Alamos offer a
variety of choices of high-quality, low-bandwidth, and standardized
platform-independent output formats. Recent improvements in, and more widespread
usage of, end-user tools such as World Wide Web browsers have vastly simplified
both retrieval of information from, and submission to, the archives. Near-term
concerns have shifted to the continued development of a robust global mirroring
system, and to better means of handling meta-level
indexing information.
Additional mirror distribution sites (most recently added in France
and the U.K., joining the Italian, Japanese, and German mirrors; with additional
servers projected soon to go on-line in Sweden, Australia, Brazil, Taiwan,
and Russia) have given better response times, especially to international
users whose access is increasingly impeded by network congestion caused
by recent increases in non-academic network traffic.
In the long term, the mirrored distribution also provides a global backup
system resistant to localized database corruption and/or loss of network
connectivity.
The functionality of this unified global raw database offers
potentially dramatic improvements over the research communication mediated
by the artificially partitioned database of the paper
journal system. In addition to the efficient two-way transmission
capabilities, indexing, and automated hyperlink
references within papers, the system has a password protection scheme
which allows authors to transfer "ownership" to any journal (or equivalent
third-party overlay) for the purpose of freezing the submission, stamping
a "published" reference, or incorporating errata/addenda (all by author/journal
negotiation). Original versions of modified papers are archived,
and even intermediate versions can in principle be reconstructed from a
series of replacements.
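The replacement and ownership scheme just described can be modelled in a few lines. The Python sketch below is illustrative only: the article does not describe the archive's actual software, and the names `ArchiveEntry`, `replace`, `transfer_ownership`, `freeze` and `version` are hypothetical. It assumes that each replacement is stored as a complete new version (so the original and every intermediate version stay retrievable), that a plain owner check stands in for the password, and that a frozen entry accepts no further replacements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchiveEntry:
    """Hypothetical model of one archived paper and its replacement history."""
    identifier: str
    owner: str                        # party currently holding the password
    versions: List[str] = field(default_factory=list)
    frozen: bool = False
    published_ref: Optional[str] = None

    def _check_owner(self, requester: str) -> None:
        # Stand-in for the password check: only the current owner may act.
        if requester != self.owner:
            raise PermissionError(f"{requester} does not own {self.identifier}")

    def replace(self, requester: str, text: str) -> None:
        self._check_owner(requester)
        if self.frozen:
            raise PermissionError(f"{self.identifier} has been frozen")
        # Replacements are appended, never overwritten, so the original and
        # every intermediate version remain retrievable.
        self.versions.append(text)

    def transfer_ownership(self, requester: str, new_owner: str) -> None:
        # E.g. an author handing control of the entry to a journal.
        self._check_owner(requester)
        self.owner = new_owner

    def freeze(self, requester: str, published_ref: str) -> None:
        # The current owner freezes the entry and stamps a "published" reference.
        self._check_owner(requester)
        self.frozen = True
        self.published_ref = published_ref

    def version(self, n: int) -> str:
        # Versions are numbered from 1 (the original submission).
        if not 1 <= n <= len(self.versions):
            raise IndexError(f"no version {n} of {self.identifier}")
        return self.versions[n - 1]
```

In this sketch freezing is one-way; incorporating errata or addenda by author/journal negotiation would need a further operation, which is left out here.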
These global archives are not at all incompatible with the filtering
role historically provided by the journal system. To the contrary,
they beckon learned societies such as the APS to augment their current
roles with new forms of intellectual overlays never before feasible.
The APS and other Physics Societies could further speed this development
by promoting a shared copyright scheme to their members, explicitly
allowing authors (and their institutions) to retain full electronic distribution
rights to documents as produced by the authors.
Publication and research habits will of course continue to vary
from scientific discipline to discipline, and even from subfield to subfield
of Physics, but the current framework is already flexible enough to accommodate
a variety of behaviors on the part of both authors and evaluators. The
majority of authors continue to submit in parallel with conventional
journal submission to take advantage of immediate distribution (and
de facto precedence claim), and subsequent revisions frequently benefit
as much or more from direct reader feedback as from the conventional
referee process. Some authors feel more comfortable submitting only after
a conventional refereeing process, with an attached "to
appear in" comment, still taking advantage of both the advance distribution
and archival availability. Certain journals have begun to accept the archive
identifier as the electronic submission itself, and conduct their editor/referee
interactions as well by means of the version retrieved from the archive.
Astrophysical Journal Letters (published by the American Astronomical Society)
actively encourages authors of accepted letters to place the "preprint"
of the final accepted version in the astro-ph archive. The identifying
number is then used to add a link directly to the astro-ph copy from a web
page with a list of letters that have been accepted but not yet published.
Physical Review D has similarly begun to add such link information to its
own web pages,
and in addition uploads directly to the archive information concerning
papers "to appear", and later their published status -- the information
is then available whenever users search the archive listings or browse
abstracts. Better coordination with the existing archives could provide
similar immediate benefits to readers of other APS journals.
At the APS workshop two years ago, it was emphasized how recent developments
had exposed the extent to which publishers had defined themselves in terms
of production and distribution, roles which we now regard as largely
automated. (For a complete and updated version of these comments, see
the author's UNESCO presentation.)
The pressing need remains organization of intellectual value-added,
and this type of information can be overlaid on the global raw
archive and maintained by any third parties. The archive could be effectively
partitioned into sectors, gradated according to overall importance, quality
of research, or other useful criteria, and papers could be shifted retroactively
as dictated by additional information or follow-up research. And rather
than face only an undifferentiated bitstream, the average reader could
benefit from an interface that recommended a set of "essential reads"
for a given subject from any given time period. There could also be retroactively
added descriptive information, "this paper was important since it drew
upon a,b,c [hyperlinks to sources] and led to new developments x,y,z [more
hyperlinks]" to provide a further guide to the literature. Or the interface
could point to a specific paper as having been important, but warn the
beginner to go first to a later paper by the same (or other) authors
that subsumes, extends, or corrects the same results in a more understandable
fashion; or "this paper generated much attention, but skip it since the fad
played itself out and people returned to more serious pursuits." Even interdisciplinary
research (for example, if a particle physicist wished to peruse the recent
literature in biophysics or even biochemistry) can be easily facilitated
by an interface that allows rapid identification of papers that provide
pedagogic review material or are otherwise likely to be of specific interest
to outsiders. Further possibilities such as moderated
comment threads attached to specific points in papers together
with more exotic features can be added in successive stages as desired.
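The overlay idea can likewise be sketched as a ranking kept entirely apart from the raw archive. In the Python below, `Paper`, `Overlay`, the numeric tiers and `essential_reads` are hypothetical illustrations, not part of any existing archive interface; the point is only that a third party can gradate papers and re-rank them retroactively without touching the archived documents themselves.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Paper:
    """Minimal stand-in for an entry in the raw archive (hypothetical)."""
    identifier: str
    subject: str
    year: int


class Overlay:
    """Hypothetical third-party overlay: rankings stored apart from the archive."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tiers: Dict[str, int] = {}   # identifier -> tier (1 = most important)

    def assign(self, identifier: str, tier: int) -> None:
        # Calling this again later models shifting a paper between sectors
        # retroactively, e.g. after follow-up research.
        self.tiers[identifier] = tier

    def essential_reads(self, archive: List[Paper], subject: str,
                        start: int, end: int) -> List[str]:
        # Tier-1 papers for one subject and time period, in archive order.
        # The archive list is only read, never modified.
        return [p.identifier for p in archive
                if p.subject == subject
                and start <= p.year <= end
                and self.tiers.get(p.identifier) == 1]
```

Because several overlays can read the same archive list, competing evaluators could each maintain their own tiers in parallel.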
The essential question at this point
is no longer whether the scientific research literature will migrate
to fully electronic dissemination, but rather how quickly this transition
will take place now that all of the requisite tools are on-line.
We eagerly anticipate a vastly improved and more useful electronic literature,
taking advantage of the flexibility afforded by the electronic medium and
unhindered by artifacts of its evolution from paper. The APS and other Physics
Societies around the world should take advantage of the extent to which
the physics community has already jumped far ahead
of other research disciplines in all of this, and ideally the standards
set by this community can serve as a model
for the rest of scientific research communication.
Risks
We should also be alert to risks borne by authors who may find themselves
prematurely encouraged to abandon "chemicals adsorbed onto sliced processed
dead trees" in favor of an electronic-only archival
format. There is a certain leap of faith involved here, since every
once in a while one does after all get lucky and write a paper that could
still attract readership a century from now. The physical format, with
a worldwide system of institutional libraries serving as a multiply
redundant distributed archive, has proven robust on the timescale
of centuries to anything short of global cataclysm (in which case we'd
probably have more pressing concerns).
No current electronic format has demonstrated similar longevity, for the
simple reason that all have been in existence for little more than a decade,
if that. Few claim to know what will be the preferred electronic format
a century from now, but some argue convincingly that it
won't be any of the formats we use today.
Just as endangered material on decaying acid paper is currently migrated
to microfilm, automated translation to newer and more general electronic
formats should always be possible during transition periods, provided there
is an acknowledged need to prevent our living research archives
from becoming data cemeteries.
(Appeared in the APS newsletter, Nov. '96.) By Paul Ginsparg, Los Alamos Labs
Do we need Peer Review and the Paper Journal?
The high-Tc field has, I guess, colored my view of the question of peer
review and made me perhaps even more of an agnostic. I know peer review
is generally regarded as holy in the scientific community. But having watched
the field of high-Tc, which has been a very contentious field and a very
active field in the last while, I've become somewhat of an agnostic on
peer review, on the uses and abuses of peer review. There are three reasons
that are generally given for why we should do peer review.
The first one is validation. This says that refereeing somehow assures
the scientific soundness of work. Of course this only occurs to a certain
extent. Certain works get through which are not scientifically sound. We
all know that. And also, as was pointed out earlier by Michael, the
number of papers that are actually rejected is relatively small. Maybe 20%
are rejected by the Physical Review, but then a lot of them will appear
somewhere else. So actually it doesn't end up rejecting a lot of papers.
I think the point is that the real validation of work occurs in a different
way. Mainly, if an important result is claimed by somebody, people try
to reproduce it, to repeat it, to check the calculations, and that is how
work is validated, and that's the real validation process. What really
makes a scientific result important and validated is when it has been reproduced
and checked, and not really whether it has appeared or not appeared in
Phys Rev or another refereed journal. It's not really the refereeing process;
it's the process of repeating and checking it that shows it is a valid result.
The second thing that's also mentioned as a virtue of peer review is
that it improves the publication. Well, we again had a presentation from
Michael who presented numerical evidence that, in fact, the improvements
are generally small and relatively minor. And that tallies with my own
experience as a referee mainly, but also as an author. When some referee
tells me to change something, I usually do the minimum that will pacify
this referee and I think I'm not alone in that. So in many cases people
are minimalists.
So, I do see a future for journals. I think they will have a future as
compendia for more important papers in various forms so that they're, in
that sense, adding prestige. I think people would be interested in having
collections (something like the Physical Review Letters) of papers that
are considered the most prestigious, to browse through at a later stage.
So, I think there will be a continuing role for journals but I do think
we should try to take advantage of the new methods of assessing papers
that the electronic e-print revolution offers to us.
Address by Dr. Paul Ginsparg, Los Alamos Labs
Formatted by Dr. Steve Bett, ECRC-Lamar University
The Journal of High Energy Physics is one publication that works
closely with an archive. Its Web site includes a "mirror," or
complete copy, of a popular physics electronic archive located at
Los Alamos National Laboratory, in New Mexico. The author of a
paper in that e-print archive can submit it for publication in the
journal by filling out a form on the Web and supplying the paper's
identification number in the archive. The "overwhelming majority" of
the papers published by the journal have come from the archive,
says Mr. Amati, the journal's project chairman.
Marketing also plays a role in attracting both submissions and
readers. "You've got to get out there and advertise it and sell it,"
says Mr. Brown, of the British physics institute. The institute has
bought advertisements in physics periodicals and has sponsored
receptions at conferences of physicists to promote the New
Journal of Physics.
Journals that are not backed by a publisher may not have the
resources to beat the bushes for submissions. For example, the
Journal of Interactive Media in Education, an electronic-only
publication based at the Open University, in Britain
(http://www-jime.open.ac.uk/), received about a dozen
submissions last year, says Simon Buckingham Shum, one of the
journal's editors and a researcher at the university's Knowledge
Media Institute. But he and the rest of the journal's editors, all of
whom are volunteers, haven't had the time this year to solicit
manuscripts aggressively. Without a publisher to maintain the
journal's visibility, submissions to it have dropped off, he says.
"When we get busy, the journal suffers."
Another crucial factor is whether an electronic journal is included in
the indexes that are popular with academics in a given field. If the
journal is not indexed, scholars may never find papers it has
published that are relevant to their research, and other researchers
will not be interested in submitting work to it.
Particularly important is coverage by the Institute for Scientific
Information, a Philadelphia-based company that indexes about
8,000 journals it deems to be the most important. It produces
electronic data bases and printed volumes indicating which papers
have been cited by other authors, and it compiles statistics that are
meant to reflect the impact of various journals in a field. The data
bases also are used to produce statistics that estimate the influence
of academic departments at universities.
"I.S.I. is the one that we feel we really need to capture," says Mr.
Seitter, of the meteorological society. "Until Earth Interactions
shows up there, it's going to be hard for authors to get the kind of
credit they need for their publications."
The company monitors 15 journals that are disseminated only in
electronic format, says a spokeswoman, Jacqueline H. Trolley.
They include two from M.I.T. Press -- the Chicago Journal of
Theoretical Computer Science and Studies in Nonlinear
Dynamics and Econometrics -- and journals from a range of other
fields, including New Astronomy, Postmodern Culture, and
Sociological Research Online.
As it does when judging whether to include a printed journal in its
data base, the company looks for evidence of high academic
quality in electronic journals that it is considering, Ms. Trolley says.
However, it has adapted some of its criteria: Rather than insisting
that an on-line journal publish regular issues, the company will
cover an electronic publication if it disseminates new material at
least once every six months.
Other indexes also cover electronic journals. For example, the
Chemical Abstracts Service, operated by the American Chemical
Society, follows 30 electronic-only journals.