Stevan Harnad on the difference between open access to code, to text, and to data

Stevan Harnad, an open access publishing expert, gets worked up about the conflation of the meaning of "open" and "free" as used in software with "open" and "free" as used for text and, again, for data. These distinctions are valid, and no doubt important in certain applications in his field.

But I would still argue that, broadly, culturally and politically, even though the uses of free code, free text and free data may indeed differ, there is still a community of interest. The common point is free/open access to the cultural creations of mankind, so that they can be re-used and reworked.

Here is the full text by Stevan Harnad, from which I excerpted the following:

“Stevan Harnad:

It would be a *great* conceptual and strategic mistake for the movement
dedicated to open access to peer-reviewed research (BOAI) to conflate
its sense of “free vs. open” with the sense of “free vs. open” as it is
used in the free/open-source software movements. The two senses are not
at all the
same, and importing the software-movements’ distinction just adds to
the still widespread confusion and misunderstanding that there is in
the research community about toll-free access.

I will try to state it in the simplest and most direct terms possible:
Software is code that you use to *do* things. It may not be enough to
let you use the code for free to do things, because one of the things you
may want to do is to modify the code so it will do *other* things. Hence
you may need not only free use of the code, but the code itself has to
be open, so you can see and modify it.

There is simply *no counterpart* to this in peer-reviewed research
article use. None. Researchers, in using one another’s articles, are
using and re-using the *content* (what the articles are reporting), and
not the *code* (i.e., the actual words in the text). Yes, they read the
text. Yes (within limits) they may quote it. Yes, it is helpful to be able
to navigate the code by character-string and boolean searching. But what
researchers are fundamentally *not* doing in writing their own articles
(which build on the articles they have read) is anything faintly analogous
to modifying the code for the original article!

I hope that that is now transparent, having been pointed out and written
in longhand like this. So if it is obvious that what researchers do with
the articles they read is not to modify the text in order to generate a
new text, as programmers may modify a program to generate a new program,
then where on earth did this open/free source/access conflation come from?

And there is a second conflation inherent in it, namely, a conflation
between research publishing (i.e., peer-reviewed journal articles) and
public data-archiving (scientific and scholarly databases consisting of
the raw and processed data on which the research reports are based).

Digital data archiving (e.g., the various genome databases, astrophysical
databases, etc.) is relatively new, and it is a powerful *supplement*
to peer-reviewed article publishing. In general, the data are not *in*
the published article, they are *associated with* it. In paper days, there
was not the page-quota or the money to publish all the data. And even
in digital days, there is no standardized practice yet of making the raw
data as public as the research findings themselves; but there is definite
movement in that direction, because of its obvious power and utility.

The point, however, is this: As of today, articles and data are not
the same thing. The 2,000,000 new articles appearing every year in the
planet’s 20,000 peer-reviewed journals (the full-text literature that
— as we cannot keep reminding ourselves often enough, apparently —
the open/free access movement is dedicated to freeing from access-tolls)
consists of articles only, *not* the research data on which the articles
are based.

Hence, today, the access problem concerns toll-access to the full-texts
of 2,000,000 articles published yearly, not access to the data on which
they are based (most of which are not yet archived online, let alone
published; and, when they *are* archived online, they are often already
publicly accessible toll-free!).

No doubt research practices will evolve toward making all data
accessible to would-be users, along with the articles reporting the
research findings. This is quite natural, and in line with researchers’
desire to maximize the use and hence the impact of their research. What
may happen is that journals will eventually include some or all the
underlying data as part of the peer-reviewed publication itself (there
may even be “peer-reviewed data”), but in an online digital supplement
only, rather than in the paper edition.”

The following is an effort by the Open Knowledge Foundation to define different meanings of ‘open’:

(see the critical comment on this view by Stevan Harnad below)

“We take open to have three distinct senses: legal, social and technological.

Legally Open

Knowledge is legally open if it is free of most of the standard legal restrictions and requirements. In particular it should be accessible without restriction, reproducible freely (at least for non-commercial purposes), and reusable – that is, freely incorporatable in derivative works. In short, it should fall within the bounds of one of the Creative Commons licenses.

Socially Open

Social openness consists of ensuring that a work is made available and not kept secret or mouldering on a CD at the back of the drawer. It means supporting sharing and reuse as well as collaborative working processes.

But most importantly it means an ‘open source’ approach to knowledge. That is, knowledge should be made available so that access is given to the raw, underlying data and not simply through a particular, usually limiting, interface (such as a human-only-usable web form).

This parallels the distinction with software programs, emphasized by the term open source, between access to the underlying source code and access simply to the compiled version. Thus Open Knowledge in this sense can stand for access to the underlying ‘source’ rather than purely access to the ‘compiled’ end product. To illustrate, consider the following examples.

For data in a database the ‘source’ form means the raw data and the ‘compiled’ form is any of the multitude of interfaces such as web query pages that can wrap that data. Providing access to the source data would be a major change – even open databases that are freely searchable rarely provide their data in source form – the only form in which it is any use to a computer.

Another example is provided by the common practice of providing a PDF version of a document rather than the original text file. This, perhaps intentionally, hinders access to the underlying text and inhibits activities such as annotation or indexing.

Technologically Open

Technological openness requires that knowledge is provided in a form and format that does not unnecessarily hinder access by humans or machines. This can be achieved by utilizing data formats and tools that are open – meaning that a full specification is publicly available and unencumbered by legal restraints, and that access and use of the formats will not require proprietary tools or products (for more information on ‘openness’ of formats see the Information Accessibility Initiative).

It also means providing the necessary documentation, structuring and presentation of data so as to ensure comprehensibility and usability. One should aim to achieve these ends not just for humans but also for computers – something that is increasingly essential in an information age.”

2 Comments

  1. Stevan Harnad

    The above is a bit misleading and confusing because it does not make a clear break between where the quotation from me ends and the text of the author of the above posting (unidentified) re-starts:

    My quote ends at “but in an online digital supplement only, rather than in the paper edition.”

    Starting with “The following is an effort by the Open Knowledge Foundation to define different meanings of ‘open’:”, the words (and thoughts) are no longer mine. Indeed, the words and thoughts are exactly the ones I try to argue against in my own quote, because they conflate the OA and OKF sense of “open” and “free”!

    Stevan Harnad
    American Scientist Open Access Forum

  2. Rufus Pollock

    First a point of clarification: ‘The Three Meanings of Open’ essay which was quoted above has now been ‘formalised’ into an Open Knowledge Definition which you can find at http://okd.okfn.org/.

    I would also point out that the Open Knowledge Definition is very similar to the definition of ‘open-access’ as found in the BBB (Berlin, Budapest, Bethesda) declarations though, perhaps, with more emphasis on reuse and modifiability (I have had substantial discussion about the OKD, and its relation to open-access, with Peter Suber who, like Stevan, is a prominent Open-Access advocate). In particular, the OKD and the BBB definitions share the requirement that there must be freedom to access, redistribute and reuse a work.

    In addressing Stevan Harnad’s comments I would like to start by distinguishing between a) providing a definition of ‘open’ (or ‘open-access’ or ‘free’; note these may all mean different things) and b) campaigning for a particular piece of knowledge to be ‘open’ (or ‘open-access’ or …). Though the Open Knowledge Foundation and Stevan Harnad are engaged in both types of activity, it is important to remember they are separate. Furthermore, the OKF do not believe that **all** knowledge **must** be made open.

    The aim of the open knowledge definition (and the ‘Three Meanings’ article cited above) was to set out clearly what one meant by ‘open’ knowledge. It does not seek to mandate what should and should not be open knowledge and it does not, for example, require that for an academic article to be termed ‘open’ it must also make its data available — the definition focuses only on the work itself.

    Do journal articles need different treatment from other forms of knowledge? ‘Open-Access’ has traditionally focused on academic journal articles (this was one reason for creating a more general definition of ‘open knowledge’) and, as Stevan points out, journal articles may be different from other forms of knowledge such as raw experimental data. However, I am not sure why this means you can’t use a common definition of ‘open’. Even if reuse of journal articles is less frequent than for data, it **does** happen, so why not ensure that it is allowed?
