[Disclaimer] I have not worked in academia. This post is part of my process for organizing my thoughts as I learn about Open Science. If you have first-hand experience with these problems and would like to add more context or correct my misunderstandings, please share!
Scientific publishing puts publicly funded research behind private gatekeepers
Here’s how most scientific knowledge is created and shared right now: university research groups receive funding from government grants, conduct research, and then publish the results in privately run journals (think Nature, Science). Once the article is published, anyone who wants to read the results (including the researchers who created the knowledge in the first place) has to pay those journals large subscription fees. The average person can’t get access to any of this publicly funded research because it’s owned by private companies and sits behind a paywall.
In the age of the internet, Wikipedia, GitHub, and open source software, this treatment of knowledge feels wrong, particularly to those of us working in technology. Aaron Swartz famously tried to download JSTOR journal papers and make them available on the internet, which led to an aggressive prosecution and, ultimately, his suicide. The problem is so bad that Harvard University Library complained in 2012 that it couldn’t afford to keep buying all the subscriptions anymore. Imagine what less well-funded institutions are facing.
There is an Open Science movement to make the process of research more open and the results of research more accessible. arXiv, a preprint server, is a well-known example, and there are also newer Open Access journals that allow their contents to be freely redistributed.
Publish or Perish
Maybe this is just a problem of education. If we told more scientists about the current problems, they’d start publishing their new research in Open Access journals. Unfortunately, researchers have a strong personal incentive to keep publishing in the current crop of well-known journals: their careers depend on it. Advancement, grants, and tenure, all the things that research scientists care about, are heavily based on publication records. The goal is to publish as much as possible, in as prestigious a journal as possible. If these new Open Access journals aren’t considered prestigious in their research community, the best researchers won’t publish in them. In a sense, we would be asking researchers to choose between advancing their careers and upholding their ideals of making their work available to everyone.
What do the journals have to say about why they charge such high fees? The most common response is that they provide a valuable service in the form of peer review. Peer review has been established as a key part of the scientific process. Prospective papers are shared with other researchers in the field, who provide feedback, comments, and validation. The journals essentially crowdsource the validation of research to the research community itself. Oh, those peer reviewers? They’re unpaid. The service that journals directly provide is the infrastructure for soliciting and processing the feedback. This model actually makes a lot of sense: the community of the most informed people curates the quality of the content. The part that doesn’t make sense is the cost of the service.
GitHub for Science
This is actually a market rich enough to support the business case for a disruptive startup. The publishing market for STEM research is $19B. 42% of all articles are published by the top three publishers: Reed Elsevier, Springer, and Wiley.
[more stuff here]
One of the most interesting startups tackling this problem is Authorea. Their end goal is to make the underlying data more available, and they’re starting by improving the writing and editing process. They provide tools to embed data into papers and tools to collaborate with others on editing them. If it’s easy to make your data available while you’re writing the paper, maybe it’s not a big step to share it after you publish. This is a smart way to get into the space: provide a service that researchers can see immediate value in (a less painful process for writing technical papers).
Startup Idea
I think for a startup to actually make headway on the primary problem, it needs to create a substitute for the value that journal papers provide right now: validation of research, distribution of research, and career prestige.
One way is to co-opt the current journal system by building on top of it. Let the current journals keep publishing the papers and keeping them behind their paywalls, but create a better tool for discovering and discussing the research. Create something like RapGenius that has a page for each paper, linking to where the full text can be found. For articles that can be republished, make the full text available. Even for ones that can’t, allow researchers to annotate or comment on the research. Allow the actual authors of the paper to host their data. Create trackbacks and other linking mechanisms that connect the bibliographies. One of the ways papers become important is through how many others cite them.
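To make the idea a little more concrete, here is a minimal sketch of what a paper page and its trackbacks might look like. Everything in it is an assumption for illustration: the PaperPage fields, the cited_by helper, and the example.org URLs are hypothetical names, not anything an existing site uses.

```python
from dataclasses import dataclass, field


@dataclass
class PaperPage:
    """One public page per paper. All field names are hypothetical."""
    paper_id: str
    title: str
    full_text_url: str  # link out to wherever the full text lives, paywalled or not
    cites: list[str] = field(default_factory=list)        # paper_ids from the bibliography
    annotations: list[str] = field(default_factory=list)  # reader comments on the paper


def cited_by(pages: dict[str, PaperPage]) -> dict[str, list[str]]:
    """Invert every bibliography into trackbacks: which papers cite each paper."""
    incoming: dict[str, list[str]] = {paper_id: [] for paper_id in pages}
    for page in pages.values():
        for target in page.cites:
            # A cited paper that has no page yet still gets a trackback entry.
            incoming.setdefault(target, []).append(page.paper_id)
    return incoming


# Three made-up papers: B cites A, and C cites both A and B.
pages = {
    "a": PaperPage("a", "Paper A", "https://example.org/a"),
    "b": PaperPage("b", "Paper B", "https://example.org/b", cites=["a"]),
    "c": PaperPage("c", "Paper C", "https://example.org/c", cites=["a", "b"]),
}
pages["a"].annotations.append("Figure 2 matches our replication.")

trackbacks = cited_by(pages)
citation_counts = {pid: len(sources) for pid, sources in trackbacks.items()}
```

The point of the sketch is that the page only needs a link and a bibliography to be useful. The citation counts fall out of the links between pages, so the site can rank and connect papers without hosting any paywalled text.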
[more stuff here]
This would move the center of gravity to a publicly viewable site. Once this community is built, the distribution and prestige problems would be solved. From there it’s a short step to offering peer review within this new community.
Further Reading
I started looking into this when reading Everything is Bullshit, but I have tried to refer to primary sources as much as possible in this post. Here are some other good articles I found that didn’t fit anywhere in the body:
- http://marciovm.com/i-want-a-github-of-science/
- http://michaelnielsen.org/blog/the-future-of-science-2/
- https://github.com/blog/1840-improving-github-for-science