Last month, a few former colleagues at LCAV did some cross-testing of the reproducible research compendia available at rr.epfl.ch. I must say that, from the results I have seen so far, it has been quite a sobering experience. Many of the compendia I considered definitely reproducible did not (entirely) pass the test. I guess that shows again how difficult it is to make work truly reproducible, even when you fully intend to. It also strengthens my conviction that for papers without code and data online, it is almost impossible to reproduce the exact results. There is work to be done on the road to reproducible research!
I'll need to look further into why even some of my own work did not pass the test.
I am glad to announce our new website on reproducible research: www.reproducibleresearch.net. Yes, as I discussed before, various sites on this topic have popped up, some recently and some less so. However, I still think this site can add something extra to the existing ones. First of all, it mainly addresses the signal/image processing community, a research domain not yet specifically covered by the other sites.
It contains information on reproducible research and on how to make signal processing research reproducible. It also lists references to articles about reproducible research, and offers a discussion forum and various other related links.
And then there is what I consider an important extra for people interested in signal processing: a listing of links to papers for which code/data are available (with, of course, links to the code/data themselves). I really believe this can be extremely useful when doing research. For copyright reasons, we cannot (in most cases) host the PDFs on our own site, and I am also not sure we should want to. But if it is developed and maintained well, this can become a one-stop site when looking for code/data related to a paper. So please feel free to send me your additions. I will be happy to add all signal/image processing related works!
I’m really excited about this site, so let me know what you think!
The current issue of Computing in Science and Engineering (CiSE) is a special issue on reproducible research, edited by two pioneers in the field: Jon Claerbout and Sergey Fomel. They have assembled a great set of articles from experts with a lot of first-hand reproducible research experience, so I highly recommend it to my fellow researchers!
Earlier this week, I got a pointer to a New York Times article about R. It is a very interesting read on the use of R in scientific communities and industrial research, mainly for statistical analysis. R is open source software, so it is free and has already benefited from contributions made by many authors. And although I haven't used it myself yet, it is a great tool for reproducible research. Using the Sweave package, authors can write a single document that contains both their article and the R code that reproduces the results and puts them in place. This ensures that all the material is in a single place.
It also says something about the amazing power of open source software developed by a community of authors (who are typically also its users).
I seem to be spending quite some time on the web lately… After my post about the lifetime of URLs, here's one about domain names and reproducibility. While browsing around, I recently noticed that there are quite a few websites and domain names related to reproducible research.
reproducibleresearch.org is an overview website by John D. Cook containing links to reproducible research projects, articles on the topic, and relevant tools. It also hosts a blog with ideas on reproducible research.
reproducibleresearch.com is owned by the people at Blue Reference, who created Inference for Office, a commercial tool for doing reproducible research from within Microsoft Office.
reproducibility.org is used by Sergey Fomel and his colleagues as the home of Madagascar, their open source package for reproducible research experiments.
reproducible.org is a reproducible research archive maintained by R. Peng at the Johns Hopkins Bloomberg School of Public Health, which aims to provide a home for reproducible research packages.
Quite a range of domain names containing the word "reproducible" (or a derivative), if you ask me! And I haven't even started on the Open Research or Research 2.0 sites. Let's hope this also means that research itself will soon get a big boost in reproducibility!
Let me, in turn, wish you all the best for 2009! I wish you a beautiful, entirely non-reproducible year with lots of great experiences!
2008 was the year this site got started, and to be honest, I am quite happy with how often I managed to post articles here. In its first year, the site also gained reasonably good visibility on Google, so nothing to complain about. It does remain largely one-way communication, but from what I hear from fellow bloggers, that is not uncommon. So at the start of this year, let me invite you again: if you read this blog and like or dislike something I write, please post a comment! It will encourage me to keep writing, and make me feel a bit less lost in the blogosphere.
And now, on to a wonderful 2009!
I am getting worried these days about the volatility of URLs and web pages. I guess you all know the problem: it is very easy to create a web page, and hence many people do so. Great! However, after some years, only a few of those web pages are still available. Common reasons include people retiring or moving elsewhere, so that their web pages on their employer's site disappear. Similarly, registering a domain name at some point does not mean you will keep paying the yearly fees forever. And complete website redesigns often result in broken URLs.
Why does this worry me so much?
Last week, I attended the Berlin 6 Open Access Conference in Düsseldorf (Germany). It was an interesting conference covering different aspects of Open Access, i.e., making publications freely available online. There was a wide variety of talks, ranging from publishers' perspectives, through financial models for Open Access and open standards, to the benefits of Open Access for developed and developing countries.
One of the sessions was organized by Mark Liberman around the topic of reproducible research. I gave a talk there about my experiences with reproducible research, but that’s not what I want to talk about here. I found it very interesting to see the wide range of subjects and perspectives that Mark gathered in that session. Slides of the entire session are available here for those who are interested.
Reproducible research, literate programming, open science, and science 2.0 are all different names, and in my opinion they all largely cover the same topic: sharing code and/or data alongside a publication as a presentation of your research work. While literate programming focuses more on adding documentation to code, and science 2.0 seems to assume that you put work in progress online, the intersection between these topics really seems to be very large.
This clearly shows that the same ideas are popping up from various sides of the scientific community, in very different fields of science. That is a really exciting thing! At the same time, it shows that there is a clear need for publishing research in this open way. And wouldn't everyone agree that nothing would be nicer than being able to truly start from the current state of the art when beginning research in a given field?
Should all these efforts be merged under a single "label"? It would definitely be exciting. And it would have a huge impact: a joint effort for "open science", "reproducible research", or whatever the name may be, would receive a lot of attention and could no longer be overlooked by anyone. At the same time, every research domain needs its own specifics or fine-tuning, and it is not yet clear to me what the "best" setup would be for the type of work I am currently doing. So maybe we should let these variations coexist a while longer, and see later which ones survive, which are the simplest to use, and which tools can be combined into an optimal research method.
But of course (if anyone is reading these posts), I would be very happy to hear your own opinion on this!
A few months ago, I read in a Belgian newspaper that 9% of the participants in a study among 2,000 American scientists said they had witnessed scientific fraud within the past three years. And it seems this was not about cases where people use Photoshop to crop an image or similar, but rather about inventing fake results or falsifying articles.
Although I wasn't able to find the study again on the web with Google, I am quite sure the original authors checked the number. Wikipedia reports on another study in which the number was 3%. Anyhow, whether it is 3 or 9 percent, this number is much too high. Let us hope it can be brought down by requiring higher reproducibility of our research work. I do realize that there will always be people who cheat and falsify results (Wikipedia even keeps a list of the most famous cases). But I also strongly believe that, in the end, most researchers just want to do good work. Many of them do non-reproducible work simply because they don't (yet) feel the need to make it reproducible, or because they are too busy with their next piece of work to properly finish off the current one…