As we move from a microarray-focused era to one in which NGS technologies play an important role, we have the opportunity to learn a point we may have missed the first time around – that far more important than the platform we choose may be the way in which we choose to make sense of the generated data.
By Heidi Bullock
There has been a significant amount of buzz this past year over Next-Gen Sequencing and the future of microarrays. Anthony Fejes made a bold statement on his Nature blog post: microarrays are dead. Are they? And how should we think about defining successful research as these two different platforms start to overlap?
Microarrays enable bench scientists to move away from a reductionist approach to their experimental questions and take a more global view of their experimental model. This is actually a good thing. But it doesn’t feel like one once you get back that massive spreadsheet of genes and their expression changes. This highlights a particular challenge in genomics research: for all the years that microarrays have been the go-to platform (and still are for many), researchers haven’t always been clear on how best to leverage the generated data. Though they don’t produce quite the same deluge of data as NGS experiments, microarrays are no wimps either. Many of us have stared at long (ok, really long) Excel spreadsheets with columns and columns of never-ending data.
Faced with such a daunting challenge, it’s really hard to fight the reductionist tendency to just find the old familiar genes you know, or the ones that are most up/down-regulated. Many researchers attempt to get past this by using statistical packages to zero in on mathematically interesting sets of genes, but fewer make the most important leap of all – a deep biological analysis that addresses the key question: what the heck does this data REALLY mean, and how does it help me answer my research question?
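To make the "statistical narrowing" step above concrete, here is a minimal sketch of the kind of filter those packages apply: keep only genes whose fold change and adjusted p-value pass a threshold. The gene names, values, and cutoffs below are invented for illustration, not from any real experiment.

```python
# Minimal sketch of filtering an expression spreadsheet down to
# "mathematically interesting" genes. All values here are made up.
rows = [
    {"gene": "GENE_A", "log2_fc": 2.4,  "adj_p": 0.001},
    {"gene": "GENE_B", "log2_fc": -0.2, "adj_p": 0.800},
    {"gene": "GENE_C", "log2_fc": -1.8, "adj_p": 0.030},
    {"gene": "GENE_D", "log2_fc": 1.1,  "adj_p": 0.200},
]

def significant(rows, fc_cutoff=1.0, p_cutoff=0.05):
    """Keep genes with |log2 fold change| >= fc_cutoff and adjusted p < p_cutoff."""
    return [r["gene"] for r in rows
            if abs(r["log2_fc"]) >= fc_cutoff and r["adj_p"] < p_cutoff]

print(significant(rows))  # -> ['GENE_A', 'GENE_C']
```

This is exactly the point where many analyses stop – a shorter list of genes – which is why the biological interpretation step discussed next matters so much.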
While stats are necessary, they clearly only get you so far. It is critical to examine genes in the context of a bigger picture – living, breathing biology. For example, looking at a visual representation of a complex system provides a rapid and familiar biological orientation. By examining a gene of interest in the context of a pathway, it becomes easier to get a sense of what is happening in your experimental model. Who are the key players? What are the top pathways involved in the data set? Tools that support this kind of biological analysis enable you to efficiently explore and visualize your data in the context of published research, and provide a deeper level of understanding. Here, your research occurs in a known world that makes sense, one in which you trust the information being provided and can easily navigate your way around to find interesting paths and connections. Once you reach this level of confident exploration (making decisions based on more than just stats), your research can accelerate to a whole new level. You start to transform basic analysis results into deeply useful research outcomes: well-formed hypotheses and novel biological findings that can be cited in papers and supported by reviewers.
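One common way pathway tools answer "what are the top pathways in my data set?" is an over-representation test: given your list of interesting genes, is a pathway's gene set enriched beyond chance? Here is a minimal sketch using the hypergeometric distribution; the gene counts are invented, and real tools (IPA, GSEA, etc.) add curated pathway content and multiple-testing correction on top of this idea.

```python
from math import comb

def enrichment_p(n_universe, n_pathway, n_hits, n_overlap):
    """P(overlap >= observed) under the hypergeometric distribution:
    drawing n_hits genes from a universe of n_universe genes,
    of which n_pathway belong to the pathway."""
    total = comb(n_universe, n_hits)
    p = 0.0
    for k in range(n_overlap, min(n_pathway, n_hits) + 1):
        p += comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k) / total
    return p

# Invented numbers: a 20,000-gene universe, a 50-gene pathway,
# 100 differentially expressed genes, 8 of which fall in the pathway.
# Expected overlap by chance is only 100 * 50 / 20000 = 0.25 genes,
# so observing 8 yields a very small p-value.
print(f"{enrichment_p(20000, 50, 100, 8):.2e}")
```

A small p-value here is what puts a pathway at the "top" of an enrichment report – the starting point for the kind of biological exploration described above, not the end of it.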
With the excitement around Next-Gen Sequencing and its promise of cheaper, better, faster, we also have an exciting second chance to not miss the real point – one we may have missed the first time around: the generation of data is not the hard part, nor is it where the majority of our time should be spent. Access to different technologies is not the key thing that will affect research outcomes. The real challenge (regardless of platform) will be ensuring that whatever data is generated can be usefully interpreted and effectively used to make better research decisions. The Galaxy conference last week in the Netherlands is a great example of strong work being done in the open source community to further define and accelerate how NGS data should be transferred, processed, and ultimately used. Newly announced links between Galaxy and tools like IPA indicate a growing awareness that biological analysis is a necessary step in approaching NGS data.
At a crossroads in technology, there is a great opportunity to learn from the microarray era what could be done better. The fact that microarrays are so standard now that some consider them “dead”, and the fact that NGS is coming at us like a freight train, both point toward the fact that access to a genome-wide experimental platform is no longer the thing that will differentiate which projects make that successful conceptual leap and accelerate their research. Generating high quality data will be easy, cheap, and accessible to all – microarrays, NGS, pick your platform – it just depends on the question you are asking. The key lesson is that the ability to understand that data – in all its complexity, by interpreting it in a meaningful biological context and identifying solid avenues to follow up on – will be the ultimate predictor of truly successful research in the “NGS era”, even if that research is using microarrays.