Benchmarking Project Shows CLC Genomics Server Setup for HiSeq X Ten Halves Number of Compute Nodes Needed

CLC Genomics Server

For those of us who spend much of our time interpreting sequencing data for novel biological insights, it may be interesting to hear about some of the dramatic improvements being made in upstream data processing and analysis, before the data is ready for interpretation.  These improvements lower cost, increase data throughput, and improve the overall accuracy of the data we receive for interpretation.  Our QIAGEN Bioinformatics colleague, Mikael Flensborg, Director of Global Partner Relations at CLC bio, recently acquired by QIAGEN, wrote an interesting post for the Intel Health & Life Science blog on a benchmarking study they recently performed using publicly available HiSeq X Ten data.

The $1,000 genome sequence has generated a lot of excitement, but as a community we are still tackling the cost of processing and analyzing next-generation sequencing data.  This benchmarking study is important because it shows how innovation driven by our colleagues at CLC bio has essentially halved the number of compute nodes originally specified for supporting a HiSeq X Ten setup, paving the way to lower data analysis costs for labs working with this exciting sequencing platform.

According to Illumina’s “HiSeq X Ten Lab Setup and Site Prep Guide (15050093 E)”, the requirements for data analysis are specified to be a compute cluster with 134 compute nodes (16 CPU cores @ 2.0 GHz, 128 GB of memory, 6 x 1 terabyte (TB) hard drives) based on an analysis pipeline consisting of the tools BWA+GATK.

This benchmarking study was based on a workflow (Trim, QC for sequencing reads, Read Mapping to Reference, Indels and Structural Variants, Local Re-alignment, Low Frequency Variant Detection, QC for Read Mapping) of tools on CLC Genomics Server running on a compute cluster with Intel® Lustre® filesystem, InfiniBand®, Intel® Xeon® Processor E5-2697 v3 @ 2.60GHz, 14 CPU cores, 64GB of memory, SSD DC S3500 Series 800GB.

Based on these specifications, they were able to create a compute cluster infrastructure using just 61 compute nodes, fewer than half the 134 recommended by Illumina.  Congratulations!
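For readers who want to check the headline figure, the comparison can be sketched in a few lines of Python. The node counts come straight from the text above. The aggregate core and memory totals rest on an assumption that is not confirmed by the post: that the quoted core and memory figures apply per compute node in both setups. Treat those totals as a rough illustration only, not as benchmark results.

```python
# Node counts as quoted in this post.
ILLUMINA_NODES = 134  # Illumina site prep guide, BWA+GATK pipeline
CLC_NODES = 61        # CLC Genomics Server benchmark

# ASSUMPTION: the core and memory figures quoted in the post are per node.
ILLUMINA_CORES_PER_NODE, ILLUMINA_MEM_GB_PER_NODE = 16, 128
CLC_CORES_PER_NODE, CLC_MEM_GB_PER_NODE = 14, 64


def reduction(baseline: float, observed: float) -> float:
    """Fractional reduction of `observed` relative to `baseline`."""
    return (baseline - observed) / baseline


node_ratio = CLC_NODES / ILLUMINA_NODES
node_reduction = reduction(ILLUMINA_NODES, CLC_NODES)

# Aggregate totals, valid only under the per-node assumption above.
illumina_cores = ILLUMINA_NODES * ILLUMINA_CORES_PER_NODE
clc_cores = CLC_NODES * CLC_CORES_PER_NODE
illumina_mem_gb = ILLUMINA_NODES * ILLUMINA_MEM_GB_PER_NODE
clc_mem_gb = CLC_NODES * CLC_MEM_GB_PER_NODE

print(f"Nodes: {CLC_NODES} of {ILLUMINA_NODES} ({node_ratio:.1%} of the recommendation)")
print(f"Node reduction: {node_reduction:.1%}")
print(f"Aggregate cores (assumed per node): {clc_cores} vs {illumina_cores}")
print(f"Aggregate memory in GB (assumed per node): {clc_mem_gb} vs {illumina_mem_gb}")
```

The node ratio works out to roughly 45.5% of the recommended count, which is what "fewer than half" refers to. The comparison says nothing about per-node cost, since the two clusters use different processors, storage, and interconnects.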

QIAGEN Bioinformatics is presenting these results at Super Computing 14 in New Orleans next week at the Enterprise Community Hub Session on Tuesday, Nov. 18, from 3PM-4PM in the Intel booth area. They will also have a Community Hub Session about Cancer Research on Wednesday, Nov. 19, from 3PM-4PM (Intel booth area) and a theatre presentation about Cancer Research tools on Tuesday, Nov. 18, at 2:30PM at the Intel Theater in the exhibition area.