Bioinformatics for Beginners

It's faster, easier, and more affordable with Illumina

Analyzing NGS data can be easy

Bioinformatics can feel intimidating to biologists. Until recently, analyzing the data from your sequencer required advanced bioinformatics experience. Many laboratories relied on a limited number of in-house or external bioinformaticians, which resulted in lengthy turnaround times. Analyzing and interpreting NGS data continues to be the major barrier to insights for many labs. Fortunately, many easy-to-use commercial analysis and interpretation tools are now available, making NGS data analysis accessible to biologists.

Getting started with bioinformatics

Evaluating bioinformatics solutions involves several considerations:

  • Integration and compatibility: Seamless integration with sequencing instruments, software and workflows reduces data transfer issues, streamlining the analysis and discovery process. Compatibility with industry standard formats and protocols ensures interoperability with other tools and resources.
  • Accuracy and innovation: Whether you purchase commercial software or build your own informatics workflow, ensure the algorithms, databases, and pipelines are comprehensive and robust. With the rate of genomic discovery rapidly increasing, the ability to easily update your workflow with the newest pipelines and innovations helps you maintain consistently high-quality results. For commercial solutions, look for evidence of data accuracy through publications, customer testimonials, and independent evaluations.
  • Support for multiple applications and use cases: Evaluate the software’s ability to meet your specific research requirements. Support for applications across multiple areas of research, configurable workflows, data formats, and multiple deployment options is essential to address the unique challenges of your projects.
  • Scalability: Ensure the bioinformatics infrastructure can accommodate your current data volumes and scale seamlessly as your needs grow. A scalable platform is critical for timely analysis and processing of large data sets.

For a deeper dive into bioinformatics, download the Gene Expression and Regulation eBook

Benefits of Illumina data analysis tools

Illumina offers intuitive, user-friendly software that connects easily with your sequencer. For laboratories just getting started, Illumina provides offerings tailored to single-sample reporting, biomarker discovery, and population research. Our tools can be set up to launch immediately after your sequencing run is complete, so you can walk away and return to automated insights or reports. Illumina provides easy-to-use visualization tools and the most accurate secondary analysis¹ available to help you get insights you can trust, faster.

Award-winning accuracy

Our analysis software sets new standards for data accuracy, winning industry challenges for the highest precision and overall accuracy, as shown in the PrecisionFDA Truth Challenge.¹ When combined with Illumina sequencers, the Illumina Genome is the most accurate on the market.

Comprehensive

Expanding suite of digital solutions powering multiple use cases and applications for research and clinical laboratories.

Connected end-to-end workflows

Illumina Connected Software offers integrated solutions for every step of the NGS workflow, from lab and sample management through interpretation. With reduced manual touchpoints, you'll get deeper insights faster.

Easy to use

Perform intuitive, guided analysis leveraging a curated and comprehensive menu of point-and-click analysis applications and a user-friendly graphical interface.

Secure and shareable

Security and privacy are at the core of Illumina software. Both enterprise cloud and on-premises solutions are built with global and regional regulations in mind. When it comes to sensitive genomic data, Illumina is your trusted partner.

Scales with your research

Our solutions help researchers scale as they grow, from supporting single-sample analysis to population-wide analysis. Researchers can interpret single samples or aggregate data from multiple sources to understand genetic trends and make population-scale discoveries.

"BaseSpace Sequence Hub enables us to analyze, store, and disseminate data without the need of a bioinformatics staff or a server. Without BaseSpace, it would have taken us longer and would have cost more to get this level of data output and operational efficiency."

Featured Illumina software
Clarity LIMS
Laboratory information management system for automated sample tracking, workflow, and data management.
  • Study types and scale: Regulated labs looking to scale
  • Bioinformatics experience: Beginners to experts

BaseSpace Sequence Hub
Simplified run management, monitoring, and bioinformatics analysis.
  • Study types and scale: Small-scale studies: discovery research
  • Bioinformatics experience: Beginners

Illumina Connected Analytics
Powerful bioinformatics software suite that offers highly accurate, comprehensive and ultra-efficient secondary analysis of NGS data from Illumina sequencing systems.
  • Study types and scale: Large-scale studies such as population studies or clinical research
  • Bioinformatics experience: Intermediate to experts

DRAGEN Secondary Analysis
The DRAGEN platform is a secondary analysis software suite that processes NGS data and enables tertiary analysis to drive insights.
  • Study types and scale: Small-scale to large-scale discovery research, clinical research, and population studies
  • Bioinformatics experience: Beginners to experts

Emedgene
Automated insights solution with AI-prioritization that can streamline dry lab workflows for WGS, WES, virtual panels, and targeted panels.
  • Study types and scale: Supporting small and large labs seeking an operationalized workflow for sample analysis for germline studies
  • Bioinformatics experience: Beginners to experts

Illumina Connected Insights
Comprehensive insights and automation to support variant interpretation for diverse applications and variant types at scale.
  • Study types and scale: Supporting small and large labs seeking an operationalized workflow for sample analysis for oncology studies
  • Bioinformatics experience: Beginners to experts

Software analysis pipelines for popular applications using low- and mid-throughput instruments
DRAGEN RNA
Performs mapping and alignment of RNA reads, RNA quantification, gene fusion detection, and small variant calling.
  • Popular applications: Gene expression profiling; differential expression analysis; biomarker discovery
  • Recommended instruments: Low- and mid-throughput benchtop sequencers (MiSeq System, NextSeq 1000 & 2000 Systems). NextSeq 1000 & 2000 feature onboard data analysis.

DRAGEN Targeted Microbial
Analyzes Illumina Viral enrichment panels or tiled amplicon kits (COVIDSeq, IMAP) with a few easy clicks on BaseSpace. Provides best match/identification with consensus genomes and coverage plots.
  • Popular applications: Microbial/viral sequencing
  • Recommended instruments: Low-throughput benchtop sequencers (iSeq 100, MiniSeq, MiSeq System*)

DRAGEN Enrichment
The DRAGEN Enrichment Pipeline combines DRAGEN’s germline and somatic callers into a pipeline designed specifically for analyzing enrichment samples. Includes a full suite of enrichment metrics and reporting.
  • Popular applications: Exome sequencing
  • Recommended instruments: Mid-throughput benchtop sequencers with onboard data analysis (NextSeq 1000 & 2000 Systems)

Partek Flow
Approachable multiomic analysis solution with an easy-to-use interface, robust statistical algorithms, information-rich visualizations, and genomic tools enabling researchers of all skill levels to confidently perform data analysis.
  • Popular applications: Multiomics
  • Recommended instruments: Mid-throughput benchtop sequencers with onboard data analysis (NextSeq 1000 & 2000 Systems)

Correlation Engine
Correlation Engine is an interactive omics knowledgebase that puts private omics data in biological context with highly curated public data.
  • Popular applications: Multiomics
  • Recommended instruments: Mid-throughput benchtop sequencers with onboard data analysis (NextSeq 1000 & 2000 Systems)

*Onboard data analysis offerings vary by instrument.

Bioinformatics FAQ

Establishing and maintaining NGS data analysis software requires substantial expertise and effort that many forget to include in cost calculations. These costs are often significant and surpass the price of a software license or subscription.
High-quality commercial software has teams of experts continually improving, testing, and updating pipelines for the most accurate analysis available. Additionally, commercially maintained software with clear documentation and access to a dedicated support team offers assistance when you need it most. Combined with features like a graphical user interface (GUI), these abstract away the complexities of bioinformatics so you can access consistent results quickly.

After libraries have been prepared and sequenced, Real-Time Analysis (RTA) software onboard the sequencing system provides base calls and associated quality scores.
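
As a rough illustration of what those quality scores mean: base-call quality is conventionally expressed on the Phred scale, where Q = -10 · log10(p) and p is the estimated probability that the base call is wrong. The sketch below is not Illumina's RTA code. The function names are hypothetical, and it assumes scores stored as ASCII characters with an offset of 33, the convention used in standard FASTQ files.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def decode_quality_string(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII-encoded quality string into integer Phred scores.

    Assumes the Phred+33 encoding (an assumption of this sketch).
    """
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    # Hypothetical quality string for a three-base read.
    for q in decode_quality_string("I5+"):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

Under this scale, Q30 corresponds to an estimated 1 in 1,000 chance that a base call is incorrect.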

Data coming off the sequencer is then processed through secondary analysis, where sequencing reads are aligned and assembled to reconstruct the full sequence for a sample and to produce either DNA variant calls or RNA transcript counts.

The process of generating DNA variant calls typically reveals thousands to millions of variants, which are then interpreted in the final step of an NGS workflow, known as tertiary analysis, to derive biological insights. Tertiary analysis, or variant interpretation, enables users to ingest variant call format (VCF) files and perform downstream analyses depending on the application, including gene expression/quantification profiles, heatmaps, visualizations for biomarker discovery, or single-sample reports for research purposes.
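
To give a feel for the input to tertiary analysis, here is a minimal sketch that tallies variant types from VCF text. It is an illustration only, not how any Illumina product works: the sample records are made up, `classify` and `summarize_vcf` are hypothetical helper names, and the parser assumes the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

```python
from collections import Counter

# Made-up records for illustration; real VCF files carry many more header lines.
SAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t10177\t.\tA\tC\t50\tPASS\t.
chr1\t10352\t.\tT\tTA\t45\tPASS\t.
chr2\t20001\t.\tGTC\tG\t12\tLowQual\t.
chr2\t20500\t.\tG\tA,T\t60\tPASS\t.
"""


def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length (a deliberate simplification)."""
    if alt.startswith("<") or alt == "*":
        return "symbolic/other"  # symbolic alleles are not sized in this sketch
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV/other"


def summarize_vcf(text: str, passing_only: bool = True) -> Counter:
    """Count variant types, optionally keeping only records that pass filters."""
    counts = Counter()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        ref, alt, filt = fields[3], fields[4], fields[6]
        if passing_only and filt not in ("PASS", "."):
            continue
        for allele in alt.split(","):  # multi-allelic sites list several ALTs
            counts[classify(ref, allele)] += 1
    return counts


if __name__ == "__main__":
    print(summarize_vcf(SAMPLE_VCF))
```

A summary like this is only a first look; interpretation tools go on to annotate, filter, and prioritize the variants.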
Data analysis costs vary depending on the choice of software and on the infrastructure and deployment options selected. For example, open-source software often does not require a license or subscription, but may incur higher costs for computing, either in the cloud or on-premises, and for the resources needed to develop and maintain pipelines. Commercial analysis software requires the purchase of software licenses and/or subscriptions, and is often accompanied by higher accuracy, faster runtimes, reduced compute, and better support.
Illumina DRAGEN secondary analysis offers the most accurate and most comprehensive secondary analysis for Illumina sequencing data¹. The analysis of each 30-35x human whole genome typically costs $6-10 on the cloud (Illumina BaseSpace Sequence Hub or Illumina Connected Analytics) and could cost more with compute-intensive pipeline configurations. The cost is approximately $1.50-2.50 for each human whole exome or transcriptome sample. For projects with large data volumes, DRAGEN ORA lossless compression (available on DRAGEN server and DRAGEN onboard instruments) can reduce data transfer and storage costs by up to 80%¹.
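
The per-sample figures above lend themselves to a quick budget estimate. The sketch below simply multiplies the quoted ranges by a sample count and treats the "up to 80%" reduction as a best case. The function and constant names are hypothetical, and actual costs depend on pipeline configuration and current pricing.

```python
# Per-sample cloud analysis cost ranges in USD, taken from the figures quoted above.
WGS_COST_RANGE = (6.0, 10.0)    # 30-35x human whole genome
EXOME_COST_RANGE = (1.5, 2.5)   # human whole exome or transcriptome


def estimate_analysis_cost(n_samples: int, cost_range: tuple[float, float]) -> tuple[float, float]:
    """Return the (low, high) estimated analysis cost for n_samples."""
    low, high = cost_range
    return n_samples * low, n_samples * high


def storage_after_compression(size_tb: float, reduction: float = 0.8) -> float:
    """Best-case storage footprint, assuming the full quoted reduction is achieved."""
    return size_tb * (1 - reduction)


if __name__ == "__main__":
    low, high = estimate_analysis_cost(100, WGS_COST_RANGE)
    print(f"100 whole genomes: ${low:,.0f} to ${high:,.0f}")
    low, high = estimate_analysis_cost(100, EXOME_COST_RANGE)
    print(f"100 exomes: ${low:,.0f} to ${high:,.0f}")
    print(f"10 TB compressed (best case): {storage_after_compression(10):.1f} TB")
```

For 100 whole genomes this gives $600 to $1,000 in analysis cost, and for 100 exomes $150 to $250.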

For customers that choose to analyze and store their data in the cloud, Illumina offers clear and transparent access to account information to track and monitor compute and storage costs. For organizations that keep data stored over time, archiving data you don’t plan on accessing for many months or longer and deleting old data will also reduce the cost of data storage.

¹ Data on file, Illumina Inc., 2024

A LIMS, or Laboratory Information Management System, is a tool used to effectively manage laboratory and sample data. For genomics labs, a genomics-specific LIMS is particularly important to manage the volume and complexity of sequencing data. The more samples your lab runs, the more important it is to use a genomics LIMS for accurate, efficient, and compliant wet lab data management. Implementing a LIMS can profoundly impact downstream analysis by enhancing data integrity through workflow standardization, automation, and accessibility. It helps laboratories maintain accurate records, streamline workflows, ensure quality control, and improve collaboration among teams.

Learn more about what a LIMS can do for your lab here.

Illumina offers a range of training options for customers. Once you have selected the software solution that best fits your application areas and test menu, Illumina will scope your implementation and training needs. Training is integrated into Illumina’s software implementation packages and covers walk-through consulting, platform and filter setup, data ingress setup, project management, report customization, and other key software functionality. Customers may also access the Illumina support site to learn best practices and the latest techniques via online courses and instructor-led training.

Data security and privacy are essential considerations before setting up your NGS experiment. Only you can know whether it is best to store your data locally or in the cloud; whichever you choose, Illumina offers software with both deployment options. Factors to consider include whether your region or organization has data storage requirements or stringent compliance policies. If you choose to store your data locally, it’s important to consider how your organization will protect and secure the data. Depending on the size of your project and the volume of data, most organizations will find they need dedicated facilities and storage, which require an initial capital investment.

Storing data on enterprise clouds like Amazon Web Services (AWS) or Google Cloud Platform (GCP) provides access to dedicated teams of cloud security and privacy experts who ensure that the data stored and accessed within the cloud environment are encrypted, private, and secure. With data already in the cloud, secure collaboration with colleagues is also streamlined.

For more information, visit Genomic Data Storage & Security.

Informatics resources
customer video
Making informatics easy for Helix

Hear from William Lee, Vice President of Bioinformatics at Helix, and Jessica Gordon, Director of Software Engineering at Illumina, as they explain how Helix was able to scale rapidly into one of the largest human exome sequencing operations in the world using BaseSpace Sequence Hub.

customer video
The role of informatics and genomics in drug discovery

Slavé Petrovski, head of AstraZeneca's Centre for Genomics Research, Discovery Sciences, R&D, spoke with Illumina about the role of informatics and genomics in drug discovery.

DRAGEN variant calling
Accuracy improvements in germline small variant calling with DRAGEN

This application note describes recent advancements in the accuracy of germline small variant calling by the DRAGEN platform, as measured against a truth set.

case study
Streamlined workflow management enables sequencing at scale

Learn how the implementation of BaseSpace Clarity LIMS software enables Rapid Novor to future-proof their growing laboratory operations.

Have additional bioinformatics questions?

We’d love to help. Reach out and one of our specialists will be happy to answer your bioinformatics and data analysis questions.

Speak to a specialist

References
  1. Illumina DRAGEN Secondary Analysis is the first single platform to achieve 99.83% accuracy based on PrecisionFDA v2 Truth Challenge Benchmark Data. Details here.