Glow

Built to scale

Glow makes genomic data work with Spark SQL, the leading engine for working with large structured datasets. It fits natively into the ecosystem of tools that have enabled thousands of organizations to scale their workflows to petabytes of data.

Flexible

Glow works with datasets in common file formats like VCF or BGEN as well as common big data standards. You can write queries using the native Spark SQL APIs in Python, SQL, R, Java, and Scala. The same APIs allow you to bring your genomic data together with other datasets like electronic health records, real-world evidence, and medical images. Glow also makes it easy to parallelize existing tools and libraries, whether they are implemented as command-line tools or as pandas functions.
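As a rough illustration of querying genomic data through Spark SQL, the sketch below loads a VCF file with the "vcf" data source and selects the variants overlapping a region. It assumes Glow 1.0 or later (where glow.register returns the session), a SparkSession started with the Glow package on its classpath, and Glow's VCF column names (contigName, a 0-based start, and an exclusive end). The helper names region_predicate and variants_in_region are hypothetical and not part of Glow.

```python
import re


def region_predicate(contig: str, start: int, end: int) -> str:
    """Build a Spark SQL predicate for variants overlapping [start, end).

    Assumes Glow's VCF schema: contigName, 0-based start, exclusive end.
    """
    # Accept only simple contig names so the value is safe to embed in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", contig):
        raise ValueError(f"unsupported contig name: {contig!r}")
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Backticks keep `end` from being read as a SQL keyword.
    return f"contigName = '{contig}' AND start < {end} AND `end` > {start}"


def variants_in_region(spark, vcf_path: str, contig: str, start: int, end: int):
    """Load a VCF into a DataFrame and query it with plain Spark SQL."""
    # Imported lazily so this module can be imported without Glow installed.
    import glow

    # Assumption: Glow >= 1.0, where register() returns the session to use.
    spark = glow.register(spark)
    variants = spark.read.format("vcf").load(vcf_path)
    variants.createOrReplaceTempView("variants")
    return spark.sql(
        "SELECT contigName, start, `end`, referenceAllele, alternateAlleles "
        f"FROM variants WHERE {region_predicate(contig, start, end)}"
    )
```

The result is an ordinary Spark DataFrame, so it can be joined with other tables, such as clinical records, using the same APIs.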

Easy to get started

If you’ve used Spark before, you don’t need to learn any new APIs to get started with Glow. The toolkit includes the building blocks that you need to perform the most common analyses right away:

Datasources for loading VCF and BGEN files into Spark DataFrames
Functions for performing quality control and data manipulation
Variant normalization and liftover
Regression functions
Integration with Spark ML libraries for population stratification
Utilities for piping DataFrames through command-line tools
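To give a feel for the quality control building block listed above, here is a minimal sketch that filters variants by call rate and Hardy-Weinberg p-value. It assumes the input DataFrame was loaded through a session returned by glow.register, so that the SQL functions call_summary_stats and hardy_weinberg are available, and that their result structs expose callRate and pValueHwe fields. QcThresholds and filter_variants are hypothetical names, and the default thresholds are placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcThresholds:
    """Illustrative cutoffs only; choose values appropriate to your study."""

    min_call_rate: float = 0.95
    min_hwe_p_value: float = 1e-6

    def __post_init__(self):
        # Both thresholds are proportions or probabilities, so must be in [0, 1].
        for name in ("min_call_rate", "min_hwe_p_value"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def filter_variants(variants, thresholds: QcThresholds = QcThresholds()):
    """Keep variants that meet the call rate and Hardy-Weinberg thresholds.

    `variants` is assumed to be a DataFrame with Glow's `genotypes` column,
    created from a session on which Glow's SQL functions are registered.
    """
    # Imported lazily so this module can be imported without PySpark installed.
    from pyspark.sql import functions as F

    with_stats = (
        variants
        .withColumn("stats", F.expr("call_summary_stats(genotypes)"))
        .withColumn("hwe", F.expr("hardy_weinberg(genotypes)"))
    )
    return with_stats.where(
        (F.col("stats.callRate") >= thresholds.min_call_rate)
        & (F.col("hwe.pValueHwe") >= thresholds.min_hwe_p_value)
    )
```

Keeping the thresholds in a small validated object makes it easier to record which cutoffs produced a given filtered dataset.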

An open-source toolkit for large-scale genomic analysis
