Running scientific workflows: public cloud vs on-premise

In recent years, a growing number of scientific researchers and institutions have moved their workflows to public cloud platforms. This shift has been driven by several factors, including the increased availability of cloud computing resources that can be easily scaled up or down as…

Using Apache Airflow to monitor data pipelines

Apache Airflow is a popular open-source platform for developing, scheduling, and monitoring workflows. Airflow is written in Python and supports batch-oriented workflows that are dynamic, extensible, and flexible because they are configured as Python code. Airflow provides a rich interactive web user interface (UI) that helps manage…

Exploring Nextflow with a small bioinformatics workflow

I recently wrote about workflow management systems in bioinformatics, focusing on Nextflow and Snakemake. In this post, the aim is to compose a small bioinformatics workflow to start exploring the Nextflow syntax and its features. The Nextflow documentation is extensive and provides many examples that are a helpful starting point. Nevertheless,…

Workflow management systems in Bioinformatics

Several scientific workflow systems, typically referred to as workflow management platforms, are used to run data pipelines (e.g. SNP calling and ETL). Among the most popular workflow managers are Nextflow and Snakemake. These were conceived in bioinformatics labs but essentially try to address similar reproducibility and scalability issues…