Coding best practices for researchers

I have recently been invited to deliver a training session on an upcoming course at EMBL-EBI. The session is about good coding practices and is aimed at researchers and clinicians who have some experience working in Computational Biology and data analysis. The subject is vast as it touches on many…

Running scientific workflows: public cloud vs on-premise

In recent years, there has been a growing trend among scientific researchers and institutions to move their scientific workflows to public cloud platforms. This shift has been driven by a number of factors, including the increased availability of cloud computing resources that can be easily scaled up or down as…

Data validation with JSON Schema

JSON Schema is a declarative language that allows annotation and validation of JSON documents. The benefits of JSON Schema are that it can fully describe existing data formats, and it provides clear human and machine-readable documentation. JSON Schema enables the confident and reliable use of the JSON data format. The…

Building user-friendly CLIs with Click

In software development, we use command line interface (CLI) applications all the time, for example, to install software packages or to test our code. Creating CLIs is a skill that we need to learn sooner or later. Often we spend a lot of time thinking about the functionality of…

Using Apache Airflow to monitor data pipelines

Apache Airflow is a popular open-source platform for developing, scheduling, and monitoring workflows. Airflow is developed in Python and enables the development of batch-oriented workflows that are dynamic, extensible and flexible, as they are configured as Python code. Airflow provides a rich interactive web user interface (UI) that helps manage…

Exploring Nextflow with a small bioinformatics workflow

I recently wrote about workflow management systems in bioinformatics, focusing on Nextflow and Snakemake. In this post, the aim is to compose a small Bioinformatics workflow to start exploring the Nextflow syntax and its features. The Nextflow documentation is extensive and provides many examples that are a helpful start. Nevertheless,…

Workflow management systems in Bioinformatics

Several alternative scientific workflow systems, typically referred to as workflow management platforms, are used to run data pipelines (e.g. SNP calling and performing ETL). Among the most popular workflow managers are Nextflow and Snakemake. These were conceived in Bioinformatics labs but essentially try to address similar reproducibility and scalability issues…