Posts
Ben Postance

Writing Pythonic Airflow DAGs with TaskFlow API

This post is a brief introduction to using Airflow TaskFlow to write more Pythonic DAGs. The Airflow TaskFlow Tutorial Background I’ve been using Airflow at work and in personal projects for a number of ye...

Multi-tasking in Python

In computing systems, multitasking is the concurrent execution of tasks and processes. We divide up our resources to work on more than one task at the same time. In data engineering, analytics, an...

Geospatial Analysis: obtaining and pre-processing OpenSource satellite data

This notebook demonstrates how to obtain and pre-process satellite data from NASA's Land Processes Distributed Active Archive Center (LP DAAC). I will show how to process LP DAAC datasets into c...

A guide to clustering large datasets with mixed data-types [updated]

Jupyter notebook here A guide to clustering large datasets with mixed data-types Pre-note If you are an early-stage or aspiring data analyst, data scientist, or just love working with numbers c...

Bayesian Inference with PyMC3: pt 2 making predictions

Jupyter notebook here In this post I will show how Bayesian inference is applied to train a model and make predictions on out-of-sample test data. For this, we will build two models using a case s...

Bayesian Inference with PyMC3: pt 1 posterior distributions

Jupyter notebook here Introduction Here we use PyMC3 on two Bayesian inference case studies: coin toss and insurance claim occurrence. My last post was an introduction to Bayes’ theorem and Ba...

Bayesian Inference by hand

Jupyter notebook here Bayesian inference is a statistical method used to update one’s beliefs about a process or system upon observing data. It has wide-reaching applications from...

The FinCEN files: Uncovering money laundering patterns in the global banking network

Jupyter notebook here The Financial Crimes Enforcement Network (FinCEN) files are more than 2,500 documents, most of which are suspicious activity reports (SARs) that banks sent to the US ...

Geospatial Analysis: Working with MODIS data

Jupyter notebook here The Moderate Resolution Imaging Spectroradiometer (MODIS) is an imaging sensor built by Santa Barbara Remote Sensing that was launched into Earth orbit by NASA in 1999 on b...

Portfolio Optimisation: Monte Carlo method

Jupyter notebook here Given a fixed amount of available resources, optimise allocation to maximise returns across a set of products with variable returns. Following the Modern Portfolio Theory m...