ZeRO: Zero Redundancy Optimizer Explained

The Microsoft team of Rajbhandari, Rasley, Ruwase, and He introduced a major breakthrough in efficient deep learning training in their seminal paper “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models”. By developing the Zero Redundancy Optimizer (ZeRO) and associated techniques, they trained models 8x larger and achieved 10x better performance than the state of the art, as mentioned in…

SKIing with Y*, Iota and Ackermann

(Source code for this post is here: ) There are only six atomic operations for Turing machines – read and write to the tape, move the tape left or right one square, change state based on the observed symbol, and halt. Yet it’s not hard to see, at least at a high level, how a program in…

Many Ways to Cluster a Dataset

For the demo, scroll to the bottom of this article and click on the screenshot. Clustering is considered one of the simpler techniques in machine learning, and discussion often centers on a single algorithm – K-means, because it is straightforward and easy to implement. However, digging deeper we can find an amazing…

Bash the Pipes

The Bash shell is often seen as a simple launcher for other programs, when in fact it is a fairly sophisticated piece of software. It has its own language, its own start-up scripts (several of them, executed depending on context) and its own job management service. In this post I explore bash redirects, pipes and their…