In this series of posts I will discuss the evolution of machine learning algorithms with regard to scaling and performance. We will start with a naive implementation and progress through more advanced solutions, finally reaching state-of-the-art implementations similar to those that companies like Google and Netflix use in their data pipelines, recommendation systems, and machine learning workloads. A variety of topics will be covered: the basics of ML, different programming models, the impact of distributed environments, the specifics of machine learning algorithms compared to common business applications, and much more. For readers not particularly interested in machine learning, the concepts are carefully chosen to apply to a wide range of applications; ML simply serves as a good running example.
Although the underlying concepts are decades old, the importance of big data analytics and machine learning is steadily increasing. One reason is the improving accessibility of tools and falling prices, which make it feasible to access, store, process, and use large amounts of data. And data is key to many use cases, from optimizing standard business processes to finding and opening new business opportunities to completely transforming businesses.
Throughout this series of blog posts we will touch on many topics, from machine learning and functional programming to parallel programming and distributed systems theory. I will start with a brief introduction to the different programming models, followed by an abstract description of single-machine, parallel, and distributed computation, common data processing architectures, pipelines, and technology stacks, before getting to the actual focus of the post. Feel free to skip ahead to the chapter on the Perceptron if you want.