Posted by Martin Zapletal
Sun, Nov 20, 2016

Introduction

In this series of posts I will discuss the evolution of machine learning algorithms with regard to scaling and performance. We will start with a naive implementation and progress to more advanced solutions, finally reaching state-of-the-art implementations similar to what companies like Google, Netflix and others use for their data pipelines, recommendation systems or machine learning. A variety of topics will be discussed, from the basics of ML, different programming models and the impact of a distributed environment to the specifics of machine learning algorithms as compared to common business applications, and much more. For those not particularly interested in machine learning, the concepts discussed are chosen carefully to apply to a wide range of applications, with ML serving as a good example.

In my previous blog post we looked into neural networks and their training, and investigated a trivial single-threaded, object-oriented implementation. The result was a working example that was, however, not useful in many real-world scenarios because of its poor performance. With large amounts of data such an approach is extremely wasteful, and we can achieve vastly better performance through parallelization.

Posted by Martin Zapletal
Sat, Oct 1, 2016

Introduction

Although big data analytics and machine learning are very old concepts, their importance is steadily increasing. One of the reasons is the improving accessibility of tools and decreasing prices, and therefore the ability to access, store, process and use large amounts of data. And data are key for many use cases, from optimizing standard business use cases to finding and opening new business opportunities to completely transforming businesses.

Throughout this series of blog posts we will touch on many topics, from machine learning, functional programming and parallel programming to distributed systems theory. I will start with a brief introduction to the different programming models, followed by an abstract description of single-machine, parallel and distributed computation, common data processing architectures, pipelines and technology stacks, before getting to the actual focus of the blog post. Feel free to skip to the Perceptron chapter if you want.

Posted by Peter Evison
Mon, Jun 27, 2016

Scala Days 2016 in Europe is over for another year, but what an incredible three days! One thousand people attended the conference in Berlin, which, as last time, was a perfect host. As always, Cake had a strong presence, in terms of talks (see below) and the Cake stand, supported by 12 members of the Cake team. Oh, and we must not forget the great coffee our friends from Noble Espresso provided for the attendees!

If you missed Scala Days then do not fear: we will be appearing at a number of other conferences in the UK, Europe and North America. We will be attending Scala World, for example, which was a great success last year; the feedback I received from our engineering team was fantastic! We encourage you all to attend this year in September; details can be found here.

We are the main sponsor for the Reactive Summit in Austin (TX), which promises to bring together the world of reactive programming.

We will also be sponsoring the Scala eXchange in London in December, which promises to be Europe's largest Scala conference!

Posted by Peter Evison
Thu, Jun 2, 2016

Well, as we wave goodbye to Scala Days NYC and prepare for Scala Days Berlin, it's time for us to share our thoughts and look back at the event. What better way to do that than a short video documentary! I hope you enjoy…

Posted by Martin Zapletal
Wed, Jul 1, 2015

The machine learning pipelining API for Apache Spark was released in December 2014 in version 1.2 [1]. The available resources [2], [3], [4] or [5] only present the same simple examples. But how does it work in practice, what are its strengths and weaknesses, and is it ready for production use? This blog post will try to answer these questions.

Posted by Jan Machacek
Mon, Apr 13, 2015

As you probably know by now, Muvr performs near real-time exercise classification. It does so by fusing data from multiple (wearable) sensors, then sending the raw data to the server in a simple binary encoding. The server decodes the data, reconstructs the sensors' data and locations, and feeds column slices to the exercise model.

Posted by Carl Pulley
Mon, Mar 23, 2015

In this post, we monitor real-time streams of event data looking for pattern matches. Monitoring is performed using a novel and expressive query language based on linear dynamic logic (a generalisation of linear time temporal logic), with modern SMT provers (e.g. CVC4 and Z3) defining the pattern matching workhorse.

Posted by Carl Pulley
Wed, Mar 18, 2015

In this post we investigate how Akka streaming can be used to define flexible and programmable workflows that successfully integrate ML classifiers (such as decision trees, Bayesian networks, SVMs, etc.) to build complex classification pipelines.

Posted by Carl Pulley
Sun, Mar 15, 2015

Lift uses Akka streaming workflows to define a flexible and generic exercise classification pipeline. The classification pipeline is able to modularly include any machine learning classifier and is able to monitor the real-time streams of classification results using a linear dynamic logic.

This post provides a summary overview of this classification pipeline with future posts introducing the implementation details.

Posted by Carl Pulley
Mon, Jan 26, 2015

In this post we demonstrate how machine learning (specifically SVMs) may be used to identify gesture events, such as taps, in data streams produced by accelerometers in devices such as Pebble watches.

We start by developing prototype classification models in R and then port those models into Scala.

Applying the trained SVM models to unseen data, we successfully demonstrate an ability to punctuate exercise sessions by identifying taps, tokenising those exercise streams into separate activity periods.
