
Posted by Martin Zapletal
Sun, Jul 19, 2015

Many algorithms, especially those with high computational complexity or those working with large amounts of data, may take a long time to complete. Algorithms can be expressed in many different environments - single-threaded, parallel and concurrent, and distributed. In this blog post I will focus on the relationship between these environments and on the advantages and disadvantages the distributed environment provides. The main focus will be on Apache Spark and the optimisation techniques it applies to computations its users define in a distributed environment.
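To make the contrast between the environments concrete, here is a minimal sketch (plain Scala, not Spark; the object and chunk size are illustrative) of the same computation expressed single-threaded and concurrently. A distributed version in Spark would follow the same split-process-combine shape, but run across a cluster.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SumOfSquares {
  val xs = (1 to 1000).toVector

  // Single-threaded: one pass over the whole collection
  def sequential: Long = xs.map(x => x.toLong * x).sum

  // Concurrent: split into chunks, process each on the thread pool,
  // then combine the partial results - the same shape a distributed
  // job has, with chunks playing the role of partitions
  def concurrent: Long = {
    val partials = xs.grouped(250).toSeq.map { chunk =>
      Future(chunk.map(x => x.toLong * x).sum)
    }
    Await.result(Future.sequence(partials), 10.seconds).sum
  }
}
```

Both expressions compute the same result; only the execution strategy differs.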

Posted by Martin Zapletal
Sun, Mar 8, 2015

Concepts such as event sourcing and CQRS allow an application to store all events that happen in the system using a persistence mechanism. The events cannot be mutated, and the current state of the system at any point in history can be reconstructed by replaying all the events up to that point. For performance reasons the state can of course be cached using a snapshot. But the indisputable advantage of this approach is that the whole history of events (including user actions, behaviour or system messages - anything we decide to store) is available to us, rather than just the current state. Event sourcing was thoroughly discussed before, for instance in [1] or [2], and CQRS in [3], [4] or [5].
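The core idea can be sketched in a few lines of Scala. This is a minimal illustration with a hypothetical bank-account domain (not the post's actual code): state is never mutated, it is simply a left fold over the immutable event log, so replaying a prefix of the log reconstructs the state at that point in history.

```scala
// Immutable events - the only thing that is ever persisted
sealed trait AccountEvent
case class Deposited(amount: BigDecimal) extends AccountEvent
case class Withdrawn(amount: BigDecimal) extends AccountEvent

// State is derived from events, never updated in place
case class Account(balance: BigDecimal) {
  def apply(e: AccountEvent): Account = e match {
    case Deposited(a) => copy(balance = balance + a)
    case Withdrawn(a) => copy(balance = balance - a)
  }
}

object EventSourcing {
  // Reconstruct state at any point in history by replaying
  // the events stored up to that point
  def replay(events: Seq[AccountEvent]): Account =
    events.foldLeft(Account(0))(_ apply _)
}
```

A snapshot is then just a cached intermediate fold result, so replay only needs the events recorded after it.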

In this post we will discuss how we can store and further use this data by connecting Akka, Cassandra and Spark, focusing mostly on the configuration, Akka serialization and the akka-analytics project. Later I will follow up with another blog post building on top of this one, with an example of using machine learning techniques to obtain insights that help optimise future decisions and application workflow.
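The wiring between Akka and Cassandra is mostly configuration. The fragment below is a hedged sketch of the kind of settings involved, using key names from the akka-persistence-cassandra journal plugin; exact keys and defaults vary between plugin versions, so treat this as illustrative rather than copy-paste configuration.

```hocon
# Route Akka persistence events and snapshots to Cassandra
akka.persistence.journal.plugin = "cassandra-journal"
akka.persistence.snapshot-store.plugin = "cassandra-snapshot-store"

# Where the Cassandra cluster lives and which keyspace to use
cassandra-journal.contact-points = ["127.0.0.1"]
cassandra-journal.keyspace = "akka"
```

With the journal in Cassandra, Spark (via the akka-analytics project) can then read the persisted events directly for analysis.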

Posted by Martin Zapletal
Sun, Feb 8, 2015

In one of my previous blog posts I introduced MLlib, Apache Spark's machine learning library. It discussed the basics of MLlib's API, machine learning vocabulary and linear regression: http://www.cakesolutions.net/teamblogs/spark-mllib-linear-regression-example-and-vocabulary. Today I will take a deeper look at Spark's internals and its programming model - the options it provides to a programmer to implement and parallelise algorithms. I will demonstrate it on an implementation of a parallel pool adjacent violators solution to isotonic regression. The code was sent as a pull request to Spark and should be included in Spark 1.3 when it is released.
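For readers unfamiliar with the algorithm, here is a minimal sequential sketch of pool adjacent violators (PAV) in Scala. This is an illustration of the technique only, not Spark's actual implementation (which additionally parallelises it across partitions): the idea is to fit a non-decreasing sequence by repeatedly merging ("pooling") adjacent blocks that violate monotonicity, replacing each pooled block with its mean.

```scala
object Pav {
  // Fit a non-decreasing sequence to `y` (unweighted case)
  def isotonic(y: Array[Double]): Array[Double] = {
    case class Block(sum: Double, count: Int) {
      def mean: Double = sum / count
    }
    val blocks = scala.collection.mutable.ArrayBuffer.empty[Block]
    for (v <- y) {
      blocks += Block(v, 1)
      // Pool while the newest block violates monotonicity
      // with respect to its predecessor
      while (blocks.length > 1 &&
             blocks(blocks.length - 2).mean > blocks.last.mean) {
        val b = blocks.remove(blocks.length - 1)
        val a = blocks.remove(blocks.length - 1)
        blocks += Block(a.sum + b.sum, a.count + b.count)
      }
    }
    // Expand each block back to its pooled mean values
    blocks.flatMap(b => Array.fill(b.count)(b.mean)).toArray
  }
}
```

For example, the input `[1.0, 3.0, 2.0]` violates monotonicity at the last pair, so the last two points are pooled to their mean, giving `[1.0, 2.5, 2.5]`.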
