Posted by Cornel Foltea
Wed, Nov 11, 2015

Posted by Martin Zapletal
Sun, Sep 27, 2015

Posted by Peter Evison
Fri, Sep 25, 2015

Posted by Martin Zapletal
Sun, Jul 19, 2015

Many algorithms, especially those with high computational complexity or those working with large amounts of data, may take a long time to complete. Algorithms can be expressed in many ways in different environments: single-threaded, parallel and concurrent, and distributed. In this blog post I will focus on the relationship between them and on the advantages and disadvantages of the distributed environment. The main focus will be on Apache Spark and the optimisation techniques it applies to computations defined by its users in a distributed environment.

Posted by Petr Zapletal
Mon, Jul 13, 2015

The demand for stream processing is increasing. Immense amounts of data have to be processed fast from a rapidly growing set of disparate data sources. This pushes the limits of traditional data processing infrastructures. Stream-based applications include trading, social networks, the Internet of Things, system monitoring, and many others.

Posted by Martin Zapletal
Wed, Jul 1, 2015

The machine learning pipelining API for Apache Spark was released in December 2014 in version 1.2 [1]. The available resources [2], [3], [4] and [5] all present the same simple examples. But how does it work in practice, what are its strengths and weaknesses, and is it ready for production use? This blog post will try to answer these questions.
