Posted by Martin Zapletal
Mon, Nov 14, 2016

Welcome to a new edition of #ThisWeekInScala!

This blog aims to keep you up to date with the latest news from the world of Scala and Reactive programming.

Posted by Jan Machacek
Tue, Apr 19, 2016

Thank you to everyone who attended the DataStax Summit Europe. The slides from our talk, Personal trainer on the SMACK stack, are available to download as a PDF, or head over to http://www.slideshare.net/AnirvanChakraborty1/realtime-personal-trainer-on-the-smack-stack for a SlideShare version.

Posted by Petr Zapletal
Thu, Oct 1, 2015

In the previous post we discussed the reasons behind the rising demand for stream processing and gave a theoretical introduction to the area. Today we are going to talk about Apache Spark Streaming. I want to focus on implementation trade-offs, their consequences and the interesting issues we may face. Apart from that, we are going to cover its intended use cases, available support and known production deployments. The post is all about Spark Streaming's traits and less obvious properties. No "hello worlds" today.

Posted by Martin Zapletal
Sun, Sep 27, 2015

Posted by Andrew Sim
Wed, Sep 9, 2015

GraphX Pregel API

Graph data and graph processing have been getting more and more attention lately in various fields. It has become apparent that a large number of real-world problems can be described in terms of graphs: for instance, the Web graph, the social network graph, the train network graph and the language graph. These graphs are often exceptionally large. Take the Web graph, for example: it is estimated that the number of web pages may have exceeded 30 billion. We need a system that is able to process the graphs created by modern applications.

Posted by Martin Zapletal
Sun, Jul 19, 2015

Many algorithms, especially those with high computational complexity or those working with large amounts of data, may take a long time to complete. Algorithms can be expressed in many different ways in different environments: single-threaded, parallel, concurrent and distributed. In this blog post I will focus on the relationship between them and on the advantages and disadvantages of the distributed environment. The main focus will be on Apache Spark and the optimisation techniques it applies to computations defined by its users in a distributed environment.

Posted by Petr Zapletal
Mon, Jul 13, 2015

The demand for stream processing is increasing. Immense amounts of data from a rapidly growing set of disparate sources have to be processed quickly. This pushes the limits of traditional data processing infrastructures. These stream-based applications include trading, social networks, the Internet of Things, system monitoring, and many others.

Posted by Martin Zapletal
Wed, Jul 1, 2015

The machine learning pipelining API for Apache Spark was released in December 2014 in version 1.2 [1]. The available resources [2], [3], [4] and [5] all present the same simple examples. But how does it work in practice, what are its strengths and weaknesses, and is it ready for production use? This blog post will try to answer these questions.
