Visual data exploration

In a presentation at Tableau Conference 2017 in Las Vegas, Tatiana Gabor, an analytics manager for the revenue team at music streaming company Spotify, said her team of analysts starts every project by visually exploring the available activity data collected on Spotify users. The team analyzes patterns in user behavior to understand how people respond to changes in the Spotify platform and to develop new ways to keep users engaged.

The most important benefit of visual data exploration is that it lets you assess the quality of your data, said Gabor, who works at Spotify’s U.S. headquarters in New York. You can immediately see outliers or clusters of data points that look unrealistic given an analyst’s domain knowledge, she noted. Analysts can follow up on either issue and, if necessary, correct for it before beginning formal analysis.

The visual approach also highlights important aspects of data sets. For example, it shows the “shape” of the data, such as whether it has a normal distribution or a long tail in either direction. It can also illuminate correlations between two variables. Of course, correlation doesn’t equate to causation, but identifying potential trends by visually exploring data can lead analysts to examine relationships between variables that they might not have thought to look at otherwise, according to Gabor and other conference speakers.
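The checks described above (outliers, shape, correlation) can be roughed out in a few lines before any charting tool is opened. The sketch below is only an illustration in plain Python: the sample numbers, variable names, and helper functions are hypothetical and are not taken from the Spotify presentation.

```python
import math
import statistics

# Hypothetical sample: minutes listened per session, with one suspicious value.
minutes = [12, 15, 14, 18, 22, 16, 13, 19, 17, 240]
# Hypothetical second variable: tracks played in the same sessions.
tracks = [4, 5, 5, 6, 7, 5, 4, 6, 6, 8]


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def text_histogram(values, bins=5):
    """Return one text line per bin so the shape of the data is visible."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [f"{lo + i * width:7.1f} | {'#' * c}" for i, c in enumerate(counts)]


print("possible outliers:", iqr_outliers(minutes))
print("correlation:", round(pearson(minutes, tracks), 3))
print("\n".join(text_histogram(minutes)))
```

A long empty stretch in the histogram followed by a single bar is the text equivalent of the long tail a chart would show, and it points at the same rows that the outlier check flags.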

Peter Gilks, director of product insights for the Spotify revenue team, said during the presentation that any data analysis must stem from a hypothesis or a set of questions a company wants to answer. An analyst could start by just punching in queries written in R or Python, but that approach may lead to missed insights, Gilks cautioned. He said visual data exploration allows analysts to better shape their hypotheses from the beginning by highlighting patterns or trends in the data.

Posted in Uncategorized | Leave a comment

SAP Leonardo Machine Learning

SAP is making it simple for developers to expand their skills using SAP Leonardo Machine Learning.

The tutorial group “Discover the SAP Leonardo Machine Learning services on the SAP API Business Hub” gives a walkthrough of the different types of services available as part of SAP Leonardo Machine Learning Foundation, which include image, text and time series data processing.

As an example, I tested the ARIMA model of the Time Series Forecast service and found that the prediction interval has a negative lower bound, which is invalid for my type of data (net value amounts).

To be fair, the Time Series Forecast API is currently in “alpha” status, which means that it isn’t yet available for productive use. You can, however, test the service from SAP Leonardo Machine Learning – Functional Services.


Time Series Forecasting in R & SAP

In this blog I build on the material presented in the previous blogs SAP HANA Sales Continuity Operational Report 2 and Time Series Forecasting Models.

The HANA stored procedure is modified so that it fills possible gaps in the time series with the value 2, which allows a smooth Box-Cox log transformation without zero values.
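The stored procedure itself is not reproduced here. As a rough illustration of the gap-filling idea only, the Python sketch below walks a monthly series and substitutes a constant for missing or zero periods. The monthly granularity, the function names, and the sample figures are assumptions made for this example; only the fill value 2 comes from the text.

```python
from datetime import date

FILL_VALUE = 2  # the constant mentioned in the text


def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_gaps(series, fill_value=FILL_VALUE):
    """Return a gap-free list of (period, value) pairs.

    series maps month start dates to values; periods that are missing
    or hold a zero are replaced so a later log transform is defined.
    """
    if not series:
        return []
    filled = []
    for period in month_range(min(series), max(series)):
        value = series.get(period)
        if value is None or value == 0:
            value = fill_value
        filled.append((period, value))
    return filled


# Hypothetical net value amounts: February is missing, March is zero.
sales = {date(2017, 1, 1): 100.0, date(2017, 3, 1): 0, date(2017, 4, 1): 80.0}
for period, value in fill_gaps(sales):
    print(period.isoformat(), value)
```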

The forecast package for R, created and maintained by Professor Rob Hyndman of Monash University, is one of the more useful R packages available. It provides methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.

Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
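To make the two viewpoints a little more concrete, here is a minimal Python sketch. It shows simple exponential smoothing, the most basic member of that family (level only, with no trend or seasonal component), next to the sample autocorrelation that ARIMA-style models set out to describe. It is an illustration with hypothetical data and function names, not the forecast package's implementation.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation (0 < alpha <= 1)."""
    level = values[0]
    levels = [level]
    for y in values[1:]:
        # New level: weighted average of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


def autocorrelation(values, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical monthly amounts.
series = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]
levels = simple_exponential_smoothing(series, alpha=0.3)
print("next-period forecast:", round(levels[-1], 2))
print("lag-1 autocorrelation:", round(autocorrelation(series, 1), 3))
```

The last smoothed level serves as the flat forecast for the next period, while a large lag-1 autocorrelation suggests that an autoregressive term would have something to work with.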

Using the R forecast package brings the following advantages:

  1. The auto.arima() function selects ARIMA models automatically. When the lambda argument is specified, a Box-Cox transformation is applied. A value of 0 specifies a log transformation, which constrains the forecasts to stay positive on the original scale. When forecasts are produced, they are back-transformed to the original scale.
  2. Rich support for other models.
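The effect of the log transformation on the interval bounds can be seen in a small Python sketch. The Box-Cox formulas are the standard ones. The point forecast, the standard errors, and the 1.96 multiplier for a roughly 95% interval are hypothetical numbers chosen for illustration and do not come from a fitted model.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value; lam == 0 means natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_box_cox(w, lam):
    """Back-transform a value from the Box-Cox scale to the original scale."""
    if lam == 0:
        return math.exp(w)
    return (lam * w + 1) ** (1 / lam)


Z = 1.96  # multiplier for an approximate 95% interval

# Hypothetical forecast on the original scale: the lower bound goes negative.
point, sigma = 50.0, 40.0
raw_lower = point - Z * sigma

# Hypothetical forecast on the log scale (lambda = 0): the interval is
# symmetric there, and exp() maps both bounds back to positive numbers.
log_point, log_sigma = box_cox(point, 0), 0.6
lower = inv_box_cox(log_point - Z * log_sigma, 0)
upper = inv_box_cox(log_point + Z * log_sigma, 0)

print("untransformed lower bound:", round(raw_lower, 1))
print("back-transformed interval:", round(lower, 1), "to", round(upper, 1))
```

Because the exponential of any real number is positive, the back-transformed lower bound cannot drop below zero, which is the property needed for data such as net value amounts.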

Finally, the forecast results should be exposed to the outside world without any dependency on installed software or operating systems. Rook is a web server interface and software package for R. I use it to round off this story 😊.
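Rook is an R package, and the original Rook application is not shown here. The same idea of publishing forecast results over plain HTTP, so that any client can read them, can be sketched with Python's standard library instead. The payload, the handler name, and the /forecast path below are hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical forecast output; in the blog this would come from R.
FORECAST = {"horizon": [1, 2, 3], "mean": [120.5, 118.2, 121.9]}


class ForecastHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the forecast as JSON regardless of the requested path.
        body = json.dumps(FORECAST).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), ForecastHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any HTTP client could do this; here the script queries itself once.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/forecast")
fetched = json.loads(conn.getresponse().read().decode("utf-8"))
conn.close()

server.shutdown()
server.server_close()
print(fetched)
```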


The Three Software Stacks Required for IoT Architectures

Eclipse sees IoT as consisting of three connected software stacks:

  • A software stack for constrained devices (e.g., the device, endpoint, microcontroller unit (MCU), or sensor hardware).
  • Some type of gateway that aggregates information and data from the different sensors and sends it to the network. This layer also may take real-time actions based on what the sensors are observing.
  • A software stack for the IoT platform on the backend. This backend cloud stores the data and can provide services based on collected data, such as analysis of historical trends and predictive analytics.
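The division of labour between the three stacks can be sketched as a toy Python model. All class names, the temperature readings, and the alarm threshold are hypothetical; the sketch only mirrors the roles listed above and does not use any Eclipse IoT project API.

```python
from statistics import fmean


class Device:
    """Constrained device: only produces raw sensor readings."""

    def __init__(self, device_id, readings):
        self.device_id = device_id
        self._readings = list(readings)

    def read(self):
        return {"device": self.device_id, "temperature": self._readings.pop(0)}


class Backend:
    """IoT platform: stores incoming data and offers analysis on top of it."""

    def __init__(self):
        self.history = []

    def ingest(self, batch):
        self.history.append(batch)

    def average_temperature(self):
        return fmean(batch["mean_temperature"] for batch in self.history)


class Gateway:
    """Gateway: aggregates readings, reacts locally, forwards a summary."""

    def __init__(self, backend, alarm_threshold):
        self.backend = backend
        self.alarm_threshold = alarm_threshold
        self.alarms = []

    def collect(self, devices):
        readings = [device.read() for device in devices]
        for reading in readings:
            # Real-time action taken at the edge, without a backend round trip.
            if reading["temperature"] > self.alarm_threshold:
                self.alarms.append(reading["device"])
        self.backend.ingest({
            "count": len(readings),
            "mean_temperature": fmean(r["temperature"] for r in readings),
        })


backend = Backend()
gateway = Gateway(backend, alarm_threshold=30.0)
devices = [Device("sensor-a", [21.0, 22.0]), Device("sensor-b", [35.0, 23.0])]
gateway.collect(devices)
gateway.collect(devices)
print("alarms:", gateway.alarms)
print("historical average:", backend.average_temperature())
```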

First RISC-V Based CPU Core with Linux Support

SiFive announced “early access” availability of the 64-bit, quad-core U54-MC Coreplex – the first Linux-ready application processor built around the open source RISC-V architecture. Aside from being open source and customizable, one of the main benefits of RISC-V is that it is fully modern, purpose built, and unburdened with legacy code.

The processor is intended for AI, machine learning, networking, gateways and smart IoT devices. A development board is set to ship in Q1 2018.


SAP S/4HANA 1709

SAP released SAP S/4HANA 1709, a new release of its next-generation ERP suite, on September 15, 2017.

This release runs on SAP HANA 2, the second generation of SAP’s in-memory database platform, which supports scale-out and an active/active setup for added performance. Active/active clustering means that you can actually do something productive with your secondary instance of HANA instead of leaving it idle until a disaster happens. Some of the analytical workloads can therefore be redirected to the secondary instance.

SAP S/4HANA 1709 incorporates SAP Leonardo Machine Learning capabilities and predictive analytics into core business processes to help organizations stay competitive in a rapidly changing business environment.

Several apps available in 1709 use machine learning. The example given in the picture below is the Cash Application. The system uses all the historical data to learn matching criteria so that it can clear payments more accurately when bank statements are imported.

[Figure: Machine Learning in Finance in S/4HANA 1709]

Companies that are already running SAP ERP 6.0 (enhancement package 0 and higher) can migrate directly to SAP S/4HANA with a system conversion.


To improve the experience when adopting SAP S/4HANA, SAP recently launched a new set of tools. The SAP Transformation Navigator helps customers determine their go-to solutions. See

Customers can also take advantage of the SAP Readiness Check for SAP S/4HANA before making an S/4HANA decision. This tool is available for all customers and included in maintenance. You can initiate the SAP ERP system analysis, getting an overview of the most important aspects and potential requirements of an SAP S/4HANA system conversion (e.g. simplifications, custom code, business functions, add-ons, and transactions). See


Category Theory for Programmers

This is an unofficial PDF version of “Category Theory for Programmers” by Bartosz Milewski, converted from his blog post series.

Direct link: category-theory-for-programmers.pdf (v0.1, September 2017)
