In a presentation at Tableau Conference 2017 in Las Vegas, Tatiana Gabor, an analytics manager for the revenue team at music streaming company Spotify, said her team of analysts starts every project by visually exploring the available activity data collected on Spotify users. The team analyzes patterns in user behavior to understand how people respond to changes in the Spotify platform and to develop new ways to keep users engaged.
The most important benefit of visual data exploration is that it enables you to assess the quality of your data, said Gabor, who works at Spotify’s U.S. headquarters in New York. You can immediately see outliers or clusters of data points that may not be realistic given an analyst’s domain knowledge, she noted. Analysts can follow up on either of those issues and, if necessary, correct for them before beginning formal analysis.
The visual approach also highlights important aspects of data sets. For example, it shows the “shape” of data, such as whether it has a normal distribution or a long tail in either direction. It can also illuminate correlations between two variables. Of course, correlation doesn’t equate to causation, but identifying potential trends by visually exploring data can lead analysts to examine relationships between variables that they might not have thought to look at otherwise, according to Gabor and other conference speakers.
Peter Gilks, director of product insights for the Spotify revenue team, said during the presentation that any data analysis must stem from a hypothesis or a set of questions a company wants to answer. An analyst could start by just punching in queries written in R or Python — but that approach may lead to missed insights, Gilks cautioned. He said visual data exploration allows analysts to better shape their hypotheses from the beginning by highlighting patterns or trends in the data.
SAP is making it simple for developers to expand their skills using SAP Leonardo Machine Learning.
The “Discover the SAP Leonardo Machine Learning services on the SAP API Business Hub” tutorial group walks through the different types of services available as part of the SAP Leonardo Machine Learning Foundation, which include image, text and time series data processing.
As an example, I tested the Time Series Forecast ARIMA model and found that the prediction interval can have a negative lower bound, which is invalid for my type of data (net value amounts).
To be fair, the Time Series Forecast API is currently in “alpha” status, which means it isn’t yet available for productive use. You can, however, test the service from the SAP Leonardo Machine Learning – Functional Services.
In this blog, I build on the material presented in my previous posts, SAP HANA Sales Continuity Operational Report 2 and Time Series Forecasting Models.
The HANA stored procedure is modified so that it fills possible gaps in the time series with the value 2, enabling a smooth Box-Cox log transformation without zero values:
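The gap-filling idea can be sketched in Python (the original is a HANA SQLScript procedure, not shown here; the function name, the dict representation of the series and the period indexing are illustrative):

```python
def fill_gaps(series, fill_value=2):
    """Fill missing periods in a time series so a Box-Cox log
    transformation never encounters gaps or zero values.

    series: dict mapping a period index (int) to a net value.
    Returns a dict covering every period between the first and last.
    """
    periods = range(min(series), max(series) + 1)
    return {p: series.get(p, fill_value) for p in periods}

# Periods 3 and 4 are missing and get the fill value 2.
sales = {1: 120.0, 2: 95.5, 5: 80.0}
filled = fill_gaps(sales)
```

The constant 2 mirrors the value used in the stored procedure above; any small positive constant would serve the same purpose of keeping the log transform defined.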
The forecast package for R, created and maintained by Professor Rob Hyndman of Monash University, is one of the most useful R packages available, with methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.
Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting, and they provide complementary views of the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
Using the R forecast package offers the following advantages:
- The auto.arima() function automatically selects ARIMA models. When the lambda argument is specified, a Box-Cox transformation is used; the value 0 specifies a log transformation, which constrains the forecasts to stay positive on the original scale. When forecasts are produced, they are back-transformed to the original space.
- Rich support for other models
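Why the lambda = 0 (log) transformation keeps forecasts positive can be shown with a minimal Python sketch. auto.arima() itself is R; here a simple straight-line trend fitted on the log scale stands in for the model, purely to illustrate the back-transformation (the data and the trend model are illustrative, not the package's method):

```python
import math

def forecast_log_scale(values, horizon):
    """Fit a least-squares line to log(values) and back-transform the
    extrapolated points with exp(), so forecasts stay positive."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    xbar = (n - 1) / 2
    ybar = sum(logs) / n
    slope = sum((i - xbar) * (y - ybar) for i, y in enumerate(logs)) / \
            sum((i - xbar) ** 2 for i in range(n))
    intercept = ybar - slope * xbar
    return [math.exp(intercept + slope * (n + h)) for h in range(horizon)]

# Even for a steeply declining series, the back-transformed forecasts
# approach zero but never cross it, unlike a linear model on the raw scale.
decline = [100.0, 55.0, 30.0, 16.0, 9.0]
forecasts = forecast_log_scale(decline, 3)
```

This is exactly the behaviour that fixes the negative prediction bounds mentioned earlier: the model works in log space, and exp() of any real number is positive.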
Finally, the forecast results should be exposed to the outside world without any dependency on installed software or operating systems. Rook is a web server interface and software package for R; by using it, I am finalizing this story 😊.
Eclipse sees IoT as consisting of three connected software stacks:
- A software stack for constrained devices (e.g., the endpoint device, microcontroller unit (MCU) and sensor hardware).
- Some type of gateway that aggregates information and data from the different sensors and sends it to the network. This layer may also take real-time actions based on what the sensors are observing.
- A software stack for the IoT platform on the backend. This backend cloud stores the data and can provide services based on collected data, such as analysis of historical trends and predictive analytics.
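The three layers above can be sketched as a toy Python data flow (all class and method names here are illustrative, not part of any Eclipse IoT project):

```python
class Sensor:
    """Constrained-device stack: a single hardware reading."""
    def __init__(self, reading):
        self.reading = reading
    def read(self):
        return self.reading

class Gateway:
    """Aggregation stack: collects sensor data, takes real-time actions."""
    def __init__(self, sensors, threshold):
        self.sensors, self.threshold = sensors, threshold
    def collect(self):
        readings = [s.read() for s in self.sensors]
        # Real-time action at the gateway: flag readings over a threshold.
        alerts = [r for r in readings if r > self.threshold]
        return {"readings": readings, "alerts": alerts}

class Backend:
    """Platform stack: stores data and serves historical analysis."""
    def __init__(self):
        self.history = []
    def ingest(self, batch):
        self.history.append(batch)
    def trend(self):
        # Trivial stand-in for historical/predictive analytics.
        all_r = [r for b in self.history for r in b["readings"]]
        return sum(all_r) / len(all_r)

gw = Gateway([Sensor(20.5), Sensor(31.0)], threshold=30)
backend = Backend()
backend.ingest(gw.collect())
```

The point is the separation of concerns: devices only produce readings, the gateway aggregates and reacts in real time, and the backend keeps history for analysis.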
SiFive announced “early access” availability of the 64-bit, quad-core U54-MC Coreplex – the first Linux-ready application processor built around the open source RISC-V architecture. Aside from being open source and customizable, one of the main benefits of RISC-V is that it is fully modern, purpose-built and unburdened by legacy code.
The processor is intended for AI, machine learning, networking, gateways and smart IoT devices. A development board is set to ship in Q1 2018.
This is an unofficial PDF version of “Category Theory for Programmers” by Bartosz Milewski, converted from his blogpost series.
Direct link: category-theory-for-programmers.pdf (v0.1, September 2017)