SAP Analytics and Predictive Analytics by Using R Statistical Software

Why
Analytics is the examination of existing (retrospective) data with the goal of understanding trends through comparison.

For example, analyzing sales data with clustering algorithms such as EM and K-Means can reveal patterns that are useful for improving sales revenue and achieving higher sales volume.
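
As a minimal sketch of the clustering idea above, the following R snippet runs K-Means on made-up sales figures. The data, the column names (orders_per_year, avg_order_value), and the choice of two clusters are assumptions for illustration only; kmeans() comes from the stats package that ships with R.

```r
# Synthetic, made-up sales data: two numeric features per customer.
set.seed(42)
sales <- data.frame(
  orders_per_year = c(rnorm(50, mean = 5,   sd = 1),  rnorm(50, mean = 20, sd = 2)),
  avg_order_value = c(rnorm(50, mean = 200, sd = 20), rnorm(50, mean = 50, sd = 10))
)

# Scale so that both features contribute comparably to the Euclidean distance.
scaled <- scale(sales)

# K-Means with two clusters; nstart restarts guard against poor initial centers.
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Average profile of each cluster in the original units.
profile <- aggregate(sales, by = list(cluster = fit$cluster), FUN = mean)
print(profile)
```

The per-cluster averages are the kind of pattern a sales analyst would then interpret, for example frequent buyers of low-value orders versus occasional buyers of high-value orders.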

Predictive analytics is “forward thinking”: it uses mathematical modeling to support good decision making. It brings together management, information technology, and modeling.

By learning from your abundant historical data, predictive analytics delivers something beyond standard business reports and sales forecasts: actionable predictions for each customer. These predictions foresee which customers will buy, click, respond, convert, or cancel.

Whether forecasting sales or market share, finding a good retail site or investment opportunity, identifying consumer segments and target markets, or assessing the potential of new products or risks associated with existing products, modeling methods in predictive analytics provide the key.

Predictive analytics, like much of statistics, involves searching for meaningful relationships among variables and representing those relationships in models. We can fit many models to the available data, then evaluate those models by their simplicity and by how well they fit the data.
But be warned: “All models are wrong but some are useful” (Box & Jenkins, 1976).
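
To make "fit many models, then evaluate them by simplicity and fit" concrete, here is a small sketch on synthetic data. The data-generating formula and the three candidate models are assumptions chosen for illustration; lm(), poly(), and AIC() are standard R functions. R-squared measures fit only, while AIC also penalizes the number of parameters.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 1)   # the true relationship is linear
d <- data.frame(x = x, y = y)

# Three candidate models of increasing complexity.
models <- list(
  constant = lm(y ~ 1, data = d),
  linear   = lm(y ~ x, data = d),
  quintic  = lm(y ~ poly(x, 5), data = d)
)

# Simplicity (number of coefficients) next to fit (R-squared) and AIC,
# which trades the two off; lower AIC is better.
comparison <- data.frame(
  model      = names(models),
  parameters = sapply(models, function(m) length(coef(m))),
  r_squared  = sapply(models, function(m) summary(m)$r.squared),
  aic        = sapply(models, AIC)
)
print(comparison)
```

The more flexible model can never have a lower R-squared than the linear model nested inside it, which is why a measure of simplicity is needed alongside goodness of fit.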

How
R is free, open-source, highly extensible, cross-platform software. It provides a comprehensive language for managing and manipulating data, together with a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques.

R has over 6900 packages available from multiple repositories, specializing in topics such as econometrics, finance, data mining, machine learning, spatial analysis, and bioinformatics.

The R language has become very popular among statisticians and data miners for developing statistical software and is widely used for advanced data analysis.

R and HANA Integration
SAP has joined the list of big companies embracing the R language. SAP has committed to tightly integrating its latest products, including the in-memory database HANA and the newly launched Business Objects Predictive Analytics, with the algorithms and statistical libraries available in R.

The goal of the integration of the SAP HANA database with R is to enable the embedding of R code in the SAP HANA database context. This scenario is suitable when an SAP HANA-based modeling and consumption application wants to use the R environment for specific statistical functions.


RSAP R Package
You can do R analytics even if you don’t use HANA.
The R package RSAP implements SAP RFC connectivity for R using the SAP NetWeaver RFC SDK (NW RFC SDK).

In the following example I show how R and SAP can be integrated, with hints on usage.

I use:

  • Clustering-based outlier detection, which treats small clusters (including clusters of size one) as clustered outliers.
  • Identification of outliers in multivariate data based on the Mahalanobis distance.
  • The Partitioning Around Medoids (PAM) clustering algorithm, with the Mahalanobis distance as the distance metric, to perform partitional clustering.
  • SPRINT (Simple Parallel R INTerface), a parallel framework for R that is intended to make High Performance Computing (HPC) accessible to R users.
  • ppam(), a SPRINT clustering function that performs Parallel Partitioning Around Medoids and is based on the pam() function from the R “cluster” package.
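
Before the SAP example, here is a small self-contained sketch of the Mahalanobis-based outlier identification mentioned in the list. The data are synthetic, the three planted outliers are made up, and the chi-square cutoff at the 0.999 quantile is an assumption (a common rule of thumb, not something taken from the SAP example); mahalanobis() and qchisq() are standard R functions.

```r
set.seed(7)
# Synthetic two-column data: 100 regular points plus 3 planted outliers.
regular <- cbind(rnorm(100, 10, 1), rnorm(100, 100, 5))
planted <- rbind(c(20, 100), c(10, 160), c(0, 40))
X <- rbind(regular, planted)

# Squared Mahalanobis distance of every row from the sample mean.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under approximate multivariate normality, d2 roughly follows a chi-square
# distribution with ncol(X) degrees of freedom; 0.999 is an arbitrary cutoff.
cutoff <- qchisq(0.999, df = ncol(X))
outliers <- which(d2 > cutoff)
print(outliers)
```

Because the mean and covariance are estimated from data that include the outliers, extreme points can mask each other; robust covariance estimates are a common refinement that this sketch leaves out.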

library(RSAP)

# Connect to the SAP system via RFC; replace the *** placeholders with your own values.
con <- RSAPConnect(ashost="***", sysnr="***", client="***", user="***", passwd="***", lang="EN", trace="1", lcheck="1")

# Read sales document header and item data created since 2012-01-01 for order type 'TA',
# excluding some item categories.
wb2_v_vbak_vbap2 <- RSAPReadTable(con, "WB2_V_VBAK_VBAP2",
  options=list("ERDAT >= '20120101' ", " AND AUART = 'TA' ", " AND PSTYV_I NOT IN ('ZPRI','ZTE1','ZTE2')"),
  fields=list('VBELN', 'ERDAT', 'ERNAM', 'VKORG', 'VKGRP', 'KUNNR', 'NETWR', 'WAERK', 'POSNR_I', 'PSTYV_I', 'MATNR_I', 'KWMENG_I', 'NETWR_I', 'WAERK_I'))

# Give the columns readable names.
names(wb2_v_vbak_vbap2) <- c("SalesDocNum", "Created", "CreatedBy", "SalesOrg", "SalesGroup", "SoldToParty",
  "NetValue", "DocCurrency", "Item", "ItCa", "MaterialNum", "Quantity", "NetValue_I", "DocCurrency_I")

RSAPClose(con)

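# Pairwise Mahalanobis distances between the rows of X, computed via the
# Cholesky factor of the covariance matrix.
cholMaha <- function(X) {
  dec <- chol(cov(X))                # upper-triangular R with t(R) %*% R == cov(X)
  tmp <- forwardsolve(t(dec), t(X))  # whiten the data
  dist(t(tmp))                       # Euclidean distances in the whitened space
}

# Assumption: datamat is a numeric matrix built from numeric columns of the table
# read above. The original listing does not show this step, so the columns are illustrative.
datamat <- as.matrix(sapply(wb2_v_vbak_vbap2[, c("Quantity", "NetValue_I")], as.numeric))

mahalanobisDistances <- cholMaha(datamat)
library("sprint")
pamclust <- ppam(mahalanobisDistances, k=4)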
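
The following sketch shows how the pieces fit together on synthetic data, without SAP or MPI. It swaps in the serial pam() from the cluster package in place of SPRINT's ppam(), checks cholMaha() against base R's mahalanobis(), and then flags small clusters as outliers. The data, k = 3, and the "two members or fewer" threshold are assumptions for illustration.

```r
library(cluster)  # provides the serial pam(), used here instead of ppam()

cholMaha <- function(X) {
  dec <- chol(cov(X))
  tmp <- forwardsolve(t(dec), t(X))
  dist(t(tmp))
}

set.seed(3)
# Synthetic data: two tight groups of 40 points plus one isolated observation.
X <- rbind(
  cbind(rnorm(40, 0, 0.5), rnorm(40, 0, 0.5)),
  cbind(rnorm(40, 8, 0.5), rnorm(40, 8, 0.5)),
  c(30, -20)
)

d <- cholMaha(X)

# Sanity check: the distance between rows 1 and 2 should match the
# (square root of the) value returned by mahalanobis().
ref <- sqrt(mahalanobis(X[1, ], center = X[2, ], cov = cov(X)))
stopifnot(isTRUE(all.equal(as.matrix(d)[1, 2], ref)))

# Partition around medoids on the Mahalanobis distances.
fit <- pam(d, k = 3)

# Treat clusters with at most two members as clustered outliers
# (the threshold is arbitrary and should be tuned to the data).
sizes <- table(fit$clustering)
small <- as.integer(names(sizes)[sizes <= 2])
flagged <- which(fit$clustering %in% small)
print(flagged)
```

With real sales data, the flagged rows would be the order items to inspect manually, for example unusual quantity and net value combinations.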
