There is a clear need to measure quality of service by defining relevant KPIs.
We load all WebLogic server logs into a partitioned Apache Hive table, which is stored in HDFS in the Optimized Row Columnar (ORC) file format.
KPI – Total Errors and Average MTBE (Mean Time Between Errors)
This KPI shows the total error count and the mean time between errors (in hours) for each server. The query runs at the beginning of every month and collects data for the previous month. The results are then appended to the corresponding server graphs. This makes it easy to see trends, the effect of improvements in the development and testing processes, and the maturity level of the products.
lead(tstamp) over (partition by servername order by tstamp) next
where severity = 'Error' and tstamp > unix_timestamp() - 3600*24*30
order by servername,tstamp
group by t.servername;
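Put together, a query of this kind might look like the following sketch. The table name weblogic_logs and the aliases next_tstamp, errors and avg_mtbe_hours are assumptions for illustration, not the names used in production. The column tstamp is assumed to hold epoch seconds, as the comparison with unix_timestamp() suggests.

```sql
-- Sketch only: weblogic_logs is a hypothetical table name.
-- servername, severity and tstamp are the columns used in the fragments above;
-- tstamp is assumed to be epoch seconds.
select t.servername,
       count(*) as errors,
       -- gap to the next error on the same server, averaged and converted to hours
       avg(t.next_tstamp - t.tstamp) / 3600 as avg_mtbe_hours
from (
  select servername,
         tstamp,
         lead(tstamp) over (partition by servername order by tstamp) as next_tstamp
  from weblogic_logs
  where severity = 'Error'
    and tstamp > unix_timestamp() - 3600*24*30
) t
group by t.servername;
```

The inner order by is left out here because the window definition already orders rows per server. lead() returns null for the last error on each server, and avg() skips nulls, so a server with a single error in the window gets a null MTBE while still being counted in errors.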
Example results for one month:
| Server Name | Errors | Average MTBE (hours) |
If we add the condition "and logmessage like 'Exception%'" to the where clause, we can track development errors. The remaining errors belong to the support organization (e.g. the log message "Tunneling result not OK, result: 'DEAD'" and the like).
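As an illustration of this split, both groups could be counted in one pass with conditional aggregation. This is a sketch under assumptions: weblogic_logs is a hypothetical table name, and the aliases development_errors and support_errors are invented for the example.

```sql
-- Sketch only: weblogic_logs is a hypothetical table name.
select servername,
       -- messages starting with 'Exception' are treated as development errors
       sum(case when logmessage like 'Exception%' then 1 else 0 end) as development_errors,
       -- everything else (including rows with a null logmessage) goes to support
       sum(case when logmessage like 'Exception%' then 0 else 1 end) as support_errors
from weblogic_logs
where severity = 'Error'
  and tstamp > unix_timestamp() - 3600*24*30
group by servername;
```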