Data analytics (DA) is the process of examining data sets to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. Over the past few years, data analytics has grown remarkably in organizations as varied as consumer goods companies, professional sports franchises, political consultancies, medical research institutions and financial firms; hardly any industry is untouched by it. If you analyse a competitive market, you will see how deeply analytics is changing the face of the market, business decisions and the goals organizations set.
Organizations increasingly acknowledge the power of data analytics and see it as a new hope; with it, they are driving their business to new heights by taking more complex and riskier decisions. Notably, the big IT companies are not only using analytics, they are also investing in research and development to improve prediction accuracy and confidence levels and to serve the world in new ways.
With basic analytics and data mining techniques, we can draw useful insights from data. So, let’s discuss the three branches of data analytics (descriptive, predictive and prescriptive) through different use cases to better understand how data analytics automates the extraction of information and insights.
Descriptive analytics is used to describe the basic features of the data with the help of analytical tools (e.g. R, SAS, Tableau) and statistical methods. It provides simple summaries about the sample and the measures.
Use Case – I
Let’s start with the well-known and popular “Online Retail” dataset. I am using it here for the sake of illustration.
Below is the head of the retail transactional data, which consists of about 5 lakh (500,000) records of daily transactions with eight variables. Using some basic visualization tools and techniques, we can get a basic idea of what the data is about.
In the top left plot (i.e. time series – monthly sale), we can see that November 2011 has the highest sale compared to the other months, while the very next month, December 2011, shows a drop in sale. From this plot we get at least a basic idea that December 2011 deserves a closer look, and we can also pick out the top-selling months of 2011. The trend line slopes upward, which means sales have grown over the past months.
In the top right plot (i.e. customer wise sale report), we can see who the top customers are, with specific transaction details, so that we can focus on serving them as well as possible.
The lower plot (i.e. country wise sale report) is a geographical plot of the sale generated in each country. If we focus on the highlighted region, we see that the “United Kingdom” had the highest sale in the past year compared to the other countries.
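The three views described above come down to grouping the transactions by month, by customer and by country. A minimal sketch in base R follows, assuming hypothetical column names (InvoiceDate, CustomerID, Country, Quantity, UnitPrice) and a small made-up data frame in place of the real dataset; it is not the code used to draw the plots in this article.

```r
# Toy stand-in for the retail transactions (made-up values, assumed column names)
retail <- data.frame(
  InvoiceDate = as.Date(c("2011-10-05", "2011-11-12", "2011-11-20",
                          "2011-12-03", "2011-12-08")),
  CustomerID  = c("C1", "C2", "C1", "C3", "C2"),
  Country     = c("United Kingdom", "France", "United Kingdom",
                  "Germany", "United Kingdom"),
  Quantity    = c(10, 4, 6, 3, 5),
  UnitPrice   = c(2.5, 10, 5, 4, 2)
)

# Sale amount for each transaction line
retail$Amount <- retail$Quantity * retail$UnitPrice

# Monthly sale (the time-series view)
retail$Month <- format(retail$InvoiceDate, "%Y-%m")
monthly_sale <- aggregate(Amount ~ Month, data = retail, FUN = sum)

# Customer-wise and country-wise sale, largest first
customer_sale <- aggregate(Amount ~ CustomerID, data = retail, FUN = sum)
customer_sale <- customer_sale[order(customer_sale$Amount, decreasing = TRUE), ]
country_sale <- aggregate(Amount ~ Country, data = retail, FUN = sum)
country_sale <- country_sale[order(country_sale$Amount, decreasing = TRUE), ]

print(monthly_sale)
print(customer_sale)
print(country_sale)
```

Each aggregated table can then be passed to the plotting tool of your choice.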
The table below gives the basic details of the dataset through the five-point summary. It also shows the range of the variables (i.e. Amount and Quantity), along with their skewness and kurtosis.
Summary table (transaction dataset):
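A summary of this kind can be computed in base R. The sketch below is an illustration on made-up quantities, not the table from this article; the helper names skewness and kurtosis_excess are my own, and they use the simple moment-based definitions (other packages may apply small-sample corrections and give slightly different numbers).

```r
# Moment-based skewness: third central moment scaled by variance^1.5
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Moment-based excess kurtosis: fourth central moment scaled by variance^2, minus 3
kurtosis_excess <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / (mean((x - m)^2))^2 - 3
}

quantity <- c(1, 2, 2, 3, 4, 6, 12, 24)  # made-up values with a long right tail

# Five-point summary: minimum, lower hinge, median, upper hinge, maximum
five_point <- fivenum(quantity)
names(five_point) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

print(five_point)
print(range(quantity))
print(skewness(quantity))
print(kurtosis_excess(quantity))
```

A positive skewness, as in this toy vector, indicates a few unusually large values pulling the tail to the right.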
The plot below shows the highest-selling products in decreasing order.
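A ranking like this can be obtained by totalling the quantity sold per product and sorting. The following sketch assumes hypothetical column names (Description, Quantity) and uses invented rows purely for illustration.

```r
# Made-up sales lines (assumed column names)
sales <- data.frame(
  Description = c("MUG", "LAMP", "MUG", "BAG", "LAMP", "MUG"),
  Quantity    = c(5, 2, 7, 4, 1, 3)
)

# Total quantity per product, sorted from highest to lowest
product_qty <- aggregate(Quantity ~ Description, data = sales, FUN = sum)
top_products <- product_qty[order(product_qty$Quantity, decreasing = TRUE), ]

# Keep at most the ten best sellers
print(head(top_products, 10))

# Uncomment to draw the bar chart:
# barplot(top_products$Quantity, names.arg = top_products$Description, las = 2)
```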
Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events. It draws on many techniques from statistics, machine learning, artificial intelligence, data mining and modelling.
To understand the techniques of predictive analytics, let’s work with another use case.
Use Case – II
Here, I am using the “German Credit” data, which has 1,000 records and 21 columns. Let’s see what information we can get from this data using predictive analytics techniques.
Data reference link: https://onlinecourses.science.psu.edu/stat857/sites/onlinecourses.science.psu.edu.stat857/files/german_credit.csv
Below is the structure of the data; we can see that all variables are stored as integers. Having checked the structure, we now have a basic idea of the data.
After the descriptive analysis of the credit data, the next step is predictive analysis. Here we are going to apply some popular predictive analytics techniques, such as a decision tree and a regression model.
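The model code in this article indexes the data with i_calibration1, but the article does not show how that index was built. One common way is a random split into a calibration (training) sample and a hold-out sample. The sketch below is an assumption on my part: the 70/30 ratio, the name i_validation1 and the simulated credit data frame are illustrative, not taken from the original analysis.

```r
set.seed(1)

# Simulated stand-in for the credit data (two columns only, made-up values)
credit <- data.frame(
  Creditability   = rbinom(1000, 1, 0.7),
  Account.Balance = sample(1:4, 1000, replace = TRUE)
)

n <- nrow(credit)

# Row indices of the calibration sample (70% is an assumed ratio)
i_calibration1 <- sample(seq_len(n), size = round(0.7 * n))

# The remaining rows form the hold-out sample (hypothetical name)
i_validation1 <- setdiff(seq_len(n), i_calibration1)

calibration <- credit[i_calibration1, ]
validation  <- credit[i_validation1, ]

print(dim(calibration))
print(dim(validation))
```

Fitting on the calibration rows and checking predictions on the hold-out rows gives a fairer picture of how a model behaves on unseen data.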
Model – I
Decision trees (DT) are classification models that partition data into subsets based on categories of input variables. So, let’s apply a DT on the credit data.
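library(rpart)
library(rpart.plot)
set.seed(1)
# Fit a classification tree on the calibration rows; method = "class" is needed
# because Creditability is stored as an integer
TreeModel <- rpart(Creditability ~ ., data = credit[i_calibration1, ], method = "class")
# Plot the fitted tree
prp(TreeModel, type = 2, extra = 1)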
We are looking at all the variables in our model to find their impact on our variable of interest, Creditability. With the help of a DT, we can easily see the impact of the other variables on the variable of interest. For a deeper understanding, we also have the DT plot below, from which we can identify the different levels and nodes of the tree. The split at each level is chosen with an impurity measure such as entropy; note that rpart uses the Gini index by default for classification trees and switches to entropy when parms = list(split = "information") is set.
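To make the entropy criterion concrete, here is a small hand-rolled sketch of entropy and information gain for a single candidate split. It is not rpart's internal code; the function names and the toy vectors are my own, chosen only to illustrate the calculation.

```r
# Shannon entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain from splitting `labels` by a logical condition `split`
information_gain <- function(labels, split) {
  n <- length(labels)
  left <- labels[split]
  right <- labels[!split]
  entropy(labels) -
    (length(left) / n) * entropy(left) -
    (length(right) / n) * entropy(right)
}

# Made-up example: 1 = good credit, 0 = bad credit
creditability   <- c(1, 1, 1, 1, 0, 0, 0, 0)
account_balance <- c(4, 4, 3, 4, 1, 2, 1, 3)

print(entropy(creditability))  # 1 bit, since the classes are split 50/50
print(information_gain(creditability, account_balance >= 3))
```

A tree builder that uses this criterion evaluates many candidate splits in this way and keeps the one with the largest gain at each node.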
Model – II
Regression (linear and logistic):
Regression is one of the most popular methods in statistics. Regression analysis estimates relationships among variables. Because Creditability is a binary outcome, we apply logistic regression to the credit data.
The R code for the regression model is below:
# Logistic regression (family = binomial) fitted on the calibration rows
Model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit +
               Purpose + Length.of.current.employment + Sex...Marital.Status,
             family = binomial, data = credit[i_calibration1, ])
The regression output is shared below with some variables highlighted; the highlighted variables have a significant impact on Creditability (i.e. the variable of interest).
For the highlighted variables, the p-value is less than 0.05 (the significance level), so their effect is unlikely to be due to chance alone.
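The p-values discussed here can be read directly from the coefficient table of a fitted model. The sketch below uses simulated data rather than the credit data: x_signal is built to influence the outcome and x_noise is not, so the names and numbers are illustrative assumptions only.

```r
set.seed(1)
n <- 500

x_signal <- rnorm(n)  # constructed to affect the outcome
x_noise  <- rnorm(n)  # constructed to have no effect
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x_signal))
toy <- data.frame(y, x_signal, x_noise)

# Logistic regression on the simulated data
fit <- glm(y ~ x_signal + x_noise, family = binomial, data = toy)

# Coefficient table: estimate, standard error, z value and p-value
coefs <- summary(fit)$coefficients

# Keep the terms whose p-value is below the 0.05 significance level
significant <- coefs[coefs[, "Pr(>|z|)"] < 0.05, , drop = FALSE]

print(round(coefs, 4))
print(rownames(significant))
```

The same two lines that build coefs and significant can be applied to the Model object fitted on the credit data.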
Prescriptive analytics is the area of business analytics dedicated to finding the best course of action for a given situation. It builds on both descriptive and predictive analytics. It optimizes decisions and helps us maximize profitable growth while mitigating risk.
Application – Prescriptive Analytics in Healthcare and Clinical Action:
We all know the importance of the healthcare industry in our lives. So the very first question that comes to mind is: how can we integrate predictive analytics into a healthcare delivery system? In simple terms, a prediction is most useful when that knowledge is converted into clinical action. This is what is meant by “integrated prediction”, or prescriptive analytics.
What more can we do? – Healthcare Industries:
Prescriptive analytics can help improve healthcare services by letting us simulate many of the problems we face on a daily basis. Below are some of the things we can work on:
- we can reduce patient readmissions and provide better medical treatment and services.
- we can simulate costs and use the results to plan extended services or other research and development.
- we can set a priority list of patients.
- we can act to mitigate risk, for example by emphasizing patient education at discharge or ensuring timely communication with primary care physicians and acute care facilities.
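As a rough illustration of how a prediction can be turned into a suggested action, the sketch below ranks patients by a predicted readmission risk and attaches a recommendation to each. Everything in it is hypothetical: the patient IDs, the risk values, the 0.3 and 0.6 thresholds and the action labels are invented for the example and do not represent clinical guidance.

```r
# Made-up predicted readmission probabilities for four patients
patients <- data.frame(
  patient_id       = c("P1", "P2", "P3", "P4"),
  readmission_risk = c(0.12, 0.67, 0.35, 0.81)
)

# Map a risk score to a suggested action (thresholds are assumptions)
recommend_action <- function(risk) {
  ifelse(risk >= 0.6, "schedule follow-up with primary care physician",
         ifelse(risk >= 0.3, "extra patient education at discharge",
                "standard discharge"))
}

patients$action <- recommend_action(patients$readmission_risk)

# Priority list: highest predicted risk first
priority_list <- patients[order(patients$readmission_risk, decreasing = TRUE), ]
print(priority_list)
```

In practice the risk scores would come from a validated predictive model, and the suggested actions would be reviewed by clinicians rather than applied automatically.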
Finally, it’s important to note that the use of prescriptive analytics in healthcare shouldn’t replace human intervention and decision-making in patient care. Rather, prescriptive analytics can provide better and more effective guidance for doctors and administrators, helping them use critical data and information to support clinical, financial and operational decisions and putting them on the path to successful outcomes.
The major goal of analytics is to help organizations make more informed business decisions in the form of automated insights. It does so by enabling predictive modelers, data scientists and other analytics professionals to analyse large volumes of different forms of data that might otherwise remain untouched because of their complexity. This can include internet clickstream data and web logs, internet content and social media activity, survey responses and emails, and sensor data from IoT devices.
For more on data insights and R programming (Data Science), feel free to get in touch with us at email@example.com. You can also share this content with your network.