SAP HANA for Data Analytics

As digital transformation, AI and IoT continue to grow, so does the need to extract meaningful insights from data effectively. Data is everywhere, and the uses we make of it are growing in both number and impact. Let us look at what data science is all about and how SAP services can be used for it effectively.

What is Data Science?

Data Science is an interdisciplinary field concerned with the processes and systems that enable the extraction of knowledge or insights from data. It employs techniques and theories drawn from a wide range of disciplines such as information science, statistics and machine learning to build insightful results, surface trends and aid decision making.

SAP offers several data science solutions; in this blog, let's take a look at SAP HANA.

SAP HANA

HANA is one of the most widely trusted predictive platforms. It performs in-memory data mining and statistical calculations over large datasets quickly enough to support real-time analytics.

In-Memory Database

HANA allows data analysts to query large volumes of data in real-time. HANA’s in-memory database infrastructure frees analysts from having to load or write-back data. HANA’s columnar-based data store is ACID-compliant and supports industry standards such as structured query language (SQL) and multi-dimensional expressions (MDX).
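
Because HANA speaks standard SQL, analysts can reach it from any PEP 249 style database client. The snippet below is a minimal sketch using SAP's hdbcli Python driver; the host, port, credentials and the SALES table are placeholders, not values from any real system.

    # Minimal sketch: running an analytical SQL query against HANA from Python.
    # Assumes the hdbcli package is installed (pip install hdbcli); the host,
    # port, credentials and table are placeholders.
    from hdbcli import dbapi

    conn = dbapi.connect(
        address="hana.example.com",  # placeholder host
        port=30015,                  # placeholder SQL port
        user="ANALYST",
        password="********",
    )

    cursor = conn.cursor()
    cursor.execute(
        "SELECT REGION, SUM(REVENUE) AS TOTAL_REVENUE "
        "FROM SALES GROUP BY REGION ORDER BY TOTAL_REVENUE DESC"
    )
    for region, total_revenue in cursor.fetchall():
        print(region, total_revenue)

    cursor.close()
    conn.close()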

Range of Algorithms

SAP HANA ships with a wide range of algorithms for analyses such as association, classification, regression, clustering, time series, probability distribution, outlier detection, link prediction, data preparation and statistical functions. Together these help organisations identify unforeseen opportunities, better understand customers and uncover hidden risks.
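
Many of these algorithms live in HANA's Predictive Analysis Library (PAL) and can be driven from Python through SAP's hana-ml client, so the computation stays in the database. The sketch below is illustrative only: the connection details and the CUSTOMER_FEATURES table are placeholders, and exact parameter names can vary between hana-ml releases.

    # Illustrative sketch: in-database clustering with a PAL algorithm via the
    # hana-ml client. Connection details and the table are placeholders, and
    # argument names may differ slightly between hana-ml versions.
    from hana_ml import dataframe
    from hana_ml.algorithms.pal.clustering import KMeans

    conn = dataframe.ConnectionContext(
        address="hana.example.com", port=30015,   # placeholders
        user="ANALYST", password="********",
    )

    customers = conn.table("CUSTOMER_FEATURES")   # hypothetical HANA table

    kmeans = KMeans(n_clusters=5)                 # segment customers into 5 groups
    kmeans.fit(data=customers, key="CUSTOMER_ID",
               features=["RECENCY", "FREQUENCY", "MONETARY"])

    print(kmeans.labels_.collect().head())        # cluster assignment per customer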

Real-time Analytics

HANA also includes a programming component that allows a company’s IT department to create and run customised applications on top of HANA, as well as a suite of predictive, spatial and text analytics libraries that work across multiple data sources. Because HANA can run in parallel to a source SAP ERP application, analysts can access real-time operational and transactional data without waiting for a daily or weekly report to run. R can also be integrated with SAP HANA or used standalone.

In further blogs, we will highlight other capabilities of SAP HANA for your business.

Use Data Lakes to Tap into the Future of Artificial Intelligence

The future – or, to put it correctly, the present – belongs to artificial intelligence. AI has moved far beyond the stuff of science fiction, and for all the benefits it provides today, we can only guess at what its future holds.

But data lakes help ensure that organizations are poised to take advantage.

The biggest trend we see today is the mainstream adoption of artificial intelligence; it is being used almost everywhere. One of the major use cases driving adoption, at least in a generic sense, is that AI engines can be used to spot trends and derive meaningful insight from an organization’s existing data.

The crux, however, is that for artificial intelligence to do this, it needs access to raw data. There are a number of ways of making this data available for analysis, but one of the best options may be to create a data lake.

 

What is a Data Lake?

A data lake is typically a large collection of data, structured or unstructured. In broad terms it can contain just about everything, from unstructured files to data created by IoT-enabled industrial sensors. Data lakes, by their very nature, are large and disorganized.

This poses a question: why create something as chaotic as a data lake, when it would probably be easier to configure an artificial intelligence engine to analyse structured data instead?

Let’s take a look at the reasons why data lakes are worth the effort:

  • Data lakes give you the opportunity to analyse data that might previously have been ignored. Structured data sets, by their very nature, are limited.
  • Data lakes act as repositories for pretty much anything and everything. As such, they offer a feasible path for analysing data that would otherwise not be usable.
  • Data lakes act as a backbone for a carefully tuned AI engine to extract hidden business insight from otherwise mundane data.
  • A data lake approach to storage allows an organization to be more agile and better positioned to take advantage of advancements in artificial intelligence, since data lakes can accommodate all data independently of any schema.

Data lakes require IT pros to think of data storage in a way that is completely different from how they might have thought of storage in the past. Even so, this new approach holds great promise for making organizations more agile and better positioned to take advantage of advancements in artificial intelligence.

Data Science – how can Startups leverage it?

As a startup, many areas demand the founders’ focus, and depending on the phase of the start-up, data science may be treated with different levels of importance. However, early investment in data science has consistently proven to have a high impact on profitability. In this article, we will review the possibilities of using data science in startups, and evaluate how startups can use data pipelining and leverage data platforms to harness the power of data.

Data science in start-ups, your benefits!

Business is becoming data centric, but the biggest challenge a start-up faces is getting the data in the first place. For startups, data scientists have to build the architecture from scratch; compared to larger enterprises, start-ups are not flush with data accumulated over time. The first step is to have a dedicated person or service provider set up and build the data acquisition architecture for the start-up. The first steps include:

  • Sources of data extraction
  • Strategy and tools to build Data Pipelines
  • Developing KPIs for data
  • Visualizing tools for developing insights
  • Building models
  • Testing and Validating to improve performance

Sources of data extraction

The user base and the number of event logs that access the application are the two starting points for data extraction. The user base can be further divided into active users and their sessions, inactive users and their drop-off points, and the details of the events/transactions that the active users are performing. The data that must be collected is based on the above parameters. Additionally, certain domain-specific attributes are required to gauge the number of users and their usage patterns. Even simple insights into the dropout rate of users are highly useful for improving the solution and its engagement.

Trackers are critical to acquiring this data in an organised manner. The best approach is to write tracking specifications that identify the attributes to capture and the steps needed to implement each event. Tracking events on the client side are essential because they send the data to the server, where it is used for analysis and for the development of your data products. Early-stage startups usually suffer from data starvation, so embedding event trackers in your product is the best way to collect data at a steady pace and make the product better.
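
As an illustration, a client-side tracker can be as small as a helper that posts named events and their attributes to a collection endpoint. The sketch below is a minimal example in Python; the endpoint URL, event names and attributes are hypothetical and would normally come from your tracking specification.

    # Minimal sketch of a client-side event tracker. The endpoint and event
    # attributes are hypothetical; a real tracking specification would define
    # the exact event names and fields.
    import json
    import time
    import urllib.request

    TRACKING_ENDPOINT = "https://analytics.example.com/events"  # placeholder

    def track_event(user_id, event_name, properties):
        """Send one named event with its attributes to the collection server."""
        payload = {
            "user_id": user_id,
            "event": event_name,
            "timestamp": time.time(),
            "properties": properties,
        }
        request = urllib.request.Request(
            TRACKING_ENDPOINT,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)  # fire the event to the server

    # Example: record that a user finished onboarding
    track_event("user-123", "onboarding_completed", {"plan": "free", "steps": 4})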

Strategy and tools to build Data Pipelines

A data pipeline helps to process the collected data for quick and meaningful analysis. A good and healthy data pipeline has several distinct characteristics:

  • Near ‘real-time’ delivery – access and process data in minutes or seconds
  • Flexible querying – support longer batch queries or quick but interactive queries
  • Scalability – start-ups are expected to add and accumulate data as they grow
  • Alerts and errors – timely alerts for syndication or reception errors, missing data and similar failures
  • Testing for speed – the pipeline should be easy to test for performance in isolation, including its database connections (a small sketch follows this list)
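
To make the alerting and testability points concrete, here is a minimal sketch of a single pipeline stage that validates incoming events, raises an alert for malformed records and can be tested without any database connection. The required fields and the alert hook are assumptions made for illustration.

    # Illustrative pipeline stage: pass well-formed events on, alert on bad ones.
    # The required fields and the alerting hook are assumptions for this sketch.
    import logging

    REQUIRED_FIELDS = {"user_id", "event", "timestamp"}

    def alert(message):
        # In a real pipeline this would notify an on-call channel; here we log.
        logging.error("PIPELINE ALERT: %s", message)

    def validate_events(raw_events):
        """Yield only well-formed events; alert on anything malformed."""
        for event in raw_events:
            missing = REQUIRED_FIELDS - event.keys()
            if missing:
                alert("event dropped, missing fields: %s" % sorted(missing))
                continue
            yield event

    # Small, fast test of the stage in isolation
    sample = [
        {"user_id": "u1", "event": "login", "timestamp": 1700000000},
        {"event": "login"},  # malformed on purpose
    ]
    assert len(list(validate_events(sample))) == 1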

Developing KPIs for data

A strong pipeline is a result of recognising the type of data.

  • Raw Data – raw data has no schema applied and no particular format attached to it. Events are tracked and shared as raw data, with a schema applied at a much later stage.
  • Processed Data – once a schema has been applied to the raw data, it becomes processed data. It is encoded in specified formats and stored in a different location in the data pipeline.
  • Cooked Data – a summary of the processed data, which can contain multiple attributes based on usage data.

KPIs, or key performance indicators, capture engagement, retention and growth in order to determine the usefulness of changes applied to the product or business model of the start-up. This also involves data engineering and standalone analysis. However, the focus should be on implementing reproducible reporting events and dashboards that track product or business performance. The KPIs are then available on demand and do not have to be compiled manually every time they are required.

Visualizing tools for developing insights

Generating Reports

R is one of the most popular programming languages for data science. While R is widely used for creating plots and building web applications, it is also used for automated report generation. Useful approaches to building reports with R include creating the base plots directly in R, generating reports with R Markdown, and using Shiny to create interactive visualizations.

ETLs for Data Transformation

ETL stands for Extract, Transform and Load. The main role of ETL is to transform raw data into processed data and processed data into cooked data, which takes the form of aggregated data. One of the key components of a pipeline is the raw events table. ETL processes can be set up to transform raw data into processed data, and cooked data can likewise be created from processed data. A collection of ETL jobs can be scheduled to run on the data pipeline, and various tools can assist in monitoring and managing them.
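
A minimal sketch of such an ETL step is shown below: it extracts raw JSON events, transforms them into processed data by applying a schema, and cooks them into an aggregated daily summary. The file names and columns are hypothetical, and in practice the job would be run by a scheduler or an orchestration tool rather than by hand.

    # Minimal ETL sketch: raw events -> processed (schema applied) -> cooked
    # (aggregated summary). File names and columns are hypothetical.
    import pandas as pd

    def extract(path):
        # Raw data: newline-delimited JSON events, no schema enforced yet
        return pd.read_json(path, lines=True)

    def transform(raw):
        # Processed data: keep the agreed columns and enforce their types
        processed = raw[["user_id", "event", "timestamp"]].copy()
        processed["timestamp"] = pd.to_datetime(processed["timestamp"], unit="s")
        return processed

    def cook(processed):
        # Cooked data: a simple aggregated summary (daily active users)
        daily = processed.set_index("timestamp").resample("D")["user_id"].nunique()
        return daily.rename("daily_active_users").reset_index()

    def load(cooked, path):
        cooked.to_csv(path, index=False)

    if __name__ == "__main__":
        load(cook(transform(extract("raw_events.json"))), "daily_active_users.csv")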

Exploratory Data Analysis for your Data Product

After setting up your data pipeline, the next step is to explore the data and gain insights about improving your product. With Exploratory Data Analysis or EDA, you can understand the shape of your data, find relationships between data features and gain insights about the data.

Some of the methods of analyzing the data are –

Summary Statistics – understand the dataset better through mean, median, mode, variance, quartiles and similar measures.

Data Plotting – provide a graphical overview of the data through line charts, histograms, bar plots and pie charts, and apply log transforms to data that is not normally distributed.

Correlation of Labels – find which features in the dataset are correlated with one another, and in particular which features are correlated with the target variable.
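
A short sketch of these steps with pandas is shown below; the sessions.csv file and its columns are placeholders for your own usage data.

    # Quick EDA sketch: summary statistics, a plot and a correlation check.
    # The file and column names are placeholders.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("sessions.csv")  # hypothetical usage data

    # Summary statistics: mean, quartiles and spread of every numeric column
    print(df.describe())

    # Data plotting: histogram of session length to see its distribution
    df["session_minutes"].plot.hist(bins=30)
    plt.xlabel("session length (minutes)")
    plt.show()

    # Correlation: how strongly each numeric feature relates to the target
    print(df.corr(numeric_only=True)["converted"].sort_values(ascending=False))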

Building Statistical Models

Machine learning is used to make predictions by programmatically classifying the data. With predictive modeling tools, user behavior can be forecast, and the product or business model can then be tailored to how users actually behave.

For example, if the startup has identified a recommendation system as an opportunity, a predictive model can recommend products or content to users based on their buying or watch history. Here again, there are two prevalent methods, sketched briefly after the list below:

  • Supervised Learning – the development of a prediction model from labeled data, mostly using regression and classification techniques. Regression is used to predict continuous values, while classification assigns values to classes to identify the likelihood of a particular outcome.
  • Unsupervised Learning – applied where data is not explicitly labeled, using clustering and segmentation techniques.
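
The sketch below shows both approaches with scikit-learn; the features and labels are synthetic stand-ins for real user data.

    # Sketch of the two prevalent methods with scikit-learn, using synthetic
    # stand-ins for real user data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))              # e.g. recency, frequency, spend
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # e.g. "purchased again" label

    # Supervised learning: labeled data, classify a known outcome
    clf = LogisticRegression().fit(X, y)
    print("probability of repeat purchase:", clf.predict_proba(X[:1])[0, 1])

    # Unsupervised learning: no labels, cluster users into segments
    segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("segment of first user:", segments[0])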

Eager and lazy models are two ways of applying machine learning to data sets. An eager model builds its general rules at training time, whereas a lazy model defers generalisation until a prediction is requested. Because a lazy model works directly from the stored data at query time, it naturally picks up modifications or changes in the data, which is why it is often preferred for systems whose data changes frequently.
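
For instance, k-nearest neighbours is a typical lazy learner while a decision tree is a typical eager one; the small comparison below, again with synthetic data, shows where the work happens in each case.

    # Lazy vs. eager learners on synthetic data. The k-NN model mostly stores
    # the training data and does its work at prediction time; the decision
    # tree builds its rules up front during fit().
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] > 0).astype(int)

    lazy = KNeighborsClassifier(n_neighbors=5).fit(X, y)    # cheap fit
    eager = DecisionTreeClassifier(max_depth=3).fit(X, y)   # rules built here

    print("k-NN prediction:", lazy.predict(X[:1])[0])
    print("tree prediction:", eager.predict(X[:1])[0])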

Crafsol has extensive experience in running machine learning tools for building prediction models, including Weka, BigML, R and Scikit-learn (Python).

Testing and Validating to improve performance

Data warehouses and marts are not static entities and must be re-architected from time to time. The biggest measure of the success of data science in a start-up, however, is its actual use and the benefits it delivers. Every organisation that takes up data science runs the risk of low utilisation, either because the insights are not aligned with the business or because they are not available in time. This is especially true for start-ups, which are in continuous turmoil and change at multiple levels, from the business model to data acquisition.

Conclusion

Data science is essential to building better products and improving customer experience. Startups should invest in quality data acquisition and its systematic processing from the very beginning. Essentials such as building data pipelines to support faster processing of the data are equally important to ensure a strong foundation for data-driven decisions. A strong initial investment can go a long way in creating a sustainable competitive edge for the start-up’s business model and solution. It also demonstrates a scientific approach to decision making when interacting with key stakeholders, including customers and investors.

Crafsol has been advising and consulting start-ups on use of machine learning and business intelligence to improve customer experience. We work as a partner with fast growing start-ups in India, USA and Australia to help them establish a strong data science practice early on in their business phase.

Why a data-driven culture is the future of your company – and how business intelligence can help!

The road map to BI-powered decisions starts with quality data capture and graduates to automating decision making. Further steps include reducing the burden of decision making and even enabling self-service decisions for stakeholders.

Business Intelligence has several definitions.

The technical definition includes all the right keywords: processes, architectures, information, transformation and insights into profits. However, the more practical definition we often propose to our clients, based on the utility of the technology, is this: if you are able to take better decisions and still manage to go home an hour or two early, we will call it a success.

But what appears to be a simple and practical definition takes a great deal of effort, even though a whole range of technologies, methods and practices is already available in the market. The reason is culture. Organisations are traditionally used to taking decisions based on intuition; in fact, one’s growth in a traditional organisation is often a result of the ability to take decisions even when the available information is incomplete. Effective use and appreciation of business intelligence requires a new culture.

 

The data science road map includes five key steps.

Automating the data capture and storage

The raw material for data science is, well, data, and automating the process of collecting and organising it is half the battle won. This requires tools that collect data from disparate sources, apply proper transformation and standardisation techniques, and store it in a database, data mart, data warehouse or, more recently, a data store. The data store serves as the central repository of data made available for insights into a variety of decisions. The aim of creating a data mart is to standardise and optimise the data to keep it analytics-ready; the quality of the data being collected is critical to churning out quality insights without error or doubt.
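
As a small illustration, the sketch below pulls data from two disparate sources, standardises the column names and types, and lands the result in a single analytics-ready table. The file names, columns and the SQLite file standing in for a data mart are all placeholders.

    # Illustrative only: collect data from two disparate sources, standardise
    # it and store it in one central table. All names are placeholders, and a
    # local SQLite file stands in for the data mart or warehouse.
    import sqlite3
    import pandas as pd

    # Source 1: CSV export from an operational system
    orders = pd.read_csv("orders_export.csv")
    orders = orders.rename(columns={"OrderDate": "event_date", "CustID": "customer_id"})

    # Source 2: JSON dump from a web application
    web_events = pd.read_json("web_events.json")
    web_events = web_events.rename(columns={"ts": "event_date", "user": "customer_id"})

    # Standardise and combine before loading
    combined = pd.concat([orders, web_events], ignore_index=True)
    combined["event_date"] = pd.to_datetime(combined["event_date"])

    # Load into the central store, ready for analytics
    with sqlite3.connect("data_mart.db") as conn:
        combined.to_sql("customer_events", conn, if_exists="replace", index=False)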

Typically, the type of transactions being captured and stored in the data mart defines how often the data needs to be refreshed and how quickly it becomes stale for insights. The more recent and frequent the updates to the data warehouse or mart, the closer the insights from the BI tool are to real time.

Building the business intelligence layer

Typically, as organisations get used to business intelligence, they need advanced tools and platforms that let users pose questions and find answers quickly. The reporting environment, which includes visualisation and analysis tools, is the first layer of business intelligence. These flexible and interactive tools are accessible to end users, decision makers and, of course, the IT team. The tools available in the market vary in the flexibility of their assumptions, the questions they support and their ability to generate answers. The key is that users start understanding the value of the accumulated data and the need for further analytical methodologies to identify hidden patterns and trends. That brings us to the next step: statistical modelling and data mining.

Data mining

As users realise the importance of rich, clean and automated data capture, and as hidden patterns accumulate in the data, the environment is ripe to introduce data mining. This step involves the use of statistical and mathematical techniques to create predictions, assess risks, and identify opportunities that were previously invisible.

Enabling data mining in the organisation typically requires an expert group of experienced data scientists, subject matter experts and the internal IT team working together. The group should be able to model the organisation’s data in unique ways to present meaningful insights. While users continue to use the first layer of business intelligence, the data scientists work to mine deeper insights from disparate sets of accumulated data. Over a period of time this creates the possibility of a centre of excellence in analytics.

The centre of excellence

With a more mature infrastructure and methodology, the data can be leveraged further to reflect a greater set of metadata, business challenges and innovative analysis ideas, creating a wider analytical cycle. This expanding cycle becomes a way of looking at and conducting business, and eventually helps drive further investment in data acquisition and reporting strategies.

Several global businesses today run their analytics centres of excellence in India. Crafsol has been supporting small and medium enterprises in building and running their centres of excellence by providing an outsourced data science team based in India. The dedicated team offers end-to-end support across analytics activities, from processing data and transforming it into a data mart to developing insights on an ongoing basis through analytics.

Real-time decision making

In this phase, the availability of business intelligence insights shifts from the desktop to mobile devices. From alert messages to highly interactive reports, dashboards and even micro applications, there are multiple ways to support faster, real-time decision making. This also requires optimising internal decision workflows, capturing exact data from customer touch points, and leveraging it for real-time and dynamic decision making.

Conclusion

When BI is powered by a centralised, easily available and high-quality data mart, the impact on decisions is seen immediately. The analytics service provider has a key role in enabling the power of business intelligence for your decision-making. The ultimate goal, of course, is to use that intelligence to take the customer experience to a whole new level. Crafsol has been helping organisations for over 10 years to choose the right tools for their business intelligence implementations. We have also been engaged to analyse, model and predict various insights for the business using analytics on an ongoing basis. Whether it is creating a data-driven culture for your organisation or building a state-of-the-art business intelligence environment for your business, Crafsol can work as your partner throughout the process.