Machine Learning: The Future Of Data Insights. The Time To Start Is Now (Part 2 of 2)

Amazon Web Services

In the first part of this article, we discussed four new data analytics capabilities powered by ML and NLP technologies that large and small companies can use to enhance their ability to generate data insights. In this second part, we'll introduce a couple of powerful yet affordable new tools offering these capabilities and an agile approach that any company can adopt to quickly start leveraging modern BI and ML technologies to enhance decision-making and business performance.

Amazon QuickSight: best ML-powered BI tool

Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. QuickSight lets large and small companies easily create and publish interactive BI dashboards that include ML-powered insights. These dashboards can be accessed from any device, and seamlessly embedded into applications, portals, and websites.

QuickSight is serverless (i.e., does not require its users to provision their own servers) and can automatically scale to tens of thousands of users, without any infrastructure to manage or capacity to plan for. It is also the first BI service to offer pay-per-session pricing, where customers only pay when their users access their dashboards or reports, making it very cost-effective for both large and small scale deployments.

Revenue dashboard

With QuickSight, business users can also ask questions regarding their data in plain language and receive answers in seconds.

Another important feature of this tool is SPICE (Super-fast, Parallel, In-memory Calculation Engine). With SPICE, users get fast query performance at scale. SPICE automatically replicates data for high availability, allowing thousands of users to perform fast, interactive analysis simultaneously, saving time and resources.

QuickSight also provides strong security, with encryption for data both in transit and at rest.

There are several BI tools available in the market today. In my opinion, two key benefits differentiate Amazon QuickSight from the other offerings:

  • Lowest total cost of ownership (TCO):
    • With Amazon QuickSight there are no upfront investments for infrastructure or software licensing.
    • Authors (i.e., users who can create and publish interactive dashboards, set up email reports, and more) are charged $18/user/month with an annual subscription.
    • Readers (i.e., users with secure access to dashboards anytime and anywhere) are charged $0.30/session (1 session = 30 mins from login) up to a maximum of $5/user/month.
    • As an example, a company with 10 authors working full-time or nearly full-time creating reports and dashboards and 500 readers each using the service for 8 sessions (about 4 hours) per month would incur no upfront investment and have a monthly cost of only $1,380 (10 x $18 + 500 x 8 x $0.30).
  • Most extensive and advanced set of ML and NLP powered data analytics capabilities:
    • Auto-narratives.
    • Natural language queries.
    • ML Anomaly detection (this service requires an additional fee; for example, a company monitoring 3 million metrics each month would incur an additional monthly cost of $1,000).
    • ML Forecasting (basic capability using only one ML algorithm and without the ability to incorporate related data).
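
The pricing arithmetic in the first bullet can be checked with a short sketch. It uses only the rates quoted above; the function name and its parameters are illustrative. Note that the $1,380 total corresponds to 8 thirty-minute sessions per reader per month, while 8 full hours of use (16 sessions) would come to $2,580.

```python
# Rates quoted in the article; verify against current AWS pricing before use.
AUTHOR_MONTHLY_USD = 18.00      # per author, with an annual subscription
READER_SESSION_USD = 0.30       # per 30-minute reader session
READER_MONTHLY_CAP_USD = 5.00   # maximum monthly charge per reader


def quicksight_monthly_cost(authors: int, readers: int, sessions_per_reader: int) -> float:
    """Estimate the monthly cost for a given mix of authors and readers."""
    author_cost = authors * AUTHOR_MONTHLY_USD
    # Each reader pays per session, but never more than the monthly cap.
    per_reader = min(sessions_per_reader * READER_SESSION_USD, READER_MONTHLY_CAP_USD)
    return round(author_cost + readers * per_reader, 2)


print(quicksight_monthly_cost(10, 500, 8))    # 8 sessions per reader: 1380.0
print(quicksight_monthly_cost(10, 500, 16))   # 16 sessions (8 hours): 2580.0
```

Because of the $5 cap, reader costs stop growing after 17 sessions per reader in a month, so the worst case for this company would be 10 x $18 + 500 x $5 = $2,680.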

Companies that need a more advanced ML-powered time series forecasting service can easily integrate this BI tool with Amazon Forecast (a tool that can produce up to 50% more accurate forecasts thanks to its ability to use multiple ML algorithms and to incorporate additional related data and metadata). Companies that need to develop custom machine learning models can integrate this BI tool with Amazon SageMaker.

A modern cloud-based ML-powered BI service is a very affordable and easy-to-use tool, and large and small companies should adopt one now to leverage the power of these technologies to enhance decision-making and business performance.

Amazon Forecast: best ML-powered forecasting

While the ML-powered forecast capability embedded in Amazon QuickSight is a good option for a company that wants to start testing the benefits of ML-powered time series forecasting, companies that want to achieve the best possible results should adopt a more advanced solution like Amazon Forecast. This tool currently offers the most advanced set of capabilities and the lowest total cost of ownership.

Based on the same technology used at Amazon.com, Amazon Forecast uses machine learning (ML) to combine time series data with additional variables to build highly accurate forecasts. Amazon Forecast requires no ML experience to get started.

Users only need to provide historical data, plus any additional data that they believe may impact their forecasts.

For example, the demand for a particular color of a shirt may change with the seasons and store location. This complex relationship is hard to determine on its own, but machine learning is ideally suited to recognize it. Once a user provides their data, Amazon Forecast will automatically examine it, identify what is meaningful, and produce a forecasting model capable of making predictions that are up to 50% more accurate than looking at time series data alone.
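
As a rough illustration of what "historical data" could look like for the shirt example, the sketch below aggregates raw sales records into one row per item, store, and day. The sample records, the column names (`item_id`, `timestamp`, `demand`, `location`), and the helper functions are assumptions made for illustration; the schema the service actually expects should be checked in its documentation.

```python
import csv
import io
from collections import defaultdict

# Made-up sample records for illustration: (date, item, store, units sold).
sales = [
    ("2021-01-04", "shirt-red", "store-01", 3),
    ("2021-01-04", "shirt-red", "store-01", 2),
    ("2021-01-04", "shirt-blue", "store-02", 5),
    ("2021-01-05", "shirt-red", "store-01", 4),
]


def to_time_series_rows(records):
    """Sum units per (item, store, day) and return rows sorted by that key."""
    totals = defaultdict(int)
    for day, item, store, units in records:
        totals[(item, store, day)] += units
    return [(item, day, units, store)
            for (item, store, day), units in sorted(totals.items())]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item_id", "timestamp", "demand", "location"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_time_series_rows(sales)
print(to_csv(rows))
```

Additional data that may influence demand, such as season or promotions, would typically be supplied as a separate dataset keyed by the same item, location, and date fields.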

Amazon Forecast is a fully managed service, so there are no servers to provision, and no machine learning models to build, train, or deploy. Users pay only for what they use, and there are no minimum fees and no upfront commitments.

Amazon Forecast

When using an advanced tool like Amazon Forecast, companies only need to focus on identifying significant business problems that can be adequately solved with forecasting and defining effective strategies to collect, store, and use the most appropriate data. Specialized professional service firms, like us and others, can easily help with that.

Amazon Forecast has been designed to deliver four key benefits:

  • Up to 50% more accurate forecasts with machine learning: Amazon Forecast provides forecasts that are up to 50% more accurate by using machine learning to automatically discover how time series data and other variables like product features and store locations affect each other. The models that Amazon Forecast builds are unique to each company's data, which means the predictions are custom fit to each business.
  • Reduce forecasting time from months to hours: With Amazon Forecast, companies can achieve forecasting accuracy levels that used to take months of engineering in as little as a few hours. By automatically handling the complex machine learning required to build, train, tune, and deploy a forecasting model, Amazon Forecast enables any company to create accurate forecasts quickly without requiring prior ML expertise.
  • Create virtually any time series forecast: Businesses need multiple types of forecasts, from cash flow to product demand to resource planning. Amazon Forecast allows businesses to build forecasts for virtually every industry and use case, including retail, logistics, finance, advertising performance, and many more. Using machine learning, Amazon Forecast can work with any historical time series data and use a large library of built-in algorithms to determine the best fit for a particular forecast type automatically.
  • Secure business data: Every interaction with Amazon Forecast is protected by encryption. Any content processed by Amazon Forecast is encrypted with customer keys through AWS Key Management Service, and all data is also encrypted at rest, ensuring that sensitive information is kept secure and confidential.

Using Amazon Forecast, we've been able to increase our forecasting accuracy from 27% to 76% reducing wastage by 20% for the fresh produce category. The tool helped us optimize our under and over forecasting costs leading to stock-outs at 3% and improved gross margins. We are now expanding the model to other categories, iterating with additional related datasets, and adding newer data to Amazon Forecast to continuously improve the model accuracy. - Supratim Banerjee, Chief Transformation Officer - More Retail

Thanks to an innovative pricing strategy, Amazon Forecast is currently the fully managed forecasting service in the market with the lowest total cost of ownership (TCO). The key cost drivers are the number of hours used to train the ML model, the amount of data used to train the model, and the number of generated forecasts (e.g., one product sold at one retail location would equal one forecast). The number of generated forecasts is by far the most significant driver of the total cost.

As an example, let's consider the case of a clothing company selling 2,000 items in 50 stores. Each combination of an item and store location would equate to one time series, for a total of 100,000 time series to forecast (2,000 items x 50 stores). We'll also assume that the company will use a training dataset of about 5 GB for this task, that it will generate the default set of quantiles for each forecast (i.e., P10, P50, and P90), and that a model using this amount of data would take about 20 hours to train. Finally, assuming that the company would want to generate a new set of updated rolling forecasts each week, the total weekly cost for this service would be equal to $185.24, and the total annual cost would be $9,632.48.
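
The weekly and annual totals above can be reproduced with a short sketch. The three unit prices below are assumptions: the article does not state them, but they yield exactly the $185.24 figure when each quantile of each time series is billed as one forecast and all three charges recur weekly. Current AWS pricing may differ.

```python
# Assumed unit prices (not stated in the article); check current AWS pricing.
PRICE_PER_1000_FORECASTS_USD = 0.60
PRICE_PER_GB_DATA_USD = 0.088
PRICE_PER_TRAINING_HOUR_USD = 0.24


def forecast_weekly_cost(items: int, stores: int, quantiles: int,
                         data_gb: float, training_hours: float) -> float:
    """Estimate the cost of one weekly training and forecasting cycle."""
    time_series = items * stores            # one series per item/store pair
    forecasts = time_series * quantiles     # one forecast per series per quantile
    cost = (forecasts / 1000 * PRICE_PER_1000_FORECASTS_USD
            + data_gb * PRICE_PER_GB_DATA_USD
            + training_hours * PRICE_PER_TRAINING_HOUR_USD)
    return round(cost, 2)


weekly = forecast_weekly_cost(items=2000, stores=50, quantiles=3,
                              data_gb=5, training_hours=20)
annual = round(weekly * 52, 2)
print(weekly, annual)   # 185.24 9632.48
```

Under these assumed rates, generated forecasts account for $180.00 of the weekly total, training for $4.80, and data for $0.44, which illustrates why the number of forecasts is the dominant cost driver.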

A modern cloud-based ML-powered time series forecasting service is a powerful yet affordable tool, and large and small companies should adopt one now to leverage the power of this technology to enhance decision-making and business performance.

Agile approach to adopting BI and ML

Once business executives decide to enhance their organization's ability to develop and use data insights to inform business decisions and drive continuous improvements in business performance, the next question is how to do it.

The recommended approach has three steps. First, take a few weeks to build a data lake (assuming a properly set up one is not already available). Then, expand the organization's ability to leverage BI and ML capabilities through an agile approach: a series of two-week sprints, each focused on solving a specific business problem. Finally, develop an explicit data strategy articulating which additional data the company needs to generate better insights and drive improvements in decision-making and business performance.

Building a data lake

To start leveraging business intelligence and data analytics tools, a company needs to set up a data warehouse or a data lake to collect and store the required data. Historically, setting up and managing a data warehouse or a data lake involved a lot of manual, complicated, time-consuming, and expensive tasks. The work included loading data from diverse sources, monitoring those data flows, setting up schemas and partitions, turning on encryption and managing keys, defining transformation jobs and monitoring their operation, re-organizing data into a columnar format, configuring access control settings, deduplicating redundant data, matching linked records, granting access to data sets, and auditing access over time. Managing those tasks manually can take several months of work for the initial setup phase and many hours each following month for maintenance.

Data Lake

A modern cloud-based data lake is a cost-effective centralized and secure repository that enables a company to govern, discover, share, and analyze all its structured and unstructured data at any scale. Data lakes help companies to break down data silos and run flexible analytics and machine learning to guide better decisions.

Data Lake Key Steps

Creating a data lake with a modern automated system like AWS Lake Formation is as simple as defining data sources and what data access and security policies a company wants to apply, and the initial setup work is reduced from several months to just a few weeks.

AWS Lake Formation helps companies automatically collect and catalog data from databases and object storage, move the data into a new data lake, clean and classify the company's data using machine learning algorithms, and secure access to the company's sensitive data. After that, users can access a centralized data catalog that describes the available data sets and their appropriate usage, and leverage these data sets with their choice of analytics and machine learning services.

With Lake Formation, companies can:

  • Build data lakes quickly.
  • Simplify security management.
  • Provide self-service access to data.

Expanding BI and ML capabilities

Often, the best way to learn a new skill is by practicing it, and this is especially true for developing new skills and capabilities across an entire business organization.

Instead of developing a master plan encompassing all possible use cases and benefits resulting from building new BI and ML capabilities, companies should adopt an agile approach and execute a series of two-week sprints, each focusing on a specific business problem. Business organizations should tackle more manageable issues first and address more challenging issues once they have developed more confidence and familiarity with the new tools.

Initially, problems typically addressed include things like:

  • Understanding changes in the level and composition of revenues.
  • Monitoring the pipeline of new sales opportunities.
  • Helping all departments to track their expenses easily.
  • Creating customer dashboards (i.e., dashboards helping account executives monitor all interactions and transactions with a specific customer).
  • Monitoring employee presence, satisfaction, and churn.
  • Creating dashboards to track partners' performance (e.g., tracking on-time deliveries by logistics partners).

More advanced challenges addressed at a later stage might include issues like:

  • Optimizing marketing spending and e-commerce conversion rates.
  • Monitoring store traffic and heatmaps (i.e., data indicating where customers spend the most time inside a store) to optimize advertising spending and in-store merchandising.
  • Automating product quality inspections with computer vision.
  • Optimizing product demand planning and inventory management.

Developing an explicit data strategy

As business executives embark on this journey, they'll start to realize that some of the data required to enhance decision making and business performance are not available. They'll also discover that some of the available data are missing some critical attributes or are not structured correctly.

As the negative impact of missing data on decision-making and continuous performance improvement becomes more and more apparent, the organization will start developing an explicit data strategy as part of its annual strategy and planning cycle, articulating what activities and investments are needed to enhance its data assets.

A company with an explicit data strategy indicates an organization that has mastered how to leverage data insights to enhance decision-making and business performance continuously. - Paolo Timoni, Partner - Augeo Partners


Enhancing the ability to generate data insights to improve decision-making and business performance is an essential priority for most businesses. Modern BI and data analytics tools powered by the cloud and machine learning are affordable and easy to use. The time for both large and small companies to start leveraging these powerful new technologies is now, and Augeo Partners can help.