It is well known that machine learning requires data, and of course it’s true. Machine learning typically requires a large investment in data infrastructure, data pipelines, and data monitoring to ensure a continuous stream of high-quality data for the machine learning algorithms. The other truism we see is that your data infrastructure is rarely, if ever, in a finished state. There are constantly new sources of information to add to your data warehouse or CDP. This ongoing data investment can be an expensive proposition with no ROI visible at the end of the tunnel.
So businesses are left in the following predicament: machine learning can be one of the best ways to drive value from your data, thanks to ongoing learning and the ability to automate tasks based on data. A catch-22 emerges: machine learning can be a great path to data value, yet it requires an investment in data before it can provide that value.
But there are approaches that allow a business to demonstrate immediate value from machine learning while at the same time fueling investments in additional infrastructure. Let’s briefly explore one of those approaches using the Decisioning SDK for next-best-action experiences.
The Decisioning SDK enables a business to use both real-time and historical data to make real-time decisions for use cases like next-best-action and next-best-offer. The key here is that the SDK works even when only real-time data is available (Cortex will actually convert the real-time data provided to the SDK into historical data on an ongoing basis). The Decisioning SDK takes onsite behavioral data such as clicks, shown events, content consumed, and conversions, and featurizes those behaviors and their associated metadata in real time. These features are then used both to train a model and to make real-time decisions for next-best-action type experiences. After 1-2 weeks of data collection (the exact time varies with the amount of data provided), the SDK can be ready to make machine learning decisions.
- Here’s a blog post which covers real-time decisioning in more detail
- Here’s a video which shows how to build real-time pipelines in Cortex
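To make the idea of featurizing behavioral events more concrete, here is a minimal sketch in Python. It is not the Decisioning SDK’s API: the event fields, the function names (`featurize`, `score`, `next_best_action`), and the smoothed click-rate scoring are all assumptions chosen for illustration, standing in for whatever model the SDK actually trains.

```python
from collections import defaultdict

# Hypothetical event records. The field names are assumptions for
# illustration, not the Decisioning SDK's actual schema.
events = [
    {"user": "u1", "action": "offer_a", "type": "shown"},
    {"user": "u1", "action": "offer_a", "type": "click"},
    {"user": "u2", "action": "offer_a", "type": "shown"},
    {"user": "u2", "action": "offer_b", "type": "shown"},
    {"user": "u3", "action": "offer_b", "type": "shown"},
    {"user": "u3", "action": "offer_b", "type": "click"},
    {"user": "u3", "action": "offer_b", "type": "shown"},
]


def featurize(events):
    """Aggregate raw events into per-action counts of shown and click events."""
    counts = defaultdict(lambda: {"shown": 0, "click": 0})
    for event in events:
        if event["type"] in ("shown", "click"):
            counts[event["action"]][event["type"]] += 1
    return dict(counts)


def score(stats, prior_clicks=1.0, prior_shown=2.0):
    """Smoothed click rate; the priors keep rarely shown actions from
    being scored on one or two events alone."""
    return (stats["click"] + prior_clicks) / (stats["shown"] + prior_shown)


def next_best_action(features):
    """Pick the action with the highest smoothed click rate."""
    return max(features, key=lambda action: score(features[action]))


features = featurize(events)
best = next_best_action(features)
print(features)
print("next best action:", best)
```

A production system would use a richer model and many more features, but the flow is the same: raw events become features, and features drive the decision.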
Layering on Historical Data
So what advantage does historical data provide? The Decisioning SDK will show value when only real-time data is used, but the features used to train the model and run machine learning inference will be limited to the behavioral data provided. Layering historical data onto the Decisioning SDK allows a business to increase the types of information used for learning and inference. For instance, a business might add attributes associated with users, such as demographic information. Additional data, as long as it is accurate, generally won’t hurt the machine learning algorithms and will typically increase performance.
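As a rough illustration of what layering means at the feature level, the sketch below merges real-time behavioral features with optional historical attributes into one feature row. The function `build_feature_row`, the feature names, and the `hist_` prefix are hypothetical and not part of the Decisioning SDK. The point is only that the same code path works with or without historical data.

```python
def build_feature_row(user_id, realtime, historical=None):
    """Combine real-time behavioral features with optional historical attributes.

    `realtime` and `historical` are dicts keyed by user ID. Both are
    hypothetical stand-ins for an event stream and a warehouse or CDP export.
    """
    # Defaults keep the row usable for users with no recorded behavior yet.
    row = {"clicks_7d": 0, "views_7d": 0}
    row.update(realtime.get(user_id, {}))

    # Historical attributes are optional: a missing source adds nothing.
    if historical is not None:
        for key, value in historical.get(user_id, {}).items():
            row[f"hist_{key}"] = value
    return row


realtime = {"u1": {"clicks_7d": 3, "views_7d": 10}}
historical = {"u1": {"age_band": "25-34", "region": "west"}}

realtime_only = build_feature_row("u1", realtime)
layered = build_feature_row("u1", realtime, historical)
unknown_user = build_feature_row("u2", realtime, historical)
```

Here `realtime_only` contains just the behavioral features, while `layered` adds `hist_age_band` and `hist_region` on top of them, so a model can start on the first row shape and pick up the extra columns once a warehouse or CDP is connected.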
Showing Value Quickly with No Data Infrastructure
Let’s circle back to the catch-22 we outlined above. One way to show value quickly from data investments is to start by sending real-time behavioral data into the Decisioning SDK, demonstrate value, and then, over time, augment the real-time behavioral data with historical data from a data warehouse or CDP. As more accurate historical data is added, the performance of machine learning decisioning should continue to improve.
Please reach out to firstname.lastname@example.org if you have any questions!