Feature Engineering

Cortex automatically connects every step of the Machine Learning process into end-to-end Machine Learning Pipelines that anyone in your business can run. In this guide, we’ll discuss the third step of a Cortex pipeline: feature engineering.

What is feature engineering?

Feature engineering is the process of transforming raw data into features that your pipeline will use to learn. A feature is simply a way to quantify something about the objects you are making predictions for (e.g. users). For example:

  • How many total clicks has each user recorded over the last 7 days?
  • What percent of each user’s sessions over the last 30 days have included a transaction?
  • How many different devices has each user used to log in over the last 14 days?
  • Etc.
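
As a rough illustration, the three example features above could be computed from a raw event log along the following lines. This is a minimal Python sketch, not Cortex's implementation: the event fields (`user_id`, `event_type`, `session_id`, `device`, `timestamp`), the sample data, and the helper names are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical raw events; the field names are assumptions, not Cortex's schema.
NOW = datetime(2024, 1, 31)
EVENTS = [
    {"user_id": "u1", "event_type": "click", "session_id": "s1",
     "device": "phone", "timestamp": NOW - timedelta(days=1)},
    {"user_id": "u1", "event_type": "click", "session_id": "s1",
     "device": "phone", "timestamp": NOW - timedelta(days=2)},
    {"user_id": "u1", "event_type": "transaction", "session_id": "s2",
     "device": "laptop", "timestamp": NOW - timedelta(days=10)},
    {"user_id": "u1", "event_type": "click", "session_id": "s3",
     "device": "tablet", "timestamp": NOW - timedelta(days=20)},
]


def in_window(events, user_id, days, now=NOW):
    """Return the user's events from the last `days` days."""
    start = now - timedelta(days=days)
    return [e for e in events
            if e["user_id"] == user_id and start < e["timestamp"] <= now]


def total_clicks(events, user_id, days=7):
    """How many click events the user recorded in the window."""
    return sum(1 for e in in_window(events, user_id, days)
               if e["event_type"] == "click")


def pct_sessions_with_transaction(events, user_id, days=30):
    """Percent of the user's sessions in the window that include a transaction."""
    recent = in_window(events, user_id, days)
    sessions = {e["session_id"] for e in recent}
    converted = {e["session_id"] for e in recent
                 if e["event_type"] == "transaction"}
    return 100.0 * len(converted) / len(sessions) if sessions else 0.0


def distinct_devices(events, user_id, days=14):
    """How many different devices appear in the user's events in the window."""
    return len({e["device"] for e in in_window(events, user_id, days)})
```

Each feature reduces a user's event history to a single number, which is the form a learning algorithm can consume.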

There are limitless features that could be built from a stream of raw event data. The task of Cortex’s feature engineering step is to identify and build the ones that will be most predictive of the goal that your pipeline is optimizing for (e.g. probability of purchasing in the future).

Learn more about feature engineering in this blog post.

Why does it matter?

Features are the inputs that your pipeline uses to learn and make predictions. During training, your pipeline learns to emphasize important features and to ignore irrelevant ones. But even with the most sophisticated algorithms, a pipeline is only as good as its features are predictive: if the inputs aren’t relevant indicators of what you’re trying to predict, your pipeline won’t find patterns that are useful for making predictions.

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”

– Professor Pedro Domingos

Which feature engineering techniques does Cortex use?

Cortex uses a variety of feature engineering techniques to ensure that your pipeline is training on the most predictive inputs possible.

Most of your pipeline’s features are engineered by combining various functions, filters, windows, and transformations. Some examples of each are listed below.

  • Functions: Sum, Average, Count, Mode, Count unique, Elapsed time, Temporal incidence, Sequence
  • Filters: Column filters, Value filters, Inequality filters, Chained AND/ORs
  • Windows: 1-day, 3-day, 7-day, 14-day, 28-day, 56-day, 84-day
  • Transformations: Log, Delta, Percent, Exponentiation
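
This guide does not spell out how Cortex defines each transformation, so the following Python sketch shows one common reading of the four names listed above. The function names and the exact formulas (for instance, using `log1p` so that a count of zero stays at zero, and treating Percent as a share of a total) are assumptions for illustration only.

```python
import math


def log_transform(value):
    """Compress large, skewed counts; log1p maps 0 to 0."""
    return math.log1p(value)


def delta(current, previous):
    """Change between the current window and the one before it (assumed meaning)."""
    return current - previous


def percent(part, total):
    """Share of a total, in percent; 0.0 when the total is zero."""
    return 100.0 * part / total if total else 0.0


def exponentiate(value, power=2):
    """Raise a value to a power, which emphasizes large values."""
    return value ** power
```

A transformation is applied after the function, filters, and window have produced a raw aggregate, for example `log_transform(total_clicks)`.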

Below are a few examples of how features might be built in this way. The features that are actually generated depend on the type of data ingested into your Cortex account.

Feature | Function | Filters | Window | Transformation
--- | --- | --- | --- | ---
Total shoe purchases over the last 56 days | Count | event_type = purchase AND category = shoes | Days = 56 | N/A
Unique categories clicked over the last 28 days | Count unique | event_type = click | Days = 28 | N/A
Total amount spent over the last 7 days compared to the previous 7 days | Sum (price) | event_type = purchase | Days = 7 | Delta
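
One way to picture how these four parts combine is a small generic builder that takes a function, a set of filters, a window, and an optional transformation. The Python sketch below is an illustration under stated assumptions, not Cortex's API: `build_feature`, `aggregate`, the event fields, and the sample data are hypothetical, filters are limited to simple equality checks, and Delta is assumed to mean "current window minus the window immediately before it".

```python
from datetime import datetime, timedelta

NOW = datetime(2024, 1, 31)
# Hypothetical events for one user; field names are assumptions.
EVENTS = [
    {"event_type": "purchase", "category": "shoes", "price": 80.0,
     "timestamp": NOW - timedelta(days=2)},
    {"event_type": "purchase", "category": "hats", "price": 20.0,
     "timestamp": NOW - timedelta(days=9)},
    {"event_type": "purchase", "category": "shoes", "price": 50.0,
     "timestamp": NOW - timedelta(days=40)},
    {"event_type": "click", "category": "shoes", "price": None,
     "timestamp": NOW - timedelta(days=3)},
    {"event_type": "click", "category": "hats", "price": None,
     "timestamp": NOW - timedelta(days=20)},
]

# Aggregation functions, keyed by name.
FUNCTIONS = {
    "count": lambda rows, column: len(rows),
    "count_unique": lambda rows, column: len({r[column] for r in rows}),
    "sum": lambda rows, column: sum(r[column] for r in rows),
}


def aggregate(events, function, column, filters, start, end):
    """Apply equality filters and a time window, then the aggregation."""
    rows = [
        e for e in events
        if start < e["timestamp"] <= end
        and all(e.get(key) == value for key, value in filters.items())
    ]
    return FUNCTIONS[function](rows, column)


def build_feature(events, function, filters, days,
                  column=None, transformation=None, now=NOW):
    """Combine a function, filters, a window, and an optional transformation."""
    window = timedelta(days=days)
    current = aggregate(events, function, column, filters, now - window, now)
    if transformation == "delta":
        # Assumed meaning: current window minus the window just before it.
        previous = aggregate(events, function, column, filters,
                             now - 2 * window, now - window)
        return current - previous
    return current


# The three example features from the table.
shoe_purchases_56d = build_feature(
    EVENTS, "count", {"event_type": "purchase", "category": "shoes"}, days=56)
categories_clicked_28d = build_feature(
    EVENTS, "count_unique", {"event_type": "click"}, days=28, column="category")
spend_delta_7d = build_feature(
    EVENTS, "sum", {"event_type": "purchase"}, days=7,
    column="price", transformation="delta")
```

Because each part is an independent parameter, a modest number of functions, filters, and windows multiplies into a large space of candidate features.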

Combining functions, filters, windows, and transformations in this way lets Cortex build a wide variety of features that capture behavioral patterns such as frequency, recency, breadth, and sequences. Examples of each follow.

Frequency

  • Total number of purchase events on category “shoes” over the last 7 days
  • Total price across all the user’s purchase events over the last 14 days
  • Number of days over the last 28 where the user registered at least 1 click event
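
The last frequency example counts active days rather than events, which is a different quantity: several clicks on the same day count once. A minimal Python sketch of the distinction, assuming a hypothetical list of click timestamps for one user:

```python
from datetime import datetime, timedelta

NOW = datetime(2024, 1, 31, 12, 0)
# Hypothetical click timestamps for a single user.
CLICK_TIMES = [
    NOW - timedelta(days=1),
    NOW - timedelta(days=1, hours=3),  # same calendar day as the click above
    NOW - timedelta(days=5),
    NOW - timedelta(days=40),          # outside a 28-day window
]


def event_count(timestamps, days=28, now=NOW):
    """Total number of events in the window."""
    start = now - timedelta(days=days)
    return sum(1 for t in timestamps if start < t <= now)


def active_days(timestamps, days=28, now=NOW):
    """Number of distinct calendar days in the window with at least one event."""
    start = now - timedelta(days=days)
    return len({t.date() for t in timestamps if start < t <= now})
```

Here the user has three clicks in the window but only two active days.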

Recency

  • Number of days since the user last recorded a pageview event
  • Number of days since the user last purchased an item for > $50
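
Recency features measure elapsed time since the most recent matching event. The Python sketch below assumes a hypothetical event list and a `days_since_last` helper that takes the filter as a predicate. Returning `None` when the user has no matching event is a design choice made for this example, and how such missing values are encoded in practice is not covered here.

```python
from datetime import datetime, timedelta

NOW = datetime(2024, 1, 31)
# Hypothetical events for one user; field names are assumptions.
EVENTS = [
    {"event_type": "pageview", "price": None,
     "timestamp": NOW - timedelta(days=3)},
    {"event_type": "purchase", "price": 30.0,
     "timestamp": NOW - timedelta(days=6)},
    {"event_type": "purchase", "price": 75.0,
     "timestamp": NOW - timedelta(days=12)},
]


def days_since_last(events, predicate, now=NOW):
    """Whole days since the most recent matching event, or None if none match."""
    times = [e["timestamp"] for e in events if predicate(e)]
    return (now - max(times)).days if times else None


days_since_pageview = days_since_last(
    EVENTS, lambda e: e["event_type"] == "pageview")
days_since_big_purchase = days_since_last(
    EVENTS, lambda e: e["event_type"] == "purchase" and e["price"] > 50)
```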

Breadth

  • Number of unique event types the user has completed over the last 7 days
  • Number of unique categories the user has added to cart over the last 56 days
  • Average price of items purchased by the user over the past 84 days

Sequences

  • Whether the user has completed the event sequence “click on email” → “click to site” → “purchase on site” within the last 14 days
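
A sequence feature checks whether a set of events happened in a particular order. The Python sketch below shows one possible interpretation, in which the steps must occur in order within the window but other events may be interleaved between them. The event names come from the example above; the `completed_sequence` helper and the sample data are assumptions.

```python
from datetime import datetime, timedelta

NOW = datetime(2024, 1, 31)
# Hypothetical events for one user.
EVENTS = [
    {"event_type": "click on email", "timestamp": NOW - timedelta(days=5)},
    {"event_type": "pageview", "timestamp": NOW - timedelta(days=5)},
    {"event_type": "click to site", "timestamp": NOW - timedelta(days=4)},
    {"event_type": "purchase on site", "timestamp": NOW - timedelta(days=2)},
]


def completed_sequence(events, steps, days=14, now=NOW):
    """True if `steps` occur in order within the window (gaps are allowed)."""
    start = now - timedelta(days=days)
    recent = sorted(
        (e for e in events if start < e["timestamp"] <= now),
        key=lambda e: e["timestamp"],
    )
    position = 0  # index of the next step we are waiting for
    for event in recent:
        if position < len(steps) and event["event_type"] == steps[position]:
            position += 1
    return position == len(steps)


FUNNEL = ["click on email", "click to site", "purchase on site"]
completed_funnel = completed_sequence(EVENTS, FUNNEL)
```

A stricter variant could require the steps to be consecutive or to fall within the same session; which definition is appropriate depends on the behavior being modeled.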

In addition to Cortex’s automated feature engineering techniques, you may also define custom features based on your business intuition, and easily add these features to any of your Cortex ML pipelines.

Still have questions? Reach out to support@mparticle.com for more info!