Predict High vs Low Value Users
In this example, we walk through how to predict a user attribute using a type of Machine Learning Pipeline called Classification. Specifically, we’ll cover how to predict each user’s probability, on a scale from 0-100%, of being a High Value User, based on a list of users currently known to be High Value and another list of users known to be Low Value.
What data do I need for this prediction?
Predictions from a Classification pipeline answer the question: how likely is each user to belong to a certain group, based on a list of positive labels and another list of negative labels? Each label is a True or False value indicating whether the User ID exhibits the trait we are predicting; in this case, True if they were a High Value user during a specific period and False if they were Low Value. Cortex then analyzes the behaviors of these two groups and uses that information to predict how likely every user is to be High Value.
While the list of User IDs and Labels is the only information required when setting up our Classification pipeline, additional information about these users is needed in order to make accurate predictions. This information is used to build features for our pipeline:
- User Behaviors (e.g. purchases, logins, clicks, pageviews, adds to cart, etc.) with additional metadata (e.g. What device is the user on? Where was the user referred from? etc.)
- User Attributes (e.g. demographics, loyalty status, etc.)
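To make the idea of building features from behaviors concrete, here is a minimal sketch that counts events per user. It is an illustration only: Cortex builds its features internally, and the function name `count_events_per_user`, the event names, and the user IDs below are hypothetical.

```python
from collections import Counter, defaultdict


def count_events_per_user(events):
    """Return {user_id: Counter({event_type: count})} from (user_id, event_type) pairs."""
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        counts[user_id][event_type] += 1
    return dict(counts)


# Hypothetical behavior log: one (user_id, event_type) pair per event.
events = [
    ("u1", "purchase"),
    ("u1", "login"),
    ("u1", "purchase"),
    ("u2", "pageview"),
]
features = count_events_per_user(events)
```

Per-user counts like these are one simple kind of feature a classifier can learn from; richer features would also draw on the metadata and user attributes listed above.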
How do I predict the Likelihood each User is High Value?
Step 1: Choose Pipeline Type
Select ‘Create New Pipeline’ from your Cortex account, and choose the Classification pipeline type.
Step 2: Upload Sets
This is where we upload our list of User IDs representing known High and Low Value users. This file should be a .csv or .csv.gz file consisting of two columns: user_id and value. A value of 1 represents a positive label (High Value), and a value of 0 represents a negative label (in this case, Low Value).
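A labels file in this two-column format could be produced with a short script like the sketch below. The helper name `write_labels` and the overlap check are assumptions made for illustration, not part of Cortex; only the column names user_id and value and the 1/0 encoding come from the description above.

```python
import csv
import gzip


def write_labels(path, high_value_ids, low_value_ids):
    """Write a labels file with columns user_id,value (1 = High Value, 0 = Low Value)."""
    # Assumption: a user should not appear in both lists, so we fail early.
    overlap = set(high_value_ids) & set(low_value_ids)
    if overlap:
        raise ValueError(f"user IDs labeled both High and Low Value: {sorted(overlap)}")

    # Write gzip-compressed output when the path ends in .gz, plain CSV otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "value"])
        for user_id in high_value_ids:
            writer.writerow([user_id, 1])
        for user_id in low_value_ids:
            writer.writerow([user_id, 0])
```

For example, calling `write_labels("labels.csv.gz", ["u1", "u2"], ["u3"])` with these hypothetical IDs would write a compressed file with a header row followed by three label rows.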
Step 3: Define Dates
Traits can change over time: in our example, High Value users may not always remain High Value. It is therefore important to specify the date range during which these users were known to be High Value. Otherwise, their status may have been different during the training window, which would lead to less accurate predictions.
In this example we are choosing the default range, but this can be any date range in which the status of the User IDs was known.
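If your own records note when each user's status was observed, a check like the following sketch can confirm that the labels you upload match the date range you select. The record layout (user_id, value, observed date), the function name, and the dates are hypothetical; Cortex itself only asks for the date range in this step.

```python
from datetime import date


def labels_known_in_range(observations, start, end):
    """Keep (user_id, value) pairs whose status was observed between start and end, inclusive."""
    return [
        (user_id, value)
        for user_id, value, observed_on in observations
        if start <= observed_on <= end
    ]


# Hypothetical records: (user_id, value, date the status was observed).
observations = [
    ("u1", 1, date(2023, 1, 15)),
    ("u2", 0, date(2023, 2, 1)),
    ("u3", 1, date(2022, 11, 30)),  # observed before the chosen range
]
in_range = labels_known_in_range(observations, date(2023, 1, 1), date(2023, 3, 31))
```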
Step 4: Specify Settings
In this step, we name our pipeline ‘High vs Low Value Users Prediction’ and set it to rerun weekly on Sundays. With a weekly schedule, the pipeline uses the latest available data to regenerate up-to-date predictions each week.
Step 5: Review
The final step is to review your pipeline and ensure all settings look accurate! If anything needs to be updated, simply go ‘Back’ in the workflow and update any step. Otherwise, click ‘Start Training’ and sit back while Cortex generates the predictions.
Step 6: Update Labels Over Time (Optional)
If you’re collecting new User IDs of known High and Low Value users over time, you can import these extra labels into Cortex so that your pipelines are always learning from the most recent information. To upload new labels, hit the “Edit” button on your pipeline (next to “Export Predictions”).
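Before uploading, you may want to reconcile newly collected labels with the ones you already have. The sketch below assumes that the most recent label for a user should win; how Cortex combines a new upload with earlier labels is not described here, so treat `merge_labels` and the sample IDs as hypothetical.

```python
def merge_labels(existing, new):
    """Combine two {user_id: value} mappings, letting newer labels override older ones."""
    for user_id, value in new.items():
        if value not in (0, 1):
            raise ValueError(f"label for {user_id!r} must be 0 or 1, got {value!r}")
    merged = dict(existing)  # copy so the original mapping is left untouched
    merged.update(new)
    return merged


# Hypothetical labels: u2 was Low Value earlier and is now known to be High Value.
existing = {"u1": 1, "u2": 0}
new = {"u2": 1, "u3": 0}
merged = merge_labels(existing, new)
```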
- Classification Performance
- How to Build a Look Alike Pipeline
- How to Build a Regression Pipeline
- How to Build a Future Events Pipeline
Still have questions? Reach out to firstname.lastname@example.org for more info!