Predict First-Time Homebuyers

In this use case example, we will walk through how to predict a user attribute with a type of Machine Learning Pipeline called Look Alikes. Specifically, we’ll cover how to predict each user’s probability, on a scale from 0-100%, of being a first-time homebuyer, based on a list of users who are currently known first-time homebuyers.

What data do I need for this prediction?

Predictions from a Look Alike pipeline answer the question: how likely is each user to belong to a certain group based on a list of positive labels? These positive labels indicate that the User ID does exhibit the trait we are predicting, in this case that they are first-time homebuyers during a specific period. Cortex will then analyze the behaviors of this specific group of users, and use that information to predict the first-time homebuyer likelihood for every other user.

While the list of User IDs and Labels is the only information required when setting up our Look Alike pipeline, additional information about these users is needed in order to make accurate predictions. This information is used to build features for our pipeline:

  • User Behaviors (e.g. purchases, logins, clicks, pageviews, adds to cart, etc.) with additional metadata (e.g. What device is the user on? Where was the user referred from? etc.)
  • User Attributes (e.g. demographics, loyalty status, etc.)

How do I predict the Likelihood each User is a First-Time Homebuyer?

Step 1: Choose Pipeline Type

Select ‘Create New Pipeline’ from your Cortex account, and choose the Look Alikes pipeline type.

Step 2: Upload Sets

This is where we upload our list of User IDs representing known first-time homebuyers. This file should be a .csv or .csv.gz file, consisting of two columns: user_id and value. Because we only have positive labels, every value will be 1.

id label
abc123 1
xyz987 1
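The labels file described above can be generated with a short script. The following is a minimal sketch, not an official Cortex tool: the function names are hypothetical, the column names follow the prose in this step (user_id and value), and the presence of a header row is an assumption, so adjust both to match what your account expects.

```python
import csv
import gzip


def build_label_rows(user_ids, header=("user_id", "value")):
    """Return a header row plus one positive-label row per unique user ID.

    The header names are an assumption taken from the step description.
    """
    seen = set()
    rows = [tuple(header)]
    for raw in user_ids:
        uid = raw.strip()
        # Skip blank entries and duplicates so each user appears once.
        if uid and uid not in seen:
            seen.add(uid)
            rows.append((uid, 1))  # only positive labels, so the value is always 1
    return rows


def write_labels_csv_gz(user_ids, path):
    """Write the label rows to a gzip-compressed CSV and return the user count."""
    rows = build_label_rows(user_ids)
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return len(rows) - 1  # exclude the header row
```

For example, write_labels_csv_gz(["abc123", "xyz987"], "first_time_homebuyers.csv.gz") would produce a compressed file with a header row and two positive labels.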

Step 3: Define Dates

Traits can change over time: in our example, first-time homebuyers eventually stop being first-time homebuyers. It is therefore important to specify the date range in which these users were known to be first-time homebuyers. Otherwise, their status may have been different during the training window, which would lead to less accurate predictions.

In this example we are choosing the default range, but this can be any date range in which the first-time homebuyer status of the User IDs was known.

Step 4: Specify Settings

In this step, we will name our Pipeline ‘First-Time Homebuyer Look Alike Prediction’, have it rerun weekly on Sundays, and tag the pipeline with the ‘first-time homebuyer’ tag. Setting a weekly schedule means that your pipeline will use the latest available data to re-generate up-to-date predictions on a weekly basis.

Step 5: Review

The final step is to review your pipeline and ensure all settings look accurate! If anything needs to be updated, simply go ‘Back’ in the workflow and update the relevant step. Otherwise, click ‘Start Training’ and sit back while Cortex generates the predictions.

Step 6: Update Labels Over Time (Optional)

If you’re collecting new User IDs of known first-time homebuyers over time, you can import these extra labels into Cortex so that your pipelines are always learning from the most recent information. To upload new labels, hit the “Edit” button on your pipeline (next to “Export Predictions”).
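Before uploading extra labels, it can help to work out which User IDs are actually new. The following is a minimal sketch under the assumption that you keep your own list of previously uploaded IDs; the function name is hypothetical, and whether Cortex expects only the new IDs or the full list is not specified here, so check this before relying on it.

```python
def new_labels_only(previously_uploaded, collected):
    """Return collected user IDs that were not uploaded before.

    Order of first appearance is preserved; blanks and duplicates are dropped.
    """
    seen = {uid.strip() for uid in previously_uploaded}
    fresh = []
    for raw in collected:
        uid = raw.strip()
        if uid and uid not in seen:
            seen.add(uid)  # also guards against duplicates within `collected`
            fresh.append(uid)
    return fresh
```

The returned IDs can then be written to a new labels file in the same two-column format used in Step 2.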

Still have questions? Reach out to support@mparticle.com for more info!
