Machine learning as a service is a hot topic. It’s no accident that the leading internet companies are racing to improve their machine learning cloud offerings, which include Google’s Prediction API, Amazon’s Machine Learning service, and Microsoft’s Azure Machine Learning. Their appeal is obvious: outsource some of the cost of building and maintaining a machine learning system, which requires hard-to-find experts in machine learning, data science, and distributed systems. But in reality, there’s no free lunch, and developers have to make tradeoffs with out-of-the-box solutions.

Building Scalable, Distributed and Always-Learning Infrastructure

As a Senior Software Engineer in Machine Learning at Vidora, I think a lot about how we can build a scalable, distributed, always-learning infrastructure. Some of the items our developers think deeply about include:

  • Business focus:
    • Where and how can we improve engagement with our user experience by applying machine learning?
    • Can we use high level insights across many sites to improve engagement and personalization?
    • Beyond just predicting simple user preferences, how can we build an optimized model that improves business goals?
  • Infrastructure:
    • How do we collect data and store it in a format that the machine learning system understands?
    • How do we incorporate the different types of available data into the model in a cohesive way?
    • Machine learning models can take hours to compute. How do we integrate the system in such a way that we minimize disruptions to the user experience?
  • Algorithms:
    • Do the modeling assumptions properly capture the linear or nonlinear nature of people’s preferences on the site? Are there multicollinear or correlated variables hurting the model’s prediction performance?
    • How can we have the models update quickly as a visitor interacts with the layouts and content that make up the user experience?
    • What is the best data to use for the learning model?
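
To make the question about correlated variables concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Vidora’s actual pipeline: the data is synthetic, the feature names (page_views, seconds_on_site, day_of_month) are hypothetical, and the 0.9 threshold is an arbitrary choice. It flags feature pairs whose Pearson correlation is high enough that they probably carry redundant information.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| > threshold."""
    flagged = []
    for (a, xs), (b, ys) in itertools.combinations(features.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Synthetic data (assumption): time on site is driven almost entirely by
# page views, while day of month is unrelated to both.
rng = random.Random(0)
page_views = [rng.randint(1, 50) for _ in range(200)]
features = {
    "page_views": page_views,
    "seconds_on_site": [30 * v + rng.gauss(0, 20) for v in page_views],
    "day_of_month": [rng.randint(1, 28) for _ in range(200)],
}

flagged = correlated_pairs(features)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

A pairwise check like this only catches redundancy between two variables at a time. Multicollinearity involving three or more variables needs a different diagnostic, such as variance inflation factors.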

As we ponder these questions, we continuously improve our algorithms and data to better incorporate consumer preferences. We work with lots of different sites, where we encounter variations of similar problems. These variations help us get smarter and faster, and ultimately allow us to bring those insights back to our machine learning framework. Our developers have found big boosts in performance from incorporating algorithms and data that capture user preferences.

Hands-off Cloud Offerings

Cloud offerings are great when they are hands-off. Ideally, this means that the services don’t require much domain knowledge to produce satisfactory results. This is true in at least two situations:

  • Problems that can be solved with simple models with strong theoretical underpinnings are good candidates for cloud-based machine learning solutions. While theory lags behind practice in much of machine learning, the simple models implemented in cloud services, such as linear regression and random forests, rest on theory that lets them be low-touch and produce a reasonable solution without requiring tons of data. In practice, these models often require you to tweak your data so that the model performs better. This is called “feature engineering,” and it can take a significant amount of work to understand, collect, and transform your input, an amount of effort that may not suit your business purposes.
  • Problems that can be solved with complex models that generalize to the problem you are solving are also good candidates for cloud-based machine learning solutions. These are generally “sensory” tasks like object recognition in images, sentiment analysis, and voice recognition, which companies like Microsoft, Google and Apple continue to work on.

In reality, though, these out-of-the-box solutions may not be the best fit for you. For example, you could take a complex sentiment analysis model trained on documents and apply it to tweets. In practice, this does not work very well, so context matters.

What Cloud Services Have to Offer

It’s no surprise that cloud services provide both of these solutions. They provide ways to train simple models, as well as complex models that are already trained for a problem such as sentiment analysis. Companies like Clarifai provide only a pretrained model. However, services like Microsoft Azure give you both the tools to create your own simple model and access to other people’s complex models in an app-store-like experience.

Someone using these cloud services should still be comfortable with programming and math concepts. For instance, I wouldn’t suggest having developers with no familiarity with machine learning concepts use these services to build even a simple model from scratch. They need to be familiar with separating data into training and test sets, the bias-variance tradeoff, modeling assumptions, and error metrics. Even better would be if they could do feature engineering, remove outliers that would skew the model fit, and know what error would be satisfactory to accomplish your business goal. Without this knowledge, blindly applying these services has many pitfalls that will prevent good results.
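
For readers less familiar with training and test sets and error metrics, here is a minimal Python sketch of the basic workflow. It does not use any cloud provider’s API, and everything in it is an assumption made for illustration: the data is synthetic, the helper names (train_test_split, fit_line, rmse) are hypothetical, and the 25% holdout is an arbitrary choice. It fits a simple linear model on the training rows only, then measures root-mean-square error on rows the model has never seen.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root-mean-square error of the fitted line on the given rows."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in range(100)]

train, test = train_test_split(data)
slope, intercept = fit_line(train)      # fit on training rows only
train_rmse = rmse(train, slope, intercept)
test_rmse = rmse(test, slope, intercept)  # evaluate on held-out rows

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"train RMSE={train_rmse:.3f} test RMSE={test_rmse:.3f}")
```

The number to judge against your business goal is the test error, not the training error. A test error much larger than the training error suggests the model is fitting noise rather than a pattern that generalizes.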

Finding the Right ML Solution for You

I believe the type of machine learning solution you’ll use (cloud service, third-party, or home-built) will depend mostly on what you and your developers are trying to achieve, and on the complexity of your goal:

  • Is your problem well-defined with simple solutions already built?
  • Or is your problem more ambiguous, needing sophisticated fine-tuning to optimize?
    • See if there is an outsourced solution that can help solve your problem. But if you have more resources, maybe building something yourself is the right answer. Vidora helps companies build optimal consumer experiences without any engineering work or machine learning knowledge needed.

Overall, cloud-based machine learning systems like the ones Amazon, Microsoft and Google are building are very exciting.

What each company’s developers ultimately choose depends on their unique situation. However, there’s no doubt that this is just the start of what’s to come in machine learning and artificial intelligence. It’s going to be a fun adventure!

Emmett McQuinn (@EmmettStream) has a passion for machine learning, high performance computing, and interactive real-time visualization technology. He has previously worked at several startups and research environments building high performance real-time visualization and analysis tools at IBM Research, Quid, Stanford, San Diego Supercomputer Center, and Sandia National Laboratories.
