Models Will Run the World, But Hold the Data


Optimizing product designs and complex processes depends on the ability to answer "what if" questions. We need to predict what will happen if we change the geometry of a product or the setpoints on a process. We generally have two options for making such predictions: simulation and machine learning.

Simulation offers the highest fidelity (i.e., match to reality) we have. While simulation technology is highly valuable, its impact is limited by three major challenges:

  • Time and expense – Simulations often involve long solve times and substantial compute; in some cases, weeks on thousands of CPUs, all for a single data point.
  • Reliability – With most simulation approaches, a solution isn't guaranteed. Solutions can diverge, and the reason is not always obvious.
  • Accuracy – With most simulations, it's extremely difficult to know how accurate a given solution is.

If you could overcome these challenges, you could go beyond asking "will this design work?" to asking "what configuration or settings achieve optimal performance?" It would be like going from finding a good answer to finding the best answer. It would also shift the primary application from designing systems to controlling their operation.
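To make the shift concrete, here is a minimal sketch. It uses a toy analytic function as a stand-in for a simulator (the bracket geometry, stress formula, and 90 MPa limit are all hypothetical assumptions, not from the article): with a slow solver you can only afford the single feasibility check; with a fast, reliable one, the same call can drive a sweep that finds the best design.

```python
import itertools

# Toy stand-in for a physics simulation: predicted stress and mass of a
# bracket as a function of two design variables. The formulas are
# hypothetical placeholders, not a real solver.
def simulate(thickness_mm, rib_count):
    stress_mpa = 400.0 / (thickness_mm * (1 + 0.15 * rib_count))
    mass_kg = 0.22 * thickness_mm + 0.05 * rib_count
    return stress_mpa, mass_kg

STRESS_LIMIT = 90.0  # assumed design requirement

# "Will this design work?" -- a single feasibility check.
stress, mass = simulate(thickness_mm=3.0, rib_count=4)
print(f"candidate: stress={stress:.1f} MPa, feasible={stress < STRESS_LIMIT}")

# "What is the best design?" -- a sweep over the design space, which is
# only affordable when each simulate() call is fast and reliable.
# Here: minimize mass subject to the stress limit.
designs = itertools.product([2.0, 2.5, 3.0, 3.5, 4.0], range(7))
feasible = [d for d in designs if simulate(*d)[0] < STRESS_LIMIT]
best = min(feasible, key=lambda d: simulate(*d)[1])
print(f"lightest feasible design: thickness={best[0]} mm, ribs={best[1]}")
```

The only structural difference between the two questions is the loop; what makes the loop impractical today is the cost and reliability of each solve.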

The alternative to simulation is machine learning applied to large data sets. While also impactful, this technology has its own drawbacks:

  • Because the technology is based on identifying trends in data, it needs data – lots and lots of data. Data that isn't always available and, when it is, can be difficult to manage.
  • Models created this way are difficult to update. If something changes, you often must recreate the entire model.
  • Unlike simulations, which are grounded in core principles, answers from a data-centric approach can be extremely difficult to explain.
  • And just like simulation, model accuracy is difficult to quantify.

Perhaps the biggest limitation of this approach is the ceiling it places on outcomes. A model trained on historical data can only recommend solutions about as good as the best you have ever achieved. These techniques offer no easy way to identify better, let alone optimal, solutions.
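This ceiling can be demonstrated with a small sketch. The process, its yield curve, and the operating range below are all invented for illustration: a plant has only ever run its setpoint between 0 and 6, while the true optimum sits at 8. A data-driven model fit to the history, and trusted only where the history covers, can never recommend the true optimum.

```python
import numpy as np

# Hypothetical process: the true (unknown) yield curve peaks at
# setpoint x = 8.0, but the plant has only ever operated at x in [0, 6].
def true_yield(x):
    return 100.0 - (x - 8.0) ** 2

rng = np.random.default_rng(0)
x_hist = rng.uniform(0.0, 6.0, size=200)                      # historical setpoints
y_hist = true_yield(x_hist) + rng.normal(0.0, 0.5, size=200)  # noisy measured yields

# Fit a simple data-driven model (quadratic regression) to the history.
model = np.poly1d(np.polyfit(x_hist, y_hist, deg=2))

# Recommend a setpoint -- but only inside the range the data covers,
# because a purely data-driven model cannot be trusted to extrapolate.
candidates = np.linspace(x_hist.min(), x_hist.max(), 1000)
best_x = candidates[np.argmax(model(candidates))]

print(f"best setpoint in the data:  {x_hist[np.argmax(y_hist)]:.2f}")
print(f"model-recommended setpoint: {best_x:.2f}")
print("true optimal setpoint:      8.00  (never visited, never found)")
```

The recommendation lands at the edge of the historical envelope: the model faithfully summarizes what has been done, but cannot discover what has never been tried.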

We covered this and more in our recent panel, where I was joined by Schlumberger's Machine Learning Lead Jose Celeya and University of Michigan Aerospace Engineering Professor Alex Gorodetsky. Watch the panel for a deeper discussion of the issues above and the new techniques that help overcome current limitations.