In the final part of our Machine Learning – BigQuery discussion, we discuss how BigQuery can be used for custom machine learning models. We also dive into the process of importing TensorFlow models into BigQuery, how to use those models in real time, and the advantages and disadvantages of this approach.

Meet the Speakers

Jared Burns

Data Science Engineer at Agosto

Mark Brose

Vice President of Engineering at Agosto

Transcript

– Yeah, I think specifically TensorFlow. TensorFlow is an end-to-end open source platform for machine learning developed by Google, and it’s supported by BigQuery. You can essentially import your TensorFlow models into BigQuery ML and use those models in real time, so we’ll spend a little bit of time today talking about how to do that, what that looks like, and what the advantages and disadvantages are.
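
As a rough illustration of the import step described here, the sketch below builds the two BigQuery ML statements involved: a CREATE MODEL statement with MODEL_TYPE set to 'TENSORFLOW' that points at a SavedModel in Cloud Storage, and an ML.PREDICT query against the imported model. It only assembles SQL text and does not contact BigQuery. The dataset, model, table, and bucket names are hypothetical placeholders, not anything from the discussion.

```python
# Sketch only: builds SQL text and does not contact BigQuery.
# Every dataset, model, table, and bucket name here is hypothetical.


def create_model_sql(model: str, gcs_path: str) -> str:
    """Return a BigQuery ML statement that imports a TensorFlow SavedModel."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = 'TENSORFLOW', MODEL_PATH = '{gcs_path}')"
    )


def predict_sql(model: str, table: str) -> str:
    """Return a query that scores every row of a table with the imported model."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, (SELECT * FROM `{table}`))"
    )


create_stmt = create_model_sql(
    "my_dataset.imported_tf_model",          # hypothetical model name
    "gs://my-bucket/exported_model/*",       # hypothetical SavedModel location
)
predict_stmt = predict_sql("my_dataset.imported_tf_model", "my_dataset.new_rows")

print(create_stmt)
print(predict_stmt)
```

The statements could then be run from the BigQuery console or a client library. The input columns selected in the ML.PREDICT subquery would need to match the input names of the SavedModel.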

– Today we’re gonna talk a little bit about how BigQuery can be used for really custom machine learning models that you might develop in the Google ecosystem. Let’s talk a little bit first about why you would want to go this route, right? There’s definitely a little more heavy lifting in building custom models in, say, Jupyter Notebooks or using TensorFlow libraries yourself. Why would you take that approach as opposed to some of the more automated approaches that exist in BigQuery?

– Yeah, so some of the more automated approaches, while they are easier to use, don’t provide you a lot of flexibility out of the box to really customize and fine-tune your model. So if you’re working in a scenario where you require very high accuracy models, and/or models that train efficiently on GPUs or TPUs, then those more automated solutions might not work for you. You might need a more customized solution, and that’s where TensorFlow and working in AI Platform Jupyter Notebooks come in.

– So this is kind of an environment where maybe you’ve been running something for a while that was built in a more automated fashion, but you’re getting to the point where you need to squeeze a little bit more accuracy out of it, or you’ve learned enough now where you feel like, hey, if I had a little bit more control here I could make a better model. Is that one of the good use cases where it would make sense?

– Absolutely, yeah. If you already have a baseline model and you’re ready to take the next step and want to explore maybe a neural network model, this would be an approach to take. Or let’s say that you’re working with time series data. It’s very dense, you have a lot of different timestamps, and you have a lot of sparse features or a lot of features that don’t have a lot of variability in them. Then a wide and deep neural network might work well for you, whereas your more traditional machine learning algorithms, like linear regression or logistic regression, might not be as well suited for those types of problems.

– Let’s talk a little bit about what’s a good process for this kind of development, leveraging the Google ecosystem. So I’ve heard about AI Platform, and commonly known tools like Jupyter Notebooks that you can use there. Would you typically use that approach to build something out where BigQuery is plugged in for you?

– Yeah, absolutely. So you can read data using TensorFlow’s reader. You can read it directly from the underlying BigQuery Storage API, which is very fast, and read it into a Jupyter Notebook. Then, say you want to submit a training job to AI Platform and try various different hyperparameters: you can submit those training jobs and have them run in the background. Once that’s done, you have your final model saved, which you can then upload to BigQuery. That way you can serve your model where the data resides, within BigQuery, rather than having to go back and forth between BigQuery and AI Platform Jupyter Notebooks.
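
To make the idea of trying different hyperparameters across background training jobs a little more concrete, here is a minimal sketch that enumerates a small grid of settings and renders each one as command-line flags for a training script. This is a plain grid search written with the standard library, not AI Platform’s own hyperparameter tuning feature. The parameter names and values are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical search space; parameter names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.001],
    "hidden_units": [64, 128],
    "batch_size": [256],
}


def trial_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def to_args(config):
    """Render a config as command-line flags for a training script."""
    return [f"--{name}={value}" for name, value in config.items()]


trials = list(trial_configs(SEARCH_SPACE))
for index, config in enumerate(trials):
    # Each line could become the arguments of one submitted training job.
    print(f"trial_{index}", " ".join(to_args(config)))
```

Each printed line corresponds to one training run. In practice every run would be submitted as its own job, and the best resulting model would be the one exported for upload to BigQuery.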

– So with this approach, like you mentioned, you can deploy and run those models, or most of those models, in BigQuery. Are there any limitations on the types of models you can develop and run in BigQuery, or does it support pretty much everything?

– Yeah, you’re kind of restricted. So TensorFlow is this rich ecosystem of models where you can do things like image classification, natural language processing, a whole host of different modeling types and scenarios. But you’re really kind of limited in this case: because we’re working with BigQuery data, it has to be tabular data. So it has to be, you know, something like time series data, or data that fits this BigQuery notion of a table that you’re working with.

– So are there also scenarios where you might develop models using AI Platform and BigQuery where you’re exporting those models and running them somewhere else? Is that also a thing, that you don’t necessarily need to host them in BigQuery?

– Yeah, it’s definitely optional to host those models in BigQuery, but typically, if your storage is BigQuery, that’s most likely where you’re gonna put your model. But there are scenarios where you can train that model in Jupyter Notebooks, deploy it to AI Platform, and then it’s available as an API to get predictions from remote devices or anywhere else.
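
As a sketch of what getting predictions from a model deployed as an API might look like, the snippet below assembles an online prediction request in the shape used by the AI Platform Prediction REST API: a POST to a `:predict` URL with a JSON body containing an `instances` list. It only builds the URL and body and sends nothing. The project, model, and feature names are hypothetical, and authentication is left out entirely.

```python
import json


def build_predict_request(project, model, instances, version=None):
    """Return (url, body) for an AI Platform online prediction call.

    Nothing is sent; the caller would POST the body to the URL with
    suitable credentials attached.
    """
    name = f"projects/{project}/models/{model}"
    if version:
        name += f"/versions/{version}"
    url = f"https://ml.googleapis.com/v1/{name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical project, model, and feature names, for illustration only.
url, body = build_predict_request(
    "my-project",
    "demand_forecast",
    [{"store_id": 12, "day_of_week": 3}],
)
print(url)
print(body)
```

A remote device or backend service would send this request over HTTPS and read the predictions from the JSON response.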

– Thanks Jared.

– Thanks.