Tune in to part one of four, where we talk about how companies are using BigQuery in their machine learning efforts and, more importantly, the support in GCP for machine learning with BigQuery.

Meet the Speakers

Jared Burns

Data Science Engineer at Agosto

Mark Brose

Vice President of Engineering at Agosto

Transcript

– So if you’re not aware, BigQuery is GCP’s enterprise data warehouse for analytics. It has a ton of support through client libraries for Python, Node, Java, and other languages, and it can handle real-time and batch data. And more importantly for our talk today, there’s a lot of support in GCP for machine learning with BigQuery.

– So today we’re going to talk about how companies are using BigQuery in their machine learning efforts.

– So, Jared, are companies using BigQuery for machine learning, and why is it useful?

– Yeah, so I think we’re seeing quite a few companies using BigQuery for machine learning. So if you’re not aware of BigQuery, it’s GCP’s enterprise data warehouse for analytics. It has a ton of support through client libraries for Python, Node, Java, and other languages, and it can handle real-time and batch data. And more importantly for our talk today, there is a lot of support in GCP for machine learning with BigQuery.

– Yeah, so I know BigQuery can do a lot of things, kinda like you’re talking about there. So specifically with ML, with machine learning, I’ve kinda learned there are a lot of different ways that you can go at it. So maybe talk a little bit about the different approaches to using BigQuery for ML, and why you might pick one of the different options.

– Yeah, so really from my perspective, and having worked with clients before, it really comes down to where the organization fits in terms of their level of comfort when using these tools. So let’s say that you’re an organization that has a lot of data scientists who do a lot of machine-learning-intensive workloads on their local computers, and you’ve started using BigQuery. An approach for that type of organization might be to use AI Platform Notebooks with TensorFlow, pretty much using the same model frameworks and the same modeling code that you used before, except now using it to leverage your data in GCP’s BigQuery.

– In that case-

– Yeah

– BigQuery is primarily like a data source, and that’s it? Or how do you think about it in that scenario?

– Yeah, BigQuery is your data source, and it can also be your model repository. With imported models, you can train your models on AI Platform, save them to a Google Cloud Storage bucket, and then upload that saved model object to BigQuery. And the power of that is you can then serve your model using data that is actually in BigQuery. So you don’t have to move data out of BigQuery to serve predictions: your model is saved right there in the data warehouse, and you can serve your model in real time.
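
The import-and-serve workflow described above can be sketched roughly as follows. This is a minimal illustration, assuming BigQuery ML's TensorFlow model import and its `ML.PREDICT` function; the dataset, model, bucket, and table names are hypothetical placeholders, and the helper functions only build SQL strings that you would then run in the BigQuery console or through a client library.

```python
def import_tensorflow_model_sql(model_name, gcs_path):
    """Build a statement that registers a TensorFlow SavedModel stored in
    Cloud Storage as a BigQuery ML model (names are hypothetical)."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (MODEL_TYPE = 'TENSORFLOW', MODEL_PATH = '{gcs_path}')"
    )


def predict_sql(model_name, source_table):
    """Build a statement that scores rows already sitting in BigQuery,
    so no data leaves the warehouse to get predictions."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`, (SELECT * FROM `{source_table}`))"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(import_tensorflow_model_sql(
        "my_dataset.churn_model", "gs://my-bucket/churn/saved_model/*"))
    print(predict_sql("my_dataset.churn_model", "my_dataset.customers"))
```

The input columns selected from the source table would need to match the input names the saved model expects.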

– That’s cool! So you’re able to leverage standard ML tooling and libraries, but still take advantage of the BigQuery platform for serving up that model. Okay, so that’s good. So that’s your expert data science people who know all the tools and the libraries. Are there some options for people who understand data well but maybe don’t have as sophisticated a data science background, so they can take advantage of BigQuery ML?

– Yeah, we see most of our traction in this area, because you see clients come in who are very heavy SQL users. They know a lot about data warehouse tools and methodology, but they’re not so expert in the open source frameworks that exist, like TensorFlow. They want to be able to use some of these same tools and methodologies in BigQuery, and that’s where BigQuery ML comes in. So unlike other data warehouses out there, BigQuery supports in-data-warehouse machine learning. There’s support for things like classification, linear regression, and matrix factorization, which is used for product recommendations, and you can even import your own TensorFlow models as well. There’s a rich tool set available to you right in BigQuery.
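
As a rough sketch of what in-warehouse training looks like for a SQL user, the helper below builds a BigQuery ML `CREATE MODEL` statement. It assumes the `linear_reg` and `logistic_reg` model types and the `input_label_cols` option; the model, table, and column names are hypothetical, and matrix factorization is left out because it takes different options.

```python
# Model types this sketch handles; both accept input_label_cols.
SUPPORTED_MODEL_TYPES = {"linear_reg", "logistic_reg"}


def create_model_sql(model_name, model_type, label_column, training_table):
    """Build a BigQuery ML CREATE MODEL statement that trains directly
    on a table in the warehouse (all names are hypothetical)."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model type in this sketch: {model_type}")
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


if __name__ == "__main__":
    # A classification model predicting a hypothetical 'churned' column.
    print(create_model_sql(
        "my_dataset.churn_classifier", "logistic_reg",
        "churned", "my_dataset.customer_features"))
```

Every column other than the label is treated as a feature here, which is why the training table should already be in a modeling-ready shape.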

– Cool. Okay, so that’s pretty sophisticated, but maybe still a little hard for some people to take advantage of. So are there even more automated things in BigQuery? Like, I’ve heard about AutoML Tables, where we’re maybe leveraging a little bit more of the true data science people at Google who have helped us out, making this even easier. So maybe you can talk a little more about that option with BigQuery as well.

– Yeah, yeah, so for that example, let’s say that you’re just an analyst and you’re not necessarily that involved with SQL, you’re not that technical. In that case, you might just want to be able to point a tool at a BigQuery table that is essentially ready for modeling, and be able to quickly run and try different models. That’s where AutoML Tables comes in. It’s point-and-click AI, where you go in and there’s a series of steps where you essentially point AutoML Tables at a BigQuery table. You tell it which field you’re trying to predict and which fields are your features. And you give it a budget of how many node hours to run for, and it gives you a final model at the end. So it’s almost fully automated in terms of having all those features ready for use. You still have to set up your data in BigQuery, but as long as that data is set up for you, you can be pretty much a domain expert in an area and be able to run with it.
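
The choices described above (source table, target field, feature fields, and training budget) can be captured in a small settings object. This is only a sketch: the function and most dictionary keys are hypothetical, not the AutoML Tables API itself. It does assume two things about the service: that BigQuery sources are addressed with a `bq://project.dataset.table` URI, and that the training budget is expressed in milli node hours (1 node hour = 1000).

```python
def automl_tables_training_config(bigquery_table, target_column,
                                  feature_columns, node_hours):
    """Collect the point-and-click choices as a plain dict
    (hypothetical structure, for illustration only)."""
    if node_hours <= 0:
        raise ValueError("node_hours must be positive")
    if target_column in feature_columns:
        raise ValueError("the target column cannot also be a feature")
    return {
        # Assumed URI scheme for BigQuery sources.
        "bigquery_input_uri": f"bq://{bigquery_table}",
        "target_column": target_column,
        "feature_columns": list(feature_columns),
        # Budget converted from node hours to milli node hours.
        "train_budget_milli_node_hours": round(node_hours * 1000),
    }


if __name__ == "__main__":
    # Hypothetical table and column names.
    config = automl_tables_training_config(
        "my-project.sales.customer_features",
        target_column="churned",
        feature_columns=["tenure_months", "plan", "monthly_spend"],
        node_hours=2,
    )
    print(config)
```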

– Cool. So yeah, a full kind of spectrum of options, that’s really cool, thanks.

– Yeah.