Turn any Dataflow pipeline into a reusable template

09 February 2021

Dataflow templates let you set up and run your pipelines on Google Cloud from the Google Cloud Console, the gcloud command-line tool, or REST API calls. Classic templates are staged as execution graphs on Cloud Storage, while Flex Templates package the pipeline as a Docker image, store that image in your project's Container Registry, and stage a template specification file on Cloud Storage that points to it. You can use one of the templates provided by Google or build your own.
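To make the REST option concrete, the Python sketch below only assembles a request for the Dataflow `flexTemplates:launch` method (v1b3 API) without sending it. The project ID, region, bucket, job name, and the `input`/`output` parameter names are hypothetical placeholders; the parameters a real template accepts are whatever its author defined.

```python
import json

# Hypothetical values for illustration; replace with your own project, region and bucket.
PROJECT_ID = "my-project"
REGION = "us-central1"
TEMPLATE_SPEC = "gs://my-bucket/templates/my-template.json"


def build_flex_launch_request(project_id, region, job_name, spec_path, parameters):
    """Return (url, body) for the Dataflow flexTemplates:launch REST method."""
    url = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project_id}/locations/{region}/flexTemplates:launch"
    )
    body = {
        "launchParameter": {
            "jobName": job_name,
            # Cloud Storage path of the staged template spec file.
            "containerSpecGcsPath": spec_path,
            # Runtime parameters are sent as string key/value pairs.
            "parameters": {key: str(value) for key, value in parameters.items()},
        }
    }
    return url, body


if __name__ == "__main__":
    url, body = build_flex_launch_request(
        PROJECT_ID,
        REGION,
        "wordcount-demo",
        TEMPLATE_SPEC,
        {"input": "gs://my-bucket/in/*.txt", "output": "gs://my-bucket/out/result"},
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

To actually launch the job, you would POST `body` as JSON to `url` with an OAuth 2.0 access token for an account that has Dataflow permissions, for example through one of the Google API client libraries.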

As data analysis grows within an organization, business teams need to run batch and streaming jobs that reuse code written by engineers. Re-running existing code, however, usually means setting up a development environment and making minor code changes, which is difficult for people without a programming background.

To address these challenges, Google introduced Dataflow Flex Templates, which make it even easier to turn any Dataflow pipeline into a reusable template that anyone can run.

Classic templates already let developers share batch and streaming Dataflow pipelines so that anyone can run a pipeline without a development environment or writing code. Flex Templates use a new architecture that removes the main limitations of classic templates, so we recommend using Flex Templates.

Flex Templates

Flex Templates are more flexible than classic templates: a single template can launch minor variations of a Dataflow job, and it can use any source or sink I/O. With classic templates, the execution graph is constructed during template development, when the template is created.

With Flex Templates, the execution graph is constructed dynamically when the template is run, based on the runtime parameters the user provides. This means you can reuse the same underlying template for various tasks with only small changes, such as switching the format of the source or sink files.
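Because a Flex Template runs the pipeline's own launcher code at job-submission time, ordinary control flow over the runtime parameters can decide which transforms end up in the graph. The stdlib-only sketch below models that idea: it parses launch parameters with `argparse` and returns the step names a pipeline would apply. In a real Apache Beam pipeline each entry would be a transform such as `beam.io.ReadFromText` or `beam.io.WriteToParquet`; the `--output_format` parameter and the `ParseRecords` step are hypothetical.

```python
import argparse

# Hypothetical mapping from an --output_format value to the Beam sink transform
# a real pipeline would apply (these transform names exist in apache_beam.io).
SINKS = {"text": "WriteToText", "avro": "WriteToAvro", "parquet": "WriteToParquet"}


def parse_template_args(argv):
    """Parse the runtime parameters supplied when the template is launched."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--output_format", choices=sorted(SINKS), default="text")
    # parse_known_args leaves runner options (e.g. --project) for the pipeline.
    known_args, _pipeline_args = parser.parse_known_args(argv)
    return known_args


def plan_pipeline(args):
    """Return the ordered step names the execution graph would contain."""
    return [
        f"ReadFromText({args.input})",
        "ParseRecords",  # hypothetical user-defined transform
        f"{SINKS[args.output_format]}({args.output})",
    ]


if __name__ == "__main__":
    for fmt in ("text", "parquet"):
        args = parse_template_args(
            ["--input", "gs://my-bucket/in/*.csv", "--output", "gs://my-bucket/out/run",
             "--output_format", fmt, "--project", "my-project"]
        )
        print(plan_pipeline(args))
```

The same template can therefore produce a text sink on one launch and a Parquet sink on another without being rebuilt, whereas a classic template's graph is fixed when the template is created.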

Creating a Dataflow Flex Template

Google Cloud provides predefined, production-quality templates that you can run directly from the Dataflow UI in the Google Cloud Console. If you are new to templates, you can start with these prebuilt templates. You can also review the source code of the Google-provided templates, along with examples that generate random data, decompress data in Cloud Storage, analyze tweets, or perform data enrichment tasks such as obfuscating data before writing it to BigQuery.
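As an example of running a Google-provided template without the Console, the sketch below assembles a request for the classic `templates:launch` REST method, which takes the template's location as the `gcsPath` query parameter. It targets the Word_Count template at `gs://dataflow-templates/latest/Word_Count`; the project, region, and output bucket are hypothetical, and while `inputFile` and `output` are the parameters documented for this template, check its current documentation before relying on them.

```python
from urllib.parse import urlencode

WORD_COUNT_TEMPLATE = "gs://dataflow-templates/latest/Word_Count"


def build_classic_launch_request(project_id, region, gcs_path, job_name, parameters):
    """Return (url, body) for the Dataflow templates:launch REST method."""
    query = urlencode({"gcsPath": gcs_path})  # percent-encodes the gs:// path
    url = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project_id}/locations/{region}/templates:launch?{query}"
    )
    body = {"jobName": job_name, "parameters": dict(parameters)}
    return url, body


if __name__ == "__main__":
    url, body = build_classic_launch_request(
        "my-project",  # hypothetical project ID
        "us-central1",
        WORD_COUNT_TEMPLATE,
        "wordcount-from-template",
        {
            "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
            "output": "gs://my-bucket/wordcount/output",  # hypothetical bucket
        },
    )
    print(url)
    print(body)
```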

Once you are ready to share the Flex Template with users, the Google Cloud Console UI lets them select Custom Template and then asks for the Cloud Storage path of the template file.
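The file at that Cloud Storage path is a template spec, which `gcloud dataflow flex-template build` writes and which can embed a metadata file describing the template's parameters. The sketch below builds such metadata in Python and shows the kind of regex check those entries make possible. The template name and its three parameters are hypothetical, and the `check_values` helper is only an illustration, not part of any Google library.

```python
import json
import re

# Hypothetical parameters; each entry uses the fields a Flex Template
# metadata file accepts (name, label, helpText, regexes, isOptional).
PARAMETERS = [
    {"name": "input", "label": "Input file pattern",
     "helpText": "Cloud Storage pattern, for example gs://my-bucket/in/*.csv",
     "regexes": [r"^gs://[^\n\r]+$"]},
    {"name": "output", "label": "Output path prefix",
     "helpText": "Cloud Storage prefix for the results",
     "regexes": [r"^gs://[^\n\r]+$"]},
    {"name": "output_format", "label": "Output format",
     "helpText": "One of: text, avro, parquet", "isOptional": True,
     "regexes": [r"^(text|avro|parquet)$"]},
]


def build_metadata(name, description, parameters):
    """Return the metadata dictionary to be saved as metadata.json."""
    return {"name": name, "description": description, "parameters": parameters}


def check_values(metadata, values):
    """Return a list of problems with user-supplied values (illustrative only)."""
    problems = []
    for param in metadata["parameters"]:
        value = values.get(param["name"])
        if value is None:
            if not param.get("isOptional", False):
                problems.append(f"missing required parameter: {param['name']}")
            continue
        regexes = param.get("regexes", [])
        if regexes and not any(re.match(rx, value) for rx in regexes):
            problems.append(f"invalid value for {param['name']}: {value!r}")
    return problems


if __name__ == "__main__":
    metadata = build_metadata("File converter", "Converts files between formats.", PARAMETERS)
    print(json.dumps(metadata, indent=2))
    print(check_values(metadata, {"input": "gs://my-bucket/in/*.csv", "output": "s3://x"}))
```

Saved as `metadata.json`, this file could be passed to a command along the lines of `gcloud dataflow flex-template build gs://my-bucket/templates/file-converter.json --image IMAGE --sdk-language PYTHON --metadata-file metadata.json`; the resulting `gs://` path is what users would enter in the Custom Template field.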
