Talend – Schema Propagation

“The fact that ETL is largely powered by open source is interesting for several reasons:

  • First, open-source projects are driven by developers from a large number of diverse organizations.
  • Second, one of the most important features of ETL platforms is the ability to connect to a range of data platforms.
  • Third, and perhaps most important, the fact that these engines are open source (free) removes barriers to innovation.” (data-informed.com)

Welcome to our class of Talend from the Roots!

In the previous post, we learned about metadata and schemas and built a custom job that imported data from a CSV file, defined a custom schema, and sent the data to a tLogRow component.

If you haven’t checked that out yet, click here.

Now we take on the task of propagating a schema change, picking up a few tips along the way.

On board, sailor!

Sometimes a schema needs to change during development, and adding, removing, or re-ordering columns can become a daunting task, especially if the schema is used in several jobs.

However, Talend provides a solution: when a shared schema is changed, the change can be propagated to all the jobs that use it. (Hint: store the schema in the metadata!)
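To see why a shared schema helps, here is a minimal conceptual sketch in Java. It is not Talend's API or its internal implementation: the `Schema` and `Component` classes are hypothetical stand-ins, and the "link" to the repository is modeled simply as a shared object reference, while a Built-in schema is modeled as an independent copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedSchemaSketch {

    // Hypothetical stand-in for a schema: just an ordered list of column names.
    static class Schema {
        final List<String> columns = new ArrayList<>();

        Schema(List<String> cols) {
            columns.addAll(cols);
        }

        // An independent copy, used here to model a Built-in schema.
        Schema copy() {
            return new Schema(columns);
        }
    }

    // Hypothetical stand-in for a component that uses a schema.
    static class Component {
        final String name;
        final Schema schema;

        Component(String name, Schema schema) {
            this.name = name;
            this.schema = schema;
        }
    }

    public static void main(String[] args) {
        Schema shared = new Schema(Arrays.asList("name", "dateOfBirth", "timestamp"));

        // This component stays linked to the shared schema.
        Component linked = new Component("job A output", shared);
        // This component holds its own copy, like a Built-in schema.
        Component builtIn = new Component("job B output", shared.copy());

        // One change to the shared schema...
        shared.columns.add("age");

        // ...is visible through the link, but not in the independent copy.
        System.out.println(linked.name + ": " + linked.schema.columns);
        System.out.println(builtIn.name + ": " + builtIn.schema.columns);
    }
}
```

In this toy model, only the linked component sees the new column. In the Studio the update is not automatic: you confirm the propagation in a popup, as the steps below show.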

  • Let’s look at the following custom job.

[Screenshot: the custom job]

In this job, we already have a generic schema stored under Metadata in the Repository, with the columns name, date of birth, and timestamp. (Spoiler: we’ll learn to create it in the next session.)

Now:

  • Open the generic schema.
  • Add a new column “age” as shown.

[Screenshot: the generic schema with the new “age” column]

  • Click Yes in the popup box that appears.

[Screenshot: the popup box]

  • You’ll see that tFileOutputDelimited now shows an error.

[Screenshot: tFileOutputDelimited showing an error]

  • Open tFileOutputDelimited, click the Edit schema button, and select the View schema option.
  • As you can see in the following screenshot, the table on the left-hand side is different from that on the right-hand side.

[Screenshot: the schema editor with differing left-hand and right-hand tables]

  • Click the << button to copy the right-hand schema into the left-hand panel.
  • Finally, press OK to propagate the changes.

What actually happens

When a schema is updated for an output component, the change is not propagated upstream. The << button does this and ensures that the link to the generic schema is maintained.

(Alternate route: we could also have made the change in the output of the preceding tMap, but that would turn the output schema into a Built-in one, which we do not want!)
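The mismatch and the << fix can be pictured with another small Java sketch. Again, this is only an illustration under assumptions, not Talend code: the two lists stand for the left-hand and right-hand tables of the schema editor, and `missingColumns` and `copyRightToLeft` are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaSyncSketch {

    // Columns that exist in the right-hand schema but not in the left-hand one.
    static List<String> missingColumns(List<String> left, List<String> right) {
        List<String> missing = new ArrayList<>(right);
        missing.removeAll(left);
        return missing;
    }

    // Models the "<<" button: the left-hand schema is overwritten with the
    // right-hand one, keeping the right-hand column order.
    static void copyRightToLeft(List<String> left, List<String> right) {
        left.clear();
        left.addAll(right);
    }

    public static void main(String[] args) {
        // Left: the schema arriving from upstream, not yet updated.
        List<String> left = new ArrayList<>(
                Arrays.asList("name", "dateOfBirth", "timestamp"));
        // Right: the output component's schema, which already has "age".
        List<String> right = new ArrayList<>(
                Arrays.asList("name", "dateOfBirth", "timestamp", "age"));

        // The difference is what makes the component show an error.
        System.out.println("Missing on the left: " + missingColumns(left, right));

        copyRightToLeft(left, right);
        System.out.println("Schemas match: " + left.equals(right));
    }
}
```

Copying from right to left, rather than editing the upstream side by hand, mirrors the point above: the updated schema stays the single source of the change.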

Conclusion

So we are done with our simple job, which helped us understand a little better how schemas are used in Talend. The task was not onerous, but it underlines the fact that nuances are an integral part of complexity and advancement.

Stay tuned as we move on to bigger and better jobs in our studio. In the next session, we’ll create a generic schema from metadata as well as lists.

Till then,

Keep Calm and Carry On!


Also, if you are an advanced ETL developer, check out our post about Talend Curved Lines to learn a pro tip!

What do you think about our post? Do you have any ideas or queries?

Let us know in the comments.
