“The fact that ETL is largely powered by open source is interesting for several reasons:
- First, open-source projects are driven by developers from a large number of diverse organizations.
- Second, one of the most important features of ETL platforms is the ability to connect to a range of data platforms.
- Third, and perhaps most important, the fact that these engines are open source (free) removes barriers to innovation.” - data-informed.com
Welcome to our class of Talend from the Roots!
In the previous post, we learnt about metadata and schemas and built a custom job that imported data from a CSV file, defined a custom schema and exported the data to a tLogRow component.
If you haven’t checked that out yet, click here.
Now we take on the task of propagating a schema, picking up a few tips along the way.
On board, sailor!
Schemas sometimes need to change during development, and adding, removing and re-ordering columns can become a daunting task, especially if the schema is used in several jobs.
However, Talend provides a solution: when a shared schema is changed, it can propagate the change to all the jobs that use it. (Hint: store the schema in metadata!)
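To see why a shared schema saves work, here is a minimal conceptual sketch in Python. It is not Talend code and does not use any Talend API: the `Schema` class and the `job_a`, `job_b` and `job_c` names are hypothetical, and Talend itself asks for confirmation before updating jobs rather than sharing an object in memory. The sketch only illustrates the difference between jobs that point at one shared schema and a job that keeps its own copy.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """An ordered list of column names (a stand-in for a Talend schema)."""
    columns: list = field(default_factory=list)


# One shared schema, playing the role of a generic schema stored in metadata.
shared = Schema(["name", "date_of_birth", "timestamp"])

# Two hypothetical jobs reference the same shared schema...
job_a = {"name": "job_a", "schema": shared}
job_b = {"name": "job_b", "schema": shared}

# ...while a third keeps its own private copy (the Built-in style).
job_c = {"name": "job_c", "schema": Schema(list(shared.columns))}

# A single change to the shared schema...
shared.columns.append("age")

# ...is visible in job_a and job_b, but job_c still has the old three columns.
for job in (job_a, job_b, job_c):
    print(job["name"], job["schema"].columns)
```

With a private copy, every job would have to be edited by hand, which is exactly the chore a schema stored in metadata avoids.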
Let’s look at a custom job as follows.
In this job, we already have a generic schema created in the metadata palette with the columns name, date of birth and timestamp. (Spoiler: we’ll learn to create it in the next session.)
Now:
- Open the generic schema.
- Add a new column “age” as shown.
- Click Yes in the pop-up box when it appears.
- You’ll see that tFileOutputDelimited shows an error.
- Open the tFileOutputDelimited, and click the Edit Schema button to open the schema and select the View Schema option.
- As you can see in the following screenshot, the table on the left-hand side is different from that on the right-hand side.
- Click << to copy the right-hand schema into the left-hand panel.
- Finally, press OK to propagate the changes.
What actually happens
When the schema of an output component is updated, the change is not propagated upstream. Clicking << brings the component in line with the updated schema and ensures that its link to the generic schema is maintained.
(Alternative route: we could also have made the change in the output of the preceding tMap, but that would turn the output schema into a Built-in one, which we do not want!)
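The comparison and the << copy can be pictured with a small Python sketch. Again, this is only an illustration under stated assumptions, not Talend's implementation: the dictionary layout and the `missing_columns` and `copy_from_repository` helpers are hypothetical names invented for this example.

```python
def missing_columns(component_cols, repository_cols):
    """Return the columns found in the repository schema but not in the component."""
    return [col for col in repository_cols if col not in component_cols]


def copy_from_repository(component, repository_cols):
    """Mimic the << button: take the repository columns and keep the link."""
    component["columns"] = list(repository_cols)
    component["schema_type"] = "Repository"  # the link is kept, not turned Built-in
    return component


# The generic schema after "age" was added (the right-hand panel).
repository_cols = ["name", "date_of_birth", "timestamp", "age"]

# The output component still holds the old columns (the left-hand panel).
output_component = {
    "name": "tFileOutputDelimited",
    "schema_type": "Repository",
    "columns": ["name", "date_of_birth", "timestamp"],
}

# The mismatch is what makes the component show an error.
difference = missing_columns(output_component["columns"], repository_cols)
print(difference)  # ['age']

copy_from_repository(output_component, repository_cols)
print(output_component["columns"])
```

After the copy, both sides list the same columns and the component still refers to the repository schema, so the next change to the generic schema can be propagated in the same way.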
Conclusion
So we are done with our simple job, which helped us understand how schemas are used in Talend a little better. The task was not onerous and leaned towards the easy side, but it underlines an important point: nuances are an integral part of complexity and advancement.
Stay tuned as we move on to bigger and better jobs in our studio. In the next session, we’ll create a generic schema from metadata as well as lists.
Till then,
Keep Calm and Carry On!
Also, if you are an advanced ETL developer, check out our post about Talend Curved Lines to learn a pro tip!
What do you think about our post? Do you have any ideas or queries?
Let us know in the comments.