“The value of metadata lies in its ability to more efficiently classify and organize information, as well as to yield deeper insight into the actions taking place across your business, providing more intelligence and higher quality information to fuel big data initiatives, automation, compliance, data sharing, collaboration and more.” -M-files.com
Welcome to our class of Talend from the Roots!
In the previous post, we learnt how an intuitive piece of software can transform your current ETL scenario and boost your productivity and efficacy in the data management field.
If you haven’t checked that out yet, click here.
So, our installation is complete and our Talend Open Studio is up and running.
Let’s quickly brush up on some basics before we go further into the realm of data manipulation and develop our first Talend job!
In the world of ETL, data is everywhere. From schemas to databases to services, every process is centered around data. There is so much information that we need a way to understand and catalogue its nature: data about the data.
That’s what metadata is!
In everyday usage, the basic definition of metadata is “data about data.” Metadata describes the format and nature of the current data: its length, type and description. On an organizational level, it can represent the entire digital lifecycle of a business process, from its procedures and opportunities to a specific and precise audit trail that can prove invaluable for any firm.
In the development of basic to advanced Talend jobs, metadata is one of the most important aspects. The most common type of metadata used here is the schema.
A schema defines the inputs and outputs of your job: how data is characterized and moved around.
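To make the idea concrete, here is a minimal Python sketch of a schema as a list of column definitions. It is only an illustration, not Talend's internal format: the `Column` class, its fields, the `describe` helper and the three sample columns are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """One column of a schema: the metadata that describes a field."""
    name: str
    type: str
    length: Optional[int] = None  # None means no length is recorded
    nullable: bool = True
    comment: str = ""


# Hypothetical schema for a small three-column input.
schema = [
    Column("name", "String", length=50, nullable=False, comment="Person's name"),
    Column("dob", "Date", comment="Date of birth"),
    Column("time", "String", length=8, comment="Time of entry"),
]


def describe(columns):
    """Return one readable line per column."""
    return [
        f"{c.name}: {c.type}, length={c.length}, nullable={c.nullable}"
        for c in columns
    ]


for line in describe(schema):
    print(line)
```

Note that the schema holds no rows at all. It only says what each field is called and what kind of value it carries, which is exactly what makes it metadata.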
In Talend, we have various services to capture metadata from diverse data sources such as databases, Excel worksheets and delimited files. This metadata can be stored in Talend’s built-in metadata repository.
We’ll deal with two types of schemas in Talend:
- Built-in
- Repository
Built-in – In built-in schemas, all the schema information is stored directly in your job. You can manually enter and edit it there.
Repository – The schema is stored in the repository. If a job repeatedly uses the same schema, or a schema is required in multiple jobs, prefer repository schemas.
It’s a good practice to define source and target metadata using repository schemas and mid-flow metadata as built-in schemas.
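The difference between the two can be sketched in a few lines of Python. This is a simplified analogy, not how Talend Studio is implemented: the `repository` dictionary, the `Job` class and the job names are hypothetical, and the way Studio handles updates to a repository schema may differ from this sketch.

```python
# Hypothetical shared store standing in for the metadata repository.
repository = {"person_csv": ["name", "dob", "time"]}


class Job:
    """A job that either owns its schema (built-in) or points at a shared one."""

    def __init__(self, name, built_in=None, repository_ref=None):
        # Exactly one of built_in / repository_ref is expected.
        if (built_in is None) == (repository_ref is None):
            raise ValueError("give either a built-in schema or a repository reference")
        self.name = name
        self.built_in = list(built_in) if built_in is not None else None
        self.repository_ref = repository_ref

    def columns(self):
        # Repository schemas are looked up in the shared store;
        # built-in schemas live inside the job itself.
        if self.repository_ref is not None:
            return list(repository[self.repository_ref])
        return list(self.built_in)


load_job = Job("load_people", repository_ref="person_csv")
report_job = Job("report_people", repository_ref="person_csv")
scratch_job = Job("scratch", built_in=["name", "dob", "time"])

# One change to the shared schema is seen by every job that references it.
repository["person_csv"].append("city")

print(load_job.columns())
print(report_job.columns())
print(scratch_job.columns())
```

The two jobs that reference the shared schema pick up the new column from a single edit, while the job with the built-in schema keeps its own copy. That is the reason to prefer repository schemas for anything reused across jobs.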
Now, we are ready for our first Talend job.
As part of building a solid foundation in the Talend work environment, we’ll start from the basics. Today we’ll just learn to take input from a csv file with a tFileInputDelimited component, manually create a built-in schema and propagate it to a tLogRow component.
- We’ll start by launching Talend Open Studio.
- Now click on Create project; we’ll name it test_1 (you can give it any name you want).
- Now, in the Talend interface, we’ll click on the home button on the left, then right-click in the jobs column, create a job and name it test_schema.
- Let’s take a sample csv file. Here we are using a csv file with three columns, name, dob and time, and only one row of data. To read this file into our Talend job, we’ll use a tFileInputDelimited component, and a tLogRow component to output the data.
- Next, we’ll join these components so that your job looks like this
- Now double-click on the tFileInputDelimited component and, in the File name/Stream field, navigate to the path of the csv file.
- Now select Edit schema and add three columns to the schema, name, dob and time, as follows.
- Now run the job by clicking here.
- As seen below, you have successfully imported the data from the csv file and sent it to a tLogRow.
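For readers who think in code, the job built in the steps above can be approximated in plain Python. This is a rough sketch of the idea, not the Java code Talend generates: `read_delimited` and `log_rows` are hypothetical stand-ins for the two components, the comma delimiter and the `|` output separator are assumptions, and the sample row is made up for illustration.

```python
import csv
import io

# Column names from the walkthrough's manually created schema.
SCHEMA = ["name", "dob", "time"]


def read_delimited(stream, schema, delimiter=",", header_rows=0):
    """Read delimited text and label each field with its schema column."""
    rows = []
    for index, fields in enumerate(csv.reader(stream, delimiter=delimiter)):
        if index < header_rows or not fields:
            continue  # skip header lines and blank lines
        if len(fields) != len(schema):
            raise ValueError(f"line {index + 1}: expected {len(schema)} fields")
        rows.append(dict(zip(schema, fields)))
    return rows


def log_rows(rows, schema, separator="|"):
    """Print each row in schema order and return the printed lines."""
    lines = [separator.join(row[col] for col in schema) for row in rows]
    for line in lines:
        print(line)
    return lines


# Made-up one-row sample standing in for the csv file.
sample = io.StringIO("Asha,1990-04-12,10:30:00\n")
rows = read_delimited(sample, SCHEMA)
printed = log_rows(rows, SCHEMA)
```

The point to notice is that the file itself carries only values. It is the schema that gives each value a name, which is why the schema has to be defined on the input component before the data can flow to the output.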
Congrats, we’ve made our first Talend job!
Today we’ve learnt about metadata and schemas, made our first Talend job and performed three functions:
- importing data from a csv file,
- making a custom schema,
- exporting this data into the tLogRow component.
Stay tuned, as more Talend projects are on the way. In the next session, we’ll continue our progress in the schema realm and design a custom job to propagate our schema.
Also, if you are an advanced ETL developer, check out our post about Talend Routines to learn a pro tip!
What do you think about our post? Do you have any ideas or queries?
Let us know in the comments.