XML stands for Extensible Markup Language. It is easy for both humans and machines to read, is saved with the .xml extension, and, like HTML, uses markup tags to describe the contents of the file.


An XML file should be well formed, with properly nested opening and closing tags, and it can be considered a kind of database in itself. It typically starts with the declaration <?xml version="1.0" encoding="UTF-8"?>, which states the XML version and the encoding. The declared encoding determines how special characters are interpreted, so changing it changes how those characters are read.
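To make the effect of the encoding declaration concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The `<name>` element and its value are made up for illustration and are not taken from the file used in this post. The same bytes are parsed under two different declarations, and a small helper (a hypothetical name, `is_well_formed`) shows that a document with an unclosed tag is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the text "José" stored as UTF-8 bytes.
payload = "<name>José</name>".encode("utf-8")

# Parse the same bytes twice, changing only the declared encoding.
as_utf8 = ET.fromstring(b'<?xml version="1.0" encoding="UTF-8"?>' + payload).text
as_latin1 = ET.fromstring(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + payload).text


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


print(as_utf8)    # decoded as intended
print(as_latin1)  # the two UTF-8 bytes of "é" are read as two separate characters
print(is_well_formed(b"<a><b></b></a>"))  # properly nested tags
print(is_well_formed(b"<a><b></a>"))      # <b> is never closed
```

If the declaration does not match the bytes actually stored in the file, accented characters come out garbled, which is usually the first thing to check when special characters look wrong after a read.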

JSON stands for JavaScript Object Notation. It is a language-independent data format that is commonly used to exchange data between a browser and a server. It is a text-based representation of structured data built on key-value pairs. JSON text can be converted into a JavaScript object and vice versa.
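Because JSON is language independent, the same conversion works outside JavaScript too. The following is a small sketch in Python using the standard json module; the keys and values are invented for illustration.

```python
import json

# Hypothetical JSON text made of key-value pairs.
text = '{"id": 1, "name": "sample", "tags": ["a", "b"], "active": true}'

# Text -> native object (a dict here).
obj = json.loads(text)

# Native object -> JSON text again.
round_trip = json.dumps(obj)

print(obj["name"], obj["tags"])
print(round_trip)
```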

ETL - XML and JSON Files Processing with Talend Open Studio

Note: Before reading any file make sure it is not password protected.

I am reading the file below.

[Image: input file]

  1. tFileInputXML

The tFileInputXML component reads an XML structured file row by row, splits each row into fields, and sends the fields, as defined in the schema, to the next component.


The tFileInputXML component has a few basic properties that need to be checked or unchecked so that the data is processed with the proper formatting.

In ‘Edit Schema’ we need to add one column of type ‘Document’. Then, in the ‘Loop XPath query’ option, we need to provide the tags within the XML file, e.g. “/”. A single forward slash means the file will be read from beginning to end; we can also provide a path such as “/root/value”. Under ‘Mapping’, in ‘XPath query’, we can provide a similar “/” node value to fetch the values of all tags.
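The idea of a loop path plus per-column paths can be sketched outside Talend. The Python example below uses the standard library's xml.etree.ElementTree; it is an illustration of the concept, not what Talend generates. The sample document, the column names, and the `read_rows` helper are all assumptions. Note that ElementTree resolves paths relative to the root element, so the loop path “/root/value” is written here as `value`.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, shaped so that "/root/value" is the loop path.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <value><id>1</id><name>alpha</name></value>
  <value><id>2</id><name>beta</name></value>
</root>"""


def read_rows(xml_bytes, loop_path, field_paths):
    """Emit one row per element matched by loop_path.

    field_paths maps a schema column name to a path relative to the loop element.
    """
    root = ET.fromstring(xml_bytes)
    rows = []
    for node in root.findall(loop_path):
        rows.append({col: node.findtext(path) for col, path in field_paths.items()})
    return rows


rows = read_rows(SAMPLE_XML, "value", {"id": "id", "name": "name"})
for row in rows:
    print(row)
```

Each matched `value` element becomes one output row, which mirrors how the loop query decides the row count and the mapping decides the columns.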

  1. tXMLMap

tXMLMap is similar to the tMap component. It is an advanced component fine-tuned for transforming and routing XML data flows (data of the Document type), especially when processing numerous XML data sources, with or without flat data to be joined.


In the tXMLMap component, if we already have an XML file, we can import it by right-clicking ‘doc’ and selecting ‘import from XML file’; the schema is then created automatically. We also have to set the loop element. In this example the loop element is ‘value’, so iteration happens on the ‘value’ tag.
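As a rough illustration of iterating on a loop element while joining flat data, here is a Python sketch. It is not Talend code: the XML sample, the `CUSTOMERS` lookup, and the `map_rows` helper are hypothetical, and only the ‘value’ loop element comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical Document-style input; 'value' is the loop element.
XML_DOC = b"""<root>
  <value><id>1</id><amount>10</amount></value>
  <value><id>2</id><amount>25</amount></value>
</root>"""

# Hypothetical flat lookup data, keyed by id.
CUSTOMERS = {"1": "alpha", "2": "beta"}


def map_rows(xml_bytes, loop_element, lookup):
    """Produce one output row per loop element, enriched from the flat lookup."""
    root = ET.fromstring(xml_bytes)
    out = []
    for node in root.iter(loop_element):
        row_id = node.findtext("id")
        out.append({
            "id": row_id,
            "amount": int(node.findtext("amount")),
            "name": lookup.get(row_id),  # None when the lookup has no match
        })
    return out


joined = map_rows(XML_DOC, "value", CUSTOMERS)
for row in joined:
    print(row)
```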

  1. tAdvancedFileOutputXML

tAdvancedFileOutputXML writes data to an XML file and offers an interface for handling loop and group-by elements if needed.


tAdvancedFileOutputXML can be used in place of tXMLMap. In this example the ‘entidad’ column is set as the loop element, so iteration happens on this tag. ‘@id’ is an attribute of entidad, which means it cannot have sub-elements of its own, whereas ‘direction’ is a sub-element of entidad and can contain further sub-elements.
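The difference between an attribute and a sub-element is easy to see when the same structure is built by hand. The Python sketch below uses xml.etree.ElementTree; the names ‘entidad’, ‘id’ and ‘direction’ come from the example above, while the root tag, the children of ‘direction’ (`street`, `city`) and the row values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input rows; one 'entidad' element is written per row.
rows = [
    {"id": "1", "street": "Main St", "city": "Phoenix"},
    {"id": "2", "street": "Second Ave", "city": "Noida"},
]


def build_document(rows):
    """Build <root> with one looped <entidad> element per input row."""
    root = ET.Element("root")
    for row in rows:
        # 'id' is an attribute: a plain name/value pair that cannot hold children.
        entidad = ET.SubElement(root, "entidad", {"id": row["id"]})
        # 'direction' is a sub-element, so it can hold further sub-elements.
        direction = ET.SubElement(entidad, "direction")
        ET.SubElement(direction, "street").text = row["street"]
        ET.SubElement(direction, "city").text = row["city"]
    return root


xml_text = ET.tostring(build_document(rows), encoding="unicode")
print(xml_text)
```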

  1. tFileInputJSON

tFileInputJSON extracts JSON data from a file and transfers the data to a file, a database table, etc.


JSON (‘JavaScript Object Notation’) is a lightweight data-interchange format based on the JavaScript programming language.


‘Edit schema’ contains all the columns. ‘Read By’ has three options, of which we use ‘JsonPath’. Check ‘Use Url’ if the JSON file needs to be fetched from a website; otherwise leave it unchecked. ‘Loop Json query’ appears because ‘JsonPath’ is selected in the ‘Read By’ property above; it holds the path to the tags in the JSON file to loop over.

The ‘book’ tag has four attributes that need to be extracted.
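To show what looping over a JSON path and extracting four attributes amounts to, here is a hedged Python sketch. The JSON document is a made-up stand-in for the file used in this post (only the ‘book’ tag and the count of four attributes come from the text), and `loop_nodes` is a hypothetical helper that handles only simple dotted paths such as `$.store.book[*]`; it is not a full JsonPath implementation.

```python
import json

# Hypothetical sample; each 'book' entry has four attributes.
SAMPLE_JSON = """
{"store": {"book": [
  {"category": "reference", "author": "A. Writer", "title": "First", "price": 8.95},
  {"category": "fiction", "author": "B. Author", "title": "Second", "price": 12.99}
]}}
"""

COLUMNS = ["category", "author", "title", "price"]


def loop_nodes(doc, loop_path):
    """Resolve a simple dotted path like '$.store.book[*]' to a list of nodes."""
    path = loop_path
    if path.startswith("$."):
        path = path[2:]
    if path.endswith("[*]"):
        path = path[:-3]
    node = doc
    for key in path.split("."):
        node = node[key]
    return node


doc = json.loads(SAMPLE_JSON)
books = [{col: item.get(col) for col in COLUMNS}
         for item in loop_nodes(doc, "$.store.book[*]")]
for book in books:
    print(book)
```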

  1. tFileOutputJSON

tFileOutputJSON receives data and rewrites it in a JSON structured data block in an output file.


Below is the input file that we are going to convert into a JSON file.

[Image: input file format]

‘Name of data block’ is the name that appears at the top of the JSON output.

‘Edit schema’ holds all the columns that need to be mapped.
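The role of the data block name can be sketched in Python as follows. This assumes the output is a single top-level key (the data block name) holding an array of row objects; the semicolon-delimited input, the column names, and the `to_json_block` helper are hypothetical and stand in for the file used in this post.

```python
import csv
import io
import json

# Hypothetical delimited input; the header row plays the role of the schema.
FLAT_INPUT = "id;name;city\n1;alpha;Phoenix\n2;beta;Noida\n"


def to_json_block(flat_text, block_name, delimiter=";"):
    """Wrap every input row in one JSON object under the given data block name."""
    reader = csv.DictReader(io.StringIO(flat_text), delimiter=delimiter)
    rows = [dict(row) for row in reader]
    return json.dumps({block_name: rows})


output = to_json_block(FLAT_INPUT, "data")
print(output)
```

All values are written as strings here because the csv module does not infer types; in a real job the column types would come from the schema.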

Output JSON file:

[Image: output JSON file]

If, while working with Talend, we come across an issue that we cannot resolve ourselves, we can raise it with the Talend community. Their team will help in solving the problem.

About Author
Ravi Prakash Yadav
Ravi is a software developer with 8 years of experience in Talend, TAC, JAVA, SQL, MS Azure, AWS Redshift/S3 and Salesforce CRM. He has hands-on experience in data warehousing and data modeling. He also has experience in data normalization and standardization along with analysis/processing.