How Can Fuzzy Matching Using Talend Help Deal With Data Duplication?

June 4, 2020
Ayush Joshi

Are you dealing with duplicate data?

Do exact-match rules fail to catch those duplicates?

Are the duplicates spelled or formatted too inconsistently for an exact match?

Are you struggling to cleanse different types of duplicate data?

If you answered yes to most or all of these questions, the solution to your problem is fuzzy matching. Fuzzy matching lets you find and resolve duplicates that differ slightly from one another, which an exact comparison would miss.

What is Data Matching?

Data matching is the process of discovering records that refer to the same real-world entity. It is useful when records come from multiple datasets that do not share a common key identifier, and the same techniques can detect duplicate records within a single dataset.

We perform the following steps:

Standardize the dataset
Pick unique and standard attributes
Break the dataset into similarly sized blocks
Match records and assign weights to the matches
Add up the weights to get a total weight
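
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Talend implements matching: the sample records, the field names, the integer weights, and the threshold of 80 are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "John Smith ", "city": "New York", "zip": "10001"},
    {"id": 2, "name": "john smith", "city": "new york", "zip": "10001"},
    {"id": 3, "name": "Jane Doe", "city": "Boston", "zip": "02108"},
]

# Step 2: the attributes chosen for comparison, with assumed weights
# (out of 100) for how much agreement on each attribute counts.
WEIGHTS = {"name": 60, "city": 30, "zip": 10}


def standardize(record):
    """Step 1: trim whitespace and lowercase every text attribute."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def block_key(record):
    """Step 3: group records by zip code so only plausible pairs are compared."""
    return record["zip"]


def total_weight(a, b):
    """Steps 4 and 5: add up the weights of the attributes that agree."""
    return sum(weight for field, weight in WEIGHTS.items() if a[field] == b[field])


def find_matches(rows, threshold=80):
    """Return (id, id, total weight) for every pair at or above the threshold."""
    blocks = defaultdict(list)
    for row in map(standardize, rows):
        blocks[block_key(row)].append(row)

    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = total_weight(a, b)
            if score >= threshold:
                matches.append((a["id"], b["id"], score))
    return matches


print(find_matches(records))
```

Here records 1 and 2 become identical after standardization, so they are reported as a match with the full weight of 100, while record 3 sits in a different block and is never compared with them.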

What is Fuzzy matching?

Fuzzy matching allows you to identify non-exact matches in your dataset. It is the foundation of many search engine frameworks, and it helps you get relevant search results even if your query contains a typo or uses a different verb tense.

Many algorithms can be used for fuzzy searching on text, but virtually all search engine frameworks (including bleve) primarily use the Levenshtein distance for fuzzy string matching:

Levenshtein Distance: Also known as Edit Distance, it is the minimum number of single-character edits (deletions, insertions, or substitutions) required to transform a source string into the target one. For example, if the source term is “book” and the target is “back”, you need to change the first “o” to “a” and the second “o” to “c”, which gives a Levenshtein Distance of 2.
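
As a minimal sketch of how this distance can be computed, the following Python function uses the standard dynamic-programming approach, keeping only two rows of the table at a time. The function name `levenshtein` is chosen here for illustration; it is not taken from Talend or bleve.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of insertions, deletions, and
    substitutions needed to turn `source` into `target`."""
    # previous[j] holds the distance between the source prefix processed
    # so far and the first j characters of the target.
    previous = list(range(len(target) + 1))

    for i, source_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into an empty target
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a source character
                    current[j - 1] + 1,      # insert a target character
                    previous[j - 1] + cost,  # substitute (or keep) a character
                )
            )
        previous = current

    return previous[-1]


print(levenshtein("book", "back"))  # 2
```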

Additionally, some frameworks also support the Damerau-Levenshtein distance:

Damerau-Levenshtein distance: It is an extension of the Levenshtein Distance that allows one extra operation, the transposition of two adjacent characters.

Example: TSAR to STAR

Damerau-Levenshtein distance = 1 (swapping the positions of T and S costs only one operation)

Levenshtein distance = 2 (replace T with S, then S with T)
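
The transposition rule can be sketched in Python as follows. Note that this is the restricted variant often called optimal string alignment, which is a simplification assumed here for brevity; the function name `osa_distance` is illustrative.

```python
def osa_distance(source: str, target: str) -> int:
    """Edit distance that also counts a swap of two adjacent characters
    as a single operation (optimal string alignment variant)."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]

    for i in range(rows):
        d[i][0] = i  # delete every source character
    for j in range(cols):
        d[0][j] = j  # insert every target character

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition: the last two characters appear swapped.
            if (
                i > 1
                and j > 1
                and source[i - 1] == target[j - 2]
                and source[i - 2] == target[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)

    return d[-1][-1]


print(osa_distance("TSAR", "STAR"))  # 1
```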

How to Use Fuzzy Matching in Talend?

Step 1: Create an Excel file named “Sample Data” with two columns, “Demo Event 1” and “Demo Event 2”.

Demo Event 1: This column contains the records to which we need to apply fuzzy logic.
Demo Event 2: This column contains the reference records that are compared with the “Demo Event 1” column to find fuzzy matches.

Step 2: In Talend, use the above Excel file as the input of a tFileInputExcel component, and provide the same file again as the input of a second tFileInputExcel component, as shown in the diagram.

Step 3: In the tFuzzyMatch component, choose the configuration shown in the diagram below.

Step 4: In the tMap component, select the following columns for the output:

Demo_Events_1
MATCHING
VALUE

Step 5: Finally, add a tFileOutputExcel component to write the desired output.

In the final extracted file, the “VALUE” column shows the distance between each record and its closest reference record, and the “MATCHING” column shows the reference record that it was matched to.
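
To make the shape of that output concrete, here is a rough Python sketch of the same comparison done outside Talend. It is not Talend's implementation: the sample event names, the `fuzzy_match` function, the `max_distance` threshold of 2, and the choice to keep only the single closest match are all assumptions made for illustration. Only the output column names follow the steps above.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def fuzzy_match(inputs, references, max_distance=2):
    """For each input value, find the closest reference value and report it
    with its distance, or None when nothing is within max_distance."""
    rows = []
    for value in inputs:
        # Compare case-insensitively against every reference value.
        scored = [(levenshtein(value.lower(), ref.lower()), ref) for ref in references]
        distance, best = min(scored)
        if distance <= max_distance:
            rows.append({"Demo_Events_1": value, "MATCHING": best, "VALUE": distance})
        else:
            rows.append({"Demo_Events_1": value, "MATCHING": None, "VALUE": None})
    return rows


# Hypothetical contents of the two Excel columns.
demo_event_1 = ["Dreamforce", "Talend Conect", "Data Summit"]
demo_event_2 = ["Dreamforce", "Talend Connect", "Cloud Expo"]

for row in fuzzy_match(demo_event_1, demo_event_2):
    print(row)
```

In this sketch, “Talend Conect” is matched to “Talend Connect” with a VALUE of 1, while “Data Summit” has no reference record within the threshold and is left unmatched.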

Conclusion:

In a nutshell, Talend’s fuzzy matching helps ensure the quality of source data by comparing it against a reference data source and identifying duplicates created by inconsistent data, so that they can be removed. This technique is also useful for complex data matching and duplicate analysis.


About Author

Ayush Joshi

Ayush is a Salesforce consultant and Talend developer with expertise in data analysis, data migration, and Salesforce administration. He loves to share his insights by blogging about data analysis and various migration techniques.
