Are you dealing with duplicate data?
Does your data not fall under exact match?
Are the duplicates in your data not consistent for an exact match?
Are you struggling with cleansing of different types of data duplicates?
If you have answered yes to most or all of the aforementioned questions then the solution to your problem is Fuzzy Matching. Fuzzy matching allows you to deal with the above mentioned problems easily and efficiently.
What is Data Matching?
Data Matching is the process of discovering records that refer to the same data set. When records come from multiple data sets and do not have any common key identifier, we can use data matching techniques to detect duplicate records within a single dataset.
We perform the following steps:
Standardize the dataset
Pick unique and standard attributes
Break dataset into similar sized blocks
Match and Assigning weights to the matches
Add it all up — get a TOTAL weight
What is Fuzzy matching?
Fuzzy matching allows you to identify non-exact matches of your dataset. It is the foundation of many search engine frameworks and it helps you get relevant search results even if you have a typo in your query or a different verbal tense.
There are many algorithms that can be used for fuzzy searching on text, but virtually all search engine frameworks (including bleve) use primarily the Levenshtein Distance for fuzzy string matching:
Levenshtein Distance: Also known as Edit Distance, it is the number of transformations (deletions, insertions, or substitutions) required to transform a source string into the target one. For example, if the target term is “book” and the source is “back”, you will need to change the first “o” to “a” and the second “o” to “c”, which will give us a Levenshtein Distance of 2.
Additionally, some frameworks also support the Damerau-Levenshtein distance:
Damerau-Levenshtein distance: It is an extension to Levenshtein Distance, allowing one extra operation: Transposition of two adjacent characters:
Ex: TSAR to STAR
Damerau-Levenshtein distance = 1 (Switching S and T positions cost only one operation)
Levenshtein distance = 2 (Replace S by T and T by S)
How to Use Fuzzy Matching in TALEND?
Step 1: Create an Excel “Sample Data” with 2 columns “Demo Event 1” and “Demo Event 2”.
Demo Event 1: This column contains the records on which we need to apply Fuzzy Logic.
Demo Event 2: This column contains the records that need to be compared with the Column 1 for Fuzzy match.
Step 2: In TALEND use the above Excel as input in the tfileInputExcel component and provide the same file again as input to the same component as shown in the diagram.
Step 3: In the tFuzzyMAtch component choose the following configurations as shown in the below diagram.
Step 4: In the tMap we need to choose the following column to take an output.
Step 5: Finally, you need to select an tFileOutputExcel component for the desired output.
In the final Extracted file, the Column “VALUE” shows the difference between the records and matches the records to their duplicate.
In a nutshell, we can say that the use of TALEND’s Fuzzy Matching helps in ensuring the data quality of any source data against a reference data source by identifying and removing any kind of duplicity created from inconsistent data. This technique is also useful for complex data matching and data duplicate analysis.
Girikon is a reputed provider of high-quality IT services including but not limited to Salesforce consulting, Salesforce implementation and Salesforce support.
Are you struggling to fit your migrated data into your target Salesforce environment?
Is there a need for creating multiple fields?
Don’t want to lose the data but don’t have the fields to migrate it to?
If the answers to most or all of your questions is ‘yes’, then this article can help you achieve all the above without creating any new fields and without suffering any data loss.
Where can you use this technique?
Lack of fields on the Target as compared to the Source environment
Data model does not match between the two environments
Do not want to create new fields in the Target environment
Do not have a separate custom object to hold the data
Figure 1 Salesforce data model not matching
How to achieve this?
We will extract all the data from the source environment and insert it into the Description field of the Target environment using TALEND.
We have migrated the data of Productsobject (OpportunityLineItem) from Source org.into a Description field (Special_Instruction__c) of Opportunity object in Target org.
We are using Excel to extract data from Source environment and a Salesforce connection to push the data on the Target environment. We can also create a Salesforce connection for both the environments (Recommended way).
Step 1: Create a Salesforce connection.
For this you need 3 parameters:
Figure 2 Create a Salesforce connection
You need to enter all the credentials for the Source environment.
Note:If you are connecting to a UAT environment be sure to change the URL in the tSalesforceConnection component.
Step 2:In this step we need to extract the data from both the source files.
Figure 3 Job Design in TALEND
To achieve the above process first you need two files:
Source File:In this file you need all the Product data (Opportunity Line Item data) along with the OpportunityId
Target File:In this file you need the Opportunity Id from the Target
Step 3:In this step we need to:
Insert the two files into the tMap
Apply an inner join between the two files.
Concatenate all the product data into a single variable and add a null check to all the fields before inserting them into variables.
Insert the variable into the Opportunity field.
Figure 4 tMap Design in TALEND
In this method we have added “|” to separate the data you can choose another symbol as per your convenience.
Note: Select the same selections in the tMap as highlighted above.
Step 4: After all the description is concatenated add a tDenormalize to concatenate the records with the same Opportunity Id based on the Description field.
Step 5:In this step use a tSalesforceOutput component to update the description field for the respective opportunity.
Finally attaching the complete job for reference.
More ways to implement the above approach:
Extracting data from Salesforce rather than Excel
Using other objects like Account, Contact etc.
Pushing data into Salesforce with Excel
Merging Product Data with Billing data
In a nutshell, we can say that whether your migration is for small or large data, you need to process your data in an efficient way. In your data migration journey, Talend can be your partner for ensuring hassle-free migration.
Girikon – a global provider of quality IT services houses a team of skilled Salesforce professionals including Salesforce consultants, administrators and developers.
Are you facing the following challenges while migrating data to Salesforce?
Tired of deleting the duplicates manually in Excel?
Excel keeps hanging up because of large data?
Applying VLOOKUP again and again for the same data set?
Duplicating the whole data again for UAT and Production environment?
Cannot apply complex transformation to your data set?
If you have answered yes to most or all of the questions above, then the solution to your problem is TALEND.
What is TALEND?
TALEND is an open source software integration platform which helps you to turn data into business insights effortlessly.
When we hear the term “Data Migration”, questions like – What is data migration? Why is it needed? How is it done? etc., pop up in our mind.
This article addresses all the basic queries on Data Migration along with the reasons for choosing TALEND as an ETL tool for Salesforce migration.
Which tool should be used in Salesforce Data Migration?
Selecting the right data migration tool depends largely on your needs. There are several data migration tools, but they won’t be of help if they fail to meet the specific goals and objectives of your company.. Listed below are some points that you might need to consider while choosing an ETL tool.
Let’s understand the Data Migration process before we go ahead and address the aforementioned issues.
What is Data Migration and How Does TALEND Fit in?
As the name itself suggests, data migration is the process in which data is transferred from one system to the other. These transfer systems can be data storage types or file formats. Data from the old system is transferred to a new system through a particular mapping pattern.
The above diagram is a job in TALEND showing how simple it is to extract data from a Source, apply any tansfomation logic and push into different Target destinations.
How is it done?
Source Data: TALEND can be used for importing the raw data from different sources, andprocess multiple types and formats of Source data for exampleFile, Database, CRM or API.
Data Cleansing: In this process we perform the following steps:
Detect and Correct (or remove) corrupt or inaccurate records from the dataset.
Identify incomplete, incorrect, inaccurate or irrelevant parts of the data.
Replace, modify, or delete the dirty or coarse data.
Data Deduplication: After Cleansing the data, Duplicate copies of data are removed from the dataset based on a pre-approved field set.
Data Transformation: In this process we transform the data from the Source to match the Target
Data Upload: After the transformation the data is uploaded to the Target system in which TALEND supports multiple platforms for example Excel, SQL server or Salesforce org.
Why choose TALEND?
Open Source Software: There are no cost implications for using “Talend Open Studio” or “TOS” since it is an open-source software.
Cost is applied if there is a demand for joint data collaboration.
Multiple Source/Target Format supported: We can extract the data directly from Salesforce, transform it, and push it directly into Salesforce without the need of an Excel or any Database. If in any case we do not have an Org to Org connection, we can still extract the data using Excel, manipulate, and load the data into Salesforce using TALEND since it supports all the Excel formats.
Faster: We can process and transform the data much faster than Excel as TALEND provides functionalities in which we can change the format, append new data onto existing one and create or remove columns on the go.
Reusable: In TALEND once the job is created the same job can be used for different target locations we do not need to transform the whole data again for different environments like (UAT or Production).
In the case of any future modification we just need to append the same job and we will be spared from going through the whole process of transforming the data again.
Supports Complex Data Transformation: TALEND provides the tools to perform highly complex transformations and manipulations on the data. It also provided the functionality to write custom scripts in JAVA wherever there is a need for custom transformations
It can also process huge amounts of data which is difficult to manage in Excel.
While there are several other tools for data migrations in the market, TALEND has become a preferred choice as it provides with the functionality of accessing data more easily, processing it more efficiently and moving it across various target systems very easily.
Girikon is a Salesforce consulting and development company offering quality IT services to clients across the globe. Based out in Phoenix, Arizona, the company has presence across Australia and India.