Blog
August 14, 2024
How to Handle Data Duplication in Airtable Part 1
Blog
August 14, 2024
How to Handle Data Duplication in Airtable Part 1
This is the first part of a two part series that helps you remove duplicates from your Airtable base.
One of the most common issues you can face in Airtable is data duplication. But how do we handle it?
The most common source of data duplication are forms. Respondents sometimes submit the form twice, just because they were not sure whether the first time was successful. This, in turn, creates duplicate records in Airtable.
Let’s explore three ways of handling the issue, which are different in difficulty, flexibility, and degree of automation:
A. Easy but with limitations (Automation)
B. Hard but fully flexible (Scripting)
C. Manual combination (Dedupe Extension) [Go to part 2 of this article]
A. Easy but not flexible (Automation)
The easiest way to handle data duplication in Airtable is with the use of Airtable Automations. With this system, you will be able to mark any new entry as duplicated based on the existing data. The main requirement for this deduplication method is a unique identifier field (column).
The unique identifier field is nothing but a value that has to be unique for each of the records. This can be an employee number, email address or a Student ID number. As long as it is unique, it can be used to remove duplicate values. In the following example, I will use an email field as such a unique identifier.
To begin, we need to create an Automation. For that, we need to click on the “automations” layer which is available on the top left corner of your Airtable base. Once you click on it, you will have the option to click on the “Create automation” on the bottom left corner of your screen. Now let’s go through the Automation setup step by step
Step 1: Selecting the trigger
The first step is to select a trigger which will prompt your automation to run. Let’s select the “When record matches condition” trigger. This option will trigger as soon as a record matches the condition we will define in step #2.
Step 2: Setting up the trigger
For the trigger to work, it needs a couple of input parameters. Firstly, select the table which contains the records you want to de-duplicate. Then you set the condition to trigger, when the unique identifier is not empty.
Our example is displayed in the screenshot below. The deduplication automation will run when the email field is “not empty”.
Please note: this trigger will run as soon as the cell value is changed. Therefore, it works best with form submissions, but it is not flexible enough to handle duplication through manual entry in grid view.
Step 3: Select all the records with the same email
Next thing, we select the action we would like to perform. In our case, we would like to identify all the records that have the same identifier. To do that, we use the “Find Records” action.
The screenshot below shows how we set up this action to filter for duplicates in our Airtable:
Table: [Insert table where you want to remove duplicates]
Find records based on: “Condition”
Condition: Where email is Field Values email (select Dynamic variable and select the cell value from the trigger record)
This action will then pass on a list of all the records that have the same email address to the next action
Step 4: Check how many records have the same unique identifier
Before we can mark the item as duplicate, we need to have an intermediate step where we check how many records there are for this specific email. We do this by inserting a conditional action.
Our goal is to trigger the action, if there is more than one record with the same email address. Luckily, the previous action already selected all the records with the same email address. We just need to extract the information and put it into a conditional statement.
When we create the condition, we are automatically prompted to insert a statement:
For the left-hand side of the equation, we set up a dynamic token clicking on the blue + sign and by selecting the output of the “Find Records” action and we choose the “Length” property.
As operator we select the “>”
On the right-hand side of the equation we input the number 1
Once this is done, the conditional actions will run, every time the length of the records with the same email is larger than one.
Step 5: Mark duplicate records as duplicates
If the new record is added, we need to mark them as such. For that, we use the “Update Records” action. In the table we create a field (column) called “Duplicate”. When the record is a duplicate, we want the field value to be “true”
The screenshot below shows how we set up this action to mark duplicates in our Airtable:
Table:[Insert table where you want to remove duplicates]
Fields:“Duplicate” → True
Step 6: Activate your automation and handle duplicates
Now you will need to activate the automation by toggling the button on the top left corner to “on”. This will then automatically tag all the incoming records as duplicates. To further handle the duplicates, you can insert additional automations or manually filter the grid view in the “Data layer” by the “Duplicate” field. Then, you will have to sort out the duplicates manually.
Limitations of this De-Duplication
Some things that you need to keep in mind are that for this option to work, there must be a unique identifier field. Furthermore, this method only works with new records, as the trigger is only activated once a record is changed.
Furthermore, handling the duplicates after the tagging with this method is still manual. Nevertheless, for some easier applications with limited records it might be the perfect solution. The next solution is more complex, but it overcomes many of the drawbacks of this simple method
B. Advanced but flexible (Scripting)
This method of de-duplication is more difficult but also more flexible. It is more appropriate for environments where there isn’t necessarily one single field as a unique identifier, or the actions to be taken with the duplicate values are more complicated (e.g., combine records automatically). This method will be using the scripting extension to handle duplication, allowing you to delete, update or perform any other operation on the duplicate records.
In our example, we will be using the same starting table as for the simple method A (automation). Thus, the actions our script performs are still quite simple, but there are now limits to what can be done. If you are not experienced with Java Script, or you would just like to find out what is possible through the Airtable API an Airtable consultant can best answer your questions. Monino Solutions offers free sessions to answer quick questions. If you are interested feel free to sign up here.
Our sample script linked below works with the same email field as already used in method A to remove the duplicates. Below you can find a step by step instruction on how to run the sample script.
Step 1: Add scripting extension to your base/table
Firstly, make sure that you are on the data layer of your base (top left corner right next to the base name). Go to the “Extensions” option on the top right corner of your Airtable base. This will open the Extension sidebar with an option to click on “Add an extension” on the top right corner. After you click that, please select “Scripting” from the marketplace pop-up.
Step 2: Paste the sample script
Once the extension is added, it will offer you a “Edit code button”. If you open it it will give you an entry field for the code. Here you can write your own code, or insert sample code and tailor it to your needs. Click here to get the code for the de-duplication sample script from our github page.
Step 3: Adjust the code
Now that you have the sample code in your extension, please make sure to adapt it to your data structure. For example:
Change the string in row 4 to the name of your table
Select the unique identifier field in row 30
Adjust the action the script takes (after row 50) - In this case duplicates are deleted
In adjusting this kind of script is where Airtable developers from Monino Solutions can bring the most value. If you are not experienced with Java Script, an Airtable developer could make up for the cost just through the time savings it generates for you.
Below you can find a more detailed explanation of the script.
Script Explanation
The script can look intimidating at first, nevertheless, it eliminates the duplicates in a very similar way as method A.
We are getting the records from the respective tables (rows 1-10)
Then we are going through a loop and finding duplicate records (rows 15-50)
If they are duplicates, then we update the oldest records with the newest records based on the createdTime. We then also delete the duplicates (rows 53-68)
This is just an example, we can work together on customising your script in many ways, depending on your specific business needs. If you found this article helpful, don’t forget to bookmark our website for future, useful articles. Feel free to also schedule a first, introductory call to discuss your specific needs and how we can help you achieve them.
If you are interested in learning about the third method, Airtable’s Dedupe Extension (Method C), read the second part of this article click here.
Click here to schedule a free initial consultation session to discuss your specific needs.
One of the most common issues you can face in Airtable is data duplication. But how do we handle it?
The most common source of data duplication are forms. Respondents sometimes submit the form twice, just because they were not sure whether the first time was successful. This, in turn, creates duplicate records in Airtable.
Let’s explore three ways of handling the issue, which are different in difficulty, flexibility, and degree of automation:
A. Easy but with limitations (Automation)
B. Hard but fully flexible (Scripting)
C. Manual combination (Dedupe Extension) [Go to part 2 of this article]
A. Easy but not flexible (Automation)
The easiest way to handle data duplication in Airtable is with the use of Airtable Automations. With this system, you will be able to mark any new entry as duplicated based on the existing data. The main requirement for this deduplication method is a unique identifier field (column).
The unique identifier field is nothing but a value that has to be unique for each of the records. This can be an employee number, email address or a Student ID number. As long as it is unique, it can be used to remove duplicate values. In the following example, I will use an email field as such a unique identifier.
To begin, we need to create an Automation. For that, we need to click on the “automations” layer which is available on the top left corner of your Airtable base. Once you click on it, you will have the option to click on the “Create automation” on the bottom left corner of your screen. Now let’s go through the Automation setup step by step
Step 1: Selecting the trigger
The first step is to select a trigger which will prompt your automation to run. Let’s select the “When record matches condition” trigger. This option will trigger as soon as a record matches the condition we will define in step #2.
Step 2: Setting up the trigger
For the trigger to work, it needs a couple of input parameters. Firstly, select the table which contains the records you want to de-duplicate. Then you set the condition to trigger, when the unique identifier is not empty.
Our example is displayed in the screenshot below. The deduplication automation will run when the email field is “not empty”.
Please note: this trigger will run as soon as the cell value is changed. Therefore, it works best with form submissions, but it is not flexible enough to handle duplication through manual entry in grid view.
Step 3: Select all the records with the same email
Next thing, we select the action we would like to perform. In our case, we would like to identify all the records that have the same identifier. To do that, we use the “Find Records” action.
The screenshot below shows how we set up this action to filter for duplicates in our Airtable:
Table: [Insert table where you want to remove duplicates]
Find records based on: “Condition”
Condition: Where email is Field Values email (select Dynamic variable and select the cell value from the trigger record)
This action will then pass on a list of all the records that have the same email address to the next action
Step 4: Check how many records have the same unique identifier
Before we can mark the item as duplicate, we need to have an intermediate step where we check how many records there are for this specific email. We do this by inserting a conditional action.
Our goal is to trigger the action, if there is more than one record with the same email address. Luckily, the previous action already selected all the records with the same email address. We just need to extract the information and put it into a conditional statement.
When we create the condition, we are automatically prompted to insert a statement:
For the left-hand side of the equation, we set up a dynamic token clicking on the blue + sign and by selecting the output of the “Find Records” action and we choose the “Length” property.
As operator we select the “>”
On the right-hand side of the equation we input the number 1
Once this is done, the conditional actions will run, every time the length of the records with the same email is larger than one.
Step 5: Mark duplicate records as duplicates
If the new record is added, we need to mark them as such. For that, we use the “Update Records” action. In the table we create a field (column) called “Duplicate”. When the record is a duplicate, we want the field value to be “true”
The screenshot below shows how we set up this action to mark duplicates in our Airtable:
Table:[Insert table where you want to remove duplicates]
Fields:“Duplicate” → True
Step 6: Activate your automation and handle duplicates
Now you will need to activate the automation by toggling the button on the top left corner to “on”. This will then automatically tag all the incoming records as duplicates. To further handle the duplicates, you can insert additional automations or manually filter the grid view in the “Data layer” by the “Duplicate” field. Then, you will have to sort out the duplicates manually.
Limitations of this De-Duplication
Some things that you need to keep in mind are that for this option to work, there must be a unique identifier field. Furthermore, this method only works with new records, as the trigger is only activated once a record is changed.
Furthermore, handling the duplicates after the tagging with this method is still manual. Nevertheless, for some easier applications with limited records it might be the perfect solution. The next solution is more complex, but it overcomes many of the drawbacks of this simple method
B. Advanced but flexible (Scripting)
This method of de-duplication is more difficult but also more flexible. It is more appropriate for environments where there isn’t necessarily one single field as a unique identifier, or the actions to be taken with the duplicate values are more complicated (e.g., combine records automatically). This method will be using the scripting extension to handle duplication, allowing you to delete, update or perform any other operation on the duplicate records.
In our example, we will be using the same starting table as for the simple method A (automation). Thus, the actions our script performs are still quite simple, but there are now limits to what can be done. If you are not experienced with Java Script, or you would just like to find out what is possible through the Airtable API an Airtable consultant can best answer your questions. Monino Solutions offers free sessions to answer quick questions. If you are interested feel free to sign up here.
Our sample script linked below works with the same email field as already used in method A to remove the duplicates. Below you can find a step by step instruction on how to run the sample script.
Step 1: Add scripting extension to your base/table
Firstly, make sure that you are on the data layer of your base (top left corner right next to the base name). Go to the “Extensions” option on the top right corner of your Airtable base. This will open the Extension sidebar with an option to click on “Add an extension” on the top right corner. After you click that, please select “Scripting” from the marketplace pop-up.
Step 2: Paste the sample script
Once the extension is added, it will offer you a “Edit code button”. If you open it it will give you an entry field for the code. Here you can write your own code, or insert sample code and tailor it to your needs. Click here to get the code for the de-duplication sample script from our github page.
Step 3: Adjust the code
Now that you have the sample code in your extension, please make sure to adapt it to your data structure. For example:
Change the string in row 4 to the name of your table
Select the unique identifier field in row 30
Adjust the action the script takes (after row 50) - In this case duplicates are deleted
In adjusting this kind of script is where Airtable developers from Monino Solutions can bring the most value. If you are not experienced with Java Script, an Airtable developer could make up for the cost just through the time savings it generates for you.
Below you can find a more detailed explanation of the script.
Script Explanation
The script can look intimidating at first, nevertheless, it eliminates the duplicates in a very similar way as method A.
We are getting the records from the respective tables (rows 1-10)
Then we are going through a loop and finding duplicate records (rows 15-50)
If they are duplicates, then we update the oldest records with the newest records based on the createdTime. We then also delete the duplicates (rows 53-68)
This is just an example, we can work together on customising your script in many ways, depending on your specific business needs. If you found this article helpful, don’t forget to bookmark our website for future, useful articles. Feel free to also schedule a first, introductory call to discuss your specific needs and how we can help you achieve them.
If you are interested in learning about the third method, Airtable’s Dedupe Extension (Method C), read the second part of this article click here.
Click here to schedule a free initial consultation session to discuss your specific needs.
This is the first part of a two part series that helps you remove duplicates from your Airtable base.
One of the most common issues you can face in Airtable is data duplication. But how do we handle it?
The most common source of data duplication are forms. Respondents sometimes submit the form twice, just because they were not sure whether the first time was successful. This, in turn, creates duplicate records in Airtable.
Let’s explore three ways of handling the issue, which are different in difficulty, flexibility, and degree of automation:
A. Easy but with limitations (Automation)
B. Hard but fully flexible (Scripting)
C. Manual combination (Dedupe Extension) [Go to part 2 of this article]
A. Easy but not flexible (Automation)
The easiest way to handle data duplication in Airtable is with the use of Airtable Automations. With this system, you will be able to mark any new entry as duplicated based on the existing data. The main requirement for this deduplication method is a unique identifier field (column).
The unique identifier field is nothing but a value that has to be unique for each of the records. This can be an employee number, email address or a Student ID number. As long as it is unique, it can be used to remove duplicate values. In the following example, I will use an email field as such a unique identifier.
To begin, we need to create an Automation. For that, we need to click on the “automations” layer which is available on the top left corner of your Airtable base. Once you click on it, you will have the option to click on the “Create automation” on the bottom left corner of your screen. Now let’s go through the Automation setup step by step
Step 1: Selecting the trigger
The first step is to select a trigger which will prompt your automation to run. Let’s select the “When record matches condition” trigger. This option will trigger as soon as a record matches the condition we will define in step #2.
Step 2: Setting up the trigger
For the trigger to work, it needs a couple of input parameters. Firstly, select the table which contains the records you want to de-duplicate. Then you set the condition to trigger, when the unique identifier is not empty.
Our example is displayed in the screenshot below. The deduplication automation will run when the email field is “not empty”.
Please note: this trigger will run as soon as the cell value is changed. Therefore, it works best with form submissions, but it is not flexible enough to handle duplication through manual entry in grid view.
Step 3: Select all the records with the same email
Next thing, we select the action we would like to perform. In our case, we would like to identify all the records that have the same identifier. To do that, we use the “Find Records” action.
The screenshot below shows how we set up this action to filter for duplicates in our Airtable:
Table: [Insert table where you want to remove duplicates]
Find records based on: “Condition”
Condition: Where email is Field Values email (select Dynamic variable and select the cell value from the trigger record)
This action will then pass on a list of all the records that have the same email address to the next action
Step 4: Check how many records have the same unique identifier
Before we can mark the item as duplicate, we need to have an intermediate step where we check how many records there are for this specific email. We do this by inserting a conditional action.
Our goal is to trigger the action, if there is more than one record with the same email address. Luckily, the previous action already selected all the records with the same email address. We just need to extract the information and put it into a conditional statement.
When we create the condition, we are automatically prompted to insert a statement:
For the left-hand side of the equation, we set up a dynamic token clicking on the blue + sign and by selecting the output of the “Find Records” action and we choose the “Length” property.
As operator we select the “>”
On the right-hand side of the equation we input the number 1
Once this is done, the conditional actions will run, every time the length of the records with the same email is larger than one.
Step 5: Mark duplicate records as duplicates
If the new record is added, we need to mark them as such. For that, we use the “Update Records” action. In the table we create a field (column) called “Duplicate”. When the record is a duplicate, we want the field value to be “true”
The screenshot below shows how we set up this action to mark duplicates in our Airtable:
Table:[Insert table where you want to remove duplicates]
Fields:“Duplicate” → True
Step 6: Activate your automation and handle duplicates
Now you will need to activate the automation by toggling the button on the top left corner to “on”. This will then automatically tag all the incoming records as duplicates. To further handle the duplicates, you can insert additional automations or manually filter the grid view in the “Data layer” by the “Duplicate” field. Then, you will have to sort out the duplicates manually.
Limitations of this De-Duplication
Some things that you need to keep in mind are that for this option to work, there must be a unique identifier field. Furthermore, this method only works with new records, as the trigger is only activated once a record is changed.
Furthermore, handling the duplicates after the tagging with this method is still manual. Nevertheless, for some easier applications with limited records it might be the perfect solution. The next solution is more complex, but it overcomes many of the drawbacks of this simple method
B. Advanced but flexible (Scripting)
This method of de-duplication is more difficult but also more flexible. It is more appropriate for environments where there isn’t necessarily one single field as a unique identifier, or the actions to be taken with the duplicate values are more complicated (e.g., combine records automatically). This method will be using the scripting extension to handle duplication, allowing you to delete, update or perform any other operation on the duplicate records.
In our example, we will be using the same starting table as for the simple method A (automation). Thus, the actions our script performs are still quite simple, but there are now limits to what can be done. If you are not experienced with Java Script, or you would just like to find out what is possible through the Airtable API an Airtable consultant can best answer your questions. Monino Solutions offers free sessions to answer quick questions. If you are interested feel free to sign up here.
Our sample script linked below works with the same email field as already used in method A to remove the duplicates. Below you can find a step by step instruction on how to run the sample script.
Step 1: Add scripting extension to your base/table
Firstly, make sure that you are on the data layer of your base (top left corner right next to the base name). Go to the “Extensions” option on the top right corner of your Airtable base. This will open the Extension sidebar with an option to click on “Add an extension” on the top right corner. After you click that, please select “Scripting” from the marketplace pop-up.
Step 2: Paste the sample script
Once the extension is added, it will offer you a “Edit code button”. If you open it it will give you an entry field for the code. Here you can write your own code, or insert sample code and tailor it to your needs. Click here to get the code for the de-duplication sample script from our github page.
Step 3: Adjust the code
Now that you have the sample code in your extension, please make sure to adapt it to your data structure. For example:
Change the string in row 4 to the name of your table
Select the unique identifier field in row 30
Adjust the action the script takes (after row 50) - In this case duplicates are deleted
In adjusting this kind of script is where Airtable developers from Monino Solutions can bring the most value. If you are not experienced with Java Script, an Airtable developer could make up for the cost just through the time savings it generates for you.
Below you can find a more detailed explanation of the script.
Script Explanation
The script can look intimidating at first, nevertheless, it eliminates the duplicates in a very similar way as method A.
We are getting the records from the respective tables (rows 1-10)
Then we are going through a loop and finding duplicate records (rows 15-50)
If they are duplicates, then we update the oldest records with the newest records based on the createdTime. We then also delete the duplicates (rows 53-68)
This is just an example, we can work together on customising your script in many ways, depending on your specific business needs. If you found this article helpful, don’t forget to bookmark our website for future, useful articles. Feel free to also schedule a first, introductory call to discuss your specific needs and how we can help you achieve them.
If you are interested in learning about the third method, Airtable’s Dedupe Extension (Method C), read the second part of this article click here.
Click here to schedule a free initial consultation session to discuss your specific needs.
Other Blogs
Other Blogs
Check our other project Blogs with useful insight and information for your businesses
Other Blogs
Other Blogs
Check our other project Blogs with useful insight and information for your businesses
Other Blogs
Other Blogs
Check our other project Blogs with useful insight and information for your businesses