How Find Duplicates, Merge and Purge Work

FlexMail's Find Duplicates, Merge and Purge functions are part of FlexMail Advanced. They can be used to find duplicate records within a file (de-duplication), to combine multiple files into a single file (merging), and to remove the records of one or more files from another file (purging). FlexMail can perform these operations on any database.

The Find Duplicates function works on a single file: it identifies duplicates in the file that is currently linked to FlexMail. The Merge and Purge functions process multiple files. In a merge, all unique records are combined in the destination file. In a purge, every record in the destination file that duplicates a record in the source files is removed from the destination file.
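The difference between a merge and a purge can be sketched with plain Python lists of records. This is a simplified illustration, not FlexMail's implementation: records are assumed to be dictionaries, and `simple_key` is a hypothetical stand-in for a real FlexMail match code (lowercased name plus ZIP code).

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def simple_key(rec: Record) -> str:
    """Hypothetical match key: case-insensitive name plus ZIP code."""
    return rec["name"].strip().lower() + "|" + rec["zip"].strip()


def merge(files: Iterable[List[Record]], key: Callable[[Record], str]) -> List[Record]:
    """Combine all files, keeping only the first record seen for each key."""
    seen = set()
    result: List[Record] = []
    for records in files:
        for rec in records:
            k = key(rec)
            if k not in seen:
                seen.add(k)
                result.append(rec)
    return result


def purge(destination: List[Record], sources: Iterable[List[Record]],
          key: Callable[[Record], str]) -> List[Record]:
    """Remove from the destination every record that also occurs in a source file."""
    source_keys = {key(rec) for records in sources for rec in records}
    return [rec for rec in destination if key(rec) not in source_keys]
```

Merging `[{"name": "Ann Lee", ...}]` with a file containing `"ann lee"` at the same ZIP code keeps only one Ann Lee, while purging the first file against the second removes Ann Lee from it entirely.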

FlexMail does not require the incoming files to share the same database format or field structure, so you can take files as they are and start processing immediately. You map each field in the source files to the corresponding field in the linked file. FlexMail can either update the linked file directly, if it is in one of the updateable formats, or save the survivor records to a new file.
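Mapping source fields onto the linked file's structure can be pictured as a rename step applied to each source record. The sketch below is a hypothetical illustration: the function name `map_fields`, the dictionary-based records and the field names in the example are assumptions, not FlexMail's API. Unmapped source fields are dropped and linked-file fields with no source are left empty.

```python
from typing import Dict, List

Record = Dict[str, str]


def map_fields(records: List[Record], field_map: Dict[str, str],
               target_fields: List[str]) -> List[Record]:
    """Rename source fields to the linked file's field names (hypothetical helper)."""
    # Reject mappings that point at a field the linked file does not have.
    unknown = set(field_map.values()) - set(target_fields)
    if unknown:
        raise ValueError(f"not fields of the linked file: {sorted(unknown)}")
    mapped: List[Record] = []
    for rec in records:
        out = {f: "" for f in target_fields}  # start with every linked-file field empty
        for src_field, dst_field in field_map.items():
            if src_field in rec:
                out[dst_field] = rec[src_field]
        mapped.append(out)
    return mapped
```

For example, a source file with `FullName` and `PostCode` fields could be mapped to a linked file with `NAME`, `ZIP` and `CITY` fields, leaving `CITY` empty.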

Note: There is no limit to the number of records FlexMail can process per task, but FlexMail can process no more than 32 input files per task (linked file plus 31 source files).

FlexMail performs a de-duplication, merge or purge task in three basic steps:

  1. FlexMail reads all the input records, identifying duplicates and arranging them into sets.

    The duplicate detection process is governed by the match code you have defined for a task and the particular way you have configured that match code. For more information about FlexMail match codes and the options available for configuring them, see Match Codes.

  2. After the duplicates have been identified, FlexMail marks each input record as either a "record to keep" or a "record to purge".

    Every input record is marked in one of these two ways. How records are marked, however, depends on the settings selected for the task. For example, depending on how the task has been set up, FlexMail might automatically mark the first record of each duplicate set as a "record to keep" and mark all the other records in the set as "records to purge".

    Alternatively, FlexMail can be set up to mark the last record of each duplicate set as the "record to keep". Records can be marked in other ways as well, and you can even configure FlexMail to let you select the survivors manually.

  3. To complete the process, FlexMail does any, all, or none of the following:

    Which output files are created depends on the output options you have selected. For more information on the output files that FlexMail can generate, see Output Files.
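Steps 1 and 2 can be sketched as follows: records are grouped into duplicate sets by their match code, and each record in a set is then marked "keep" or "purge" according to a first-or-last survivor rule. This is a simplified model under stated assumptions, not FlexMail's actual algorithm: `example_match_code` is a hypothetical stand-in for a configured match code, and only the "first" and "last" rules described above are modeled (other rules and manual selection are not).

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def example_match_code(rec: Record) -> str:
    """Hypothetical match code: case-insensitive name plus ZIP code."""
    return rec["name"].strip().lower() + "|" + rec["zip"].strip()


def build_duplicate_sets(records: List[Record],
                         match_code: Callable[[Record], str]) -> List[List[int]]:
    """Step 1: group record indices by match code, preserving input order."""
    groups: Dict[str, List[int]] = {}  # dicts keep insertion order (Python 3.7+)
    for i, rec in enumerate(records):
        groups.setdefault(match_code(rec), []).append(i)
    return list(groups.values())


def mark_records(sets: List[List[int]], keep: str = "first") -> Dict[int, str]:
    """Step 2: mark one survivor per set as 'keep' and the rest as 'purge'."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    marks: Dict[int, str] = {}
    for group in sets:
        survivor = group[0] if keep == "first" else group[-1]
        for i in group:
            marks[i] = "keep" if i == survivor else "purge"
    return marks
```

A set containing a single record always keeps that record, so unique records survive regardless of the rule chosen.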