Text::Record::Deduper 0.05

  • Publisher: Kim Ryan
  • License: Perl Artistic License
  • File Size: -
  • Version: 0.05
  • Operating System: -

Text::Record::Deduper 0.05's Description

Separate complete, partial and near duplicate text records
Text::Record::Deduper is a Perl module that separates complete, partial and near duplicate text records.

SYNOPSIS

    use Text::Record::Deduper;

    my $deduper = Text::Record::Deduper->new;

    # Find and remove entire lines that are duplicated
    $deduper->dedupe_file("orig.txt");

    # Dedupe comma separated records, duplicates defined by several fields
    $deduper->field_separator(',');
    $deduper->add_key(field_number => 1, ignore_case => 1 );
    $deduper->add_key(field_number => 2, ignore_whitespace => 1);

    # unique records go to file names_uniqs.csv, dupes to names_dupes.csv
    $deduper->dedupe_file('names.csv');

    # Find 'near' dupes by allowing for given name aliases
    my %nick_names = (Bob => 'Robert', Rob => 'Robert');
    my $near_deduper = Text::Record::Deduper->new();
    $near_deduper->add_key(field_number => 2, alias => \%nick_names) or die;
    $near_deduper->dedupe_file('names.txt');

    # Create a text report, names_report.txt to identify all duplicates
    $near_deduper->report_file('names.txt', all_records => 1);

    # Find 'near' dupes in an array of records, returning references
    # to a unique and a duplicate array
    my ($uniqs,$dupes) = $near_deduper->dedupe_array(@some_records);

This module takes a text file of records and splits it into a file of unique records and a file of duplicate records.

Records are defined as a set of fields. Fields may be separated by spaces, commas, tabs or any other delimiter. Records are separated by a new line.

If no options are specified, a record is treated as a duplicate only when all of its fields (the entire line) are duplicated.

By specifying options, you define a duplicate record by the fields or partial fields that must not occur more than once. There are also options to ignore case and to ignore leading and trailing white space.

Additionally, 'near' or 'fuzzy' duplicates can be defined. This is done by creating aliases, such as Bob => Robert.

This module is useful for finding duplicates created by multiple data entry or by merging similar records.
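To make the idea of key-based and 'near' duplicates more concrete, here is a minimal sketch in plain Perl. It does not use Text::Record::Deduper and is not the module's implementation. The helper names build_key and split_dupes are hypothetical, and the option names (field_number, ignore_case, ignore_whitespace, alias) are borrowed from the synopsis purely for illustration. The sketch assumes that ignore_whitespace means trimming leading and trailing white space, as the description suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Record::Deduper): build a
# comparison key for one record from a list of key specifications.
sub build_key {
    my ($record, $separator, $key_specs) = @_;
    my @fields = split /\Q$separator\E/, $record, -1;
    my @parts;
    for my $spec (@$key_specs) {
        # field_number is treated as 1-based, following the synopsis
        my $value = $fields[ $spec->{field_number} - 1 ];
        $value = '' unless defined $value;
        # Assumption: ignore_whitespace trims leading/trailing white space
        $value =~ s/^\s+|\s+$//g if $spec->{ignore_whitespace};
        # Map an alias (e.g. Bob) to its canonical form (e.g. Robert)
        if ($spec->{alias} && exists $spec->{alias}{$value}) {
            $value = $spec->{alias}{$value};
        }
        $value = lc $value if $spec->{ignore_case};
        push @parts, $value;
    }
    # Join with a control character unlikely to appear in the data
    return join "\x1F", @parts;
}

# Hypothetical helper: the first record seen for a key is unique,
# every later record with the same key is a duplicate.
sub split_dupes {
    my ($records, $separator, $key_specs) = @_;
    my (%seen, @uniqs, @dupes);
    for my $record (@$records) {
        my $key = build_key($record, $separator, $key_specs);
        if ($seen{$key}++) { push @dupes, $record }
        else               { push @uniqs, $record }
    }
    return (\@uniqs, \@dupes);
}

my %nick_names = (Bob => 'Robert', Rob => 'Robert');
my @records = ('Smith,Robert', 'smith, Bob ', 'Jones,Rob', 'Smith,Rob');
my @key_specs = (
    { field_number => 1, ignore_case => 1 },
    { field_number => 2, ignore_whitespace => 1, alias => \%nick_names },
);

my ($uniqs, $dupes) = split_dupes(\@records, ',', \@key_specs);
print "unique: $_\n" for @$uniqs;
print "dupe:   $_\n" for @$dupes;
```

With the sample data, 'Smith,Robert' and 'Jones,Rob' are kept as unique, while 'smith, Bob ' and 'Smith,Rob' are reported as duplicates of the first record, because their normalised keys match once case, surrounding white space and aliases are taken into account.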
Requirements:
· Perl

Copyright 2004-2015 www.bleusoftware.com All rights reserved