An introduction to duplicate detection

Author: Felix Naumann
ISBN: 9783031018350

512 pages 2010

About This Book

With the ever-increasing volume of data, data quality problems abound. Duplicates, that is, multiple yet different representations of the same real-world object in data, are one of the most intriguing data quality problems. The effects of such duplicates are detrimental: for instance, bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, and catalogs are mailed multiple times to the same household. Automatically detecting duplicates is difficult. First, duplicate representations are usually not identical but differ slightly in their values. Second, in principle all pairs of records should be compared, which is infeasible for large volumes of data. This lecture closely examines the two main components used to overcome these difficulties: (i) Similarity measures are used to automatically identify duplicates when comparing two records. Well-chosen similarity measures improve the effectiveness of duplicate detection. (ii) Algorithms are developed to search very large volumes of data for duplicates. Well-designed algorithms improve the efficiency of duplicate detection. Finally, we discuss methods to evaluate the success of duplicate detection.
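To make the two components concrete, here is a minimal Python sketch. It is not taken from the book: the token-based Jaccard measure, the three-letter blocking key, the 0.5 threshold, and the sample records are all illustrative assumptions. It pairs a similarity measure (effectiveness) with a simple blocking step that avoids comparing every pair of records (efficiency).

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find_duplicates(records, block_key, threshold=0.5):
    """Return sorted index pairs whose similarity reaches the threshold.

    Records are first grouped by block_key, so only pairs inside the
    same block are compared instead of all n*(n-1)/2 pairs.
    """
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)

    pairs = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            if jaccard(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return sorted(pairs)


# Made-up sample records, for illustration only.
customers = [
    "John A. Smith, 12 Main Street",
    "john smith, 12 main street",
    "Jane Doe, 4 Elm Road",
    "Jon Smyth, 99 Oak Avenue",
]

# Hypothetical blocking key: the first three characters, lowercased.
dupes = find_duplicates(customers, block_key=lambda r: r[:3].lower())
print(dupes)
```

The trade-off is visible even at this scale: blocking reduces six candidate comparisons to one, but any true duplicate whose key differs (for example "Jon" versus "John") would never be compared at all.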

Author: Felix Naumann
First Published: 2010
Pages: 512
ISBN-13: 9783031018350
ISBN-10: 3031018354
Language: EN

Ebook Morgan & Claypool 2010 9781608452217

Springer Nature 2010 9783031018350

More by Felix Naumann

Data Profiling

2018

Informationsintegration

2006

Quality-Driven Query Answering for Integrated Information Systems

2002
