
LINQS
STATISTICAL RELATIONAL LEARNING GROUP @ UMD
D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
Visualizing and analyzing social networks is a challenging problem
that has been receiving growing attention. An important first
step, before analysis can begin, is ensuring that the data is accurate.
A common data quality problem is that the data may inadvertently
contain several distinct references to the same underlying
entity; the process of reconciling these references is called entity
resolution. D-Dupe is an interactive tool that combines data mining
algorithms for entity resolution with a task-specific network visualization.
Users cope with the complexity of cleaning large networks
by focusing on a small subnetwork containing a potential duplicate
pair. The subnetwork highlights relationships in the social network,
making the common relationships easy to visually identify.
D-Dupe users resolve ambiguities either by merging nodes or by
marking them distinct. The entity resolution process is iterative: as
pairs of nodes are resolved, additional duplicates may be revealed;
therefore, resolution decisions are often chained together. We give
examples of how users can flexibly apply sequences of actions to
produce a high quality entity resolution result. We illustrate and
evaluate the benefits of D-Dupe on three bibliographic collections.
Two of the datasets had already been cleaned, and therefore should
not have contained duplicates; despite this fact, many duplicates
were rapidly identified using D-Dupe's unique combination of entity
resolution algorithms within a task-specific visual interface.
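
The chained nature of resolution decisions described in the abstract can be illustrated with a minimal sketch. This is not D-Dupe's implementation: the co-authorship graph, the author names, and the helpers `common_neighbors` and `merge_nodes` are all hypothetical, and shared neighbors stand in for whatever relational evidence the tool actually uses. The sketch only shows why merging one pair of references can expose a further potential duplicate.

```python
# Hypothetical co-authorship network: each reference maps to the set of
# references it has co-authored with. All names are invented.
graph = {
    "A. Lee": {"C. Wu", "B. Kim"},
    "Alice Lee": {"C. Wu", "Bo Kim"},
    "C. Wu": {"A. Lee", "Alice Lee"},
    "B. Kim": {"A. Lee"},
    "Bo Kim": {"Alice Lee"},
}


def common_neighbors(graph, a, b):
    """Return the references adjacent to both a and b."""
    return graph[a] & graph[b]


def merge_nodes(graph, keep, drop):
    """Return a new graph in which reference `drop` is folded into `keep`."""
    merged = {}
    for node, neighbors in graph.items():
        target = keep if node == drop else node
        renamed = {keep if n == drop else n for n in neighbors}
        merged.setdefault(target, set()).update(renamed)
    merged[keep].discard(keep)  # avoid a self-loop on the merged node
    return merged


# Before any merge, the two "Kim" references share no co-author.
before = common_neighbors(graph, "B. Kim", "Bo Kim")   # set()

# The two "Lee" references share a co-author, so a user merges them.
resolved = merge_nodes(graph, "A. Lee", "Alice Lee")

# After the merge, the "Kim" references share the merged "A. Lee" node,
# which makes them a new candidate duplicate pair.
after = common_neighbors(resolved, "B. Kim", "Bo Kim")  # {'A. Lee'}
```

Under these assumptions, the first merge is what makes the second pair visible, which is the sense in which resolution decisions are chained together.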
BibTeX references
@InProceedings{bilgic:vast06,
author = "Bilgic, Mustafa and Licamele, Louis and Getoor, Lise and Shneiderman, Ben",
title = "D-Dupe: An Interactive Tool for Entity Resolution in Social Networks",
booktitle = "Visual Analytics Science and Technology (VAST)",
month = "October",
year = "2006",
address = "Baltimore",
}
![ddupe-vast.pdf [673Ko]](/basilic/web/Publications/images/pdf.png)
![ddupe-vast.ps [2.6Mo]](/basilic/web/Publications/images/ps.png)

