LINQS

STATISTICAL RELATIONAL LEARNING GROUP @ UMD



 

D-Dupe: An Interactive Tool for Entity Resolution in Social Networks

Visual Analytics Science and Technology (VAST) - October 2006
Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making the common relationships easy to visually identify. D-Dupe users resolve ambiguities either by merging nodes or by marking them distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high quality entity resolution result. We illustrate and evaluate the benefits of D-Dupe on three bibliographic collections. Two of the datasets had already been cleaned, and therefore should not have contained duplicates; despite this fact, many duplicates were rapidly identified using D-Dupe's unique combination of entity resolution algorithms within a task-specific visual interface.
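The loop described in the abstract (rank potential duplicate pairs, inspect their shared relationships, then merge or mark distinct) can be illustrated with a small sketch. This is not D-Dupe's algorithm: the scoring function, the `alpha` weight, the function names, and the toy co-authorship graph with made-up author names are all assumptions chosen for illustration. It combines a string similarity on names with the overlap of the two nodes' neighbors, then merges the top-ranked pair.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """String similarity in [0, 1] between two names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def shared_neighbor_score(graph, a, b):
    """Jaccard overlap of the two nodes' neighbors, ignoring each other."""
    na = graph[a] - {b}
    nb = graph[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def rank_candidates(graph, alpha=0.5):
    """Score every node pair; return (score, a, b) tuples, best first.

    alpha is a hypothetical weight between name and relational evidence.
    """
    scored = []
    for a, b in combinations(sorted(graph), 2):
        score = (alpha * name_similarity(a, b)
                 + (1 - alpha) * shared_neighbor_score(graph, a, b))
        scored.append((score, a, b))
    return sorted(scored, reverse=True)


def merge(graph, keep, drop):
    """Merge node `drop` into `keep`, rewiring its edges to `keep`."""
    for n in graph.pop(drop):
        if n == drop:  # skip a self-loop on the removed node
            continue
        graph[n].discard(drop)
        if n != keep:
            graph[n].add(keep)
            graph[keep].add(n)
    graph[keep].discard(drop)
    return graph


# Toy undirected co-authorship graph (hypothetical names): node -> neighbors.
graph = {
    "John Smith": {"A. Lee", "B. Chen"},
    "J. Smith": {"A. Lee", "C. Diaz"},
    "A. Lee": {"John Smith", "J. Smith"},
    "B. Chen": {"John Smith"},
    "C. Diaz": {"J. Smith"},
}

top_score, first, second = rank_candidates(graph)[0]
print("Top candidate pair:", first, "/", second)

# A user who judges the pair to be duplicates would merge them; ranking can
# then be rerun, since a merge may change the evidence for other pairs.
merge(graph, keep="John Smith", drop="J. Smith")
```

Rerunning `rank_candidates` after each merge mirrors the iterative, chained nature of the resolution decisions described above. In an interactive tool the merge step would follow a human judgment rather than happen automatically.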

BibTeX reference

@InProceedings{bilgic:vast06,
  author       = "Bilgic, Mustafa and Licamele, Louis and Getoor, Lise and Shneiderman, Ben",
  title        = "D-Dupe: An Interactive Tool for Entity Resolution in Social Networks",
  booktitle    = "Visual Analytics Science and Technology (VAST)",
  month        = "October",
  year         = "2006",
  address      = "Baltimore",
}
