
LINQS
STATISTICAL RELATIONAL LEARNING GROUP @ UMD
A Latent Dirichlet Model for Unsupervised Entity Resolution
SIAM Conference on Data Mining (SDM) - April 2006
Note: Winner of the Best Paper Award.
Entity resolution has received considerable attention in recent
years. Given many references to underlying entities, the goal is to
predict which references correspond to the same entity. We show how to
extend the Latent Dirichlet Allocation model for this task and propose
a probabilistic model for collective entity resolution for relational
domains where references are connected to each other. Our approach
differs from other recently proposed entity resolution approaches in
that it a) is generative, b) does not make pair-wise decisions, and c)
captures relations between entities through a hidden group
variable. We propose a novel sampling algorithm for collective entity
resolution which is unsupervised and also takes entity relations into
account. Additionally, we do not assume the domain of entities to be
known and show how to infer the number of entities from the data. We
demonstrate the utility and practicality of our relational entity
resolution approach for author resolution in two real-world
bibliographic datasets. In addition, we present preliminary results on
characterizing conditions under which relational information is
useful.
BibTeX reference
@InProceedings{bhattacharya:sdm06,
author = "Bhattacharya, Indrajit and Getoor, Lise",
title = "A Latent Dirichlet Model for Unsupervised Entity Resolution",
booktitle = "SIAM Conference on Data Mining (SDM)",
month = "April",
year = "2006",
note = "Winner of the Best Paper Award.",
}

