
LINQS
STATISTICAL RELATIONAL LEARNING GROUP @ UMD
Collective Entity Resolution In Relational Data
ACM Transactions on Knowledge Discovery from Data, Volume 1, Number 1, pages 1-36, March 2007
Many databases contain uncertain and imprecise references to
real-world entities. The absence of identifiers for the underlying
entities often results in a database which contains multiple
references to the same entity. This can lead not only to data
redundancy, but also inaccuracies in query processing and knowledge
extraction. These problems can be alleviated through the use of
entity resolution. Entity resolution involves discovering the
underlying entities and mapping each database reference to these
entities. Traditionally, entities are resolved using pairwise
similarity over the attributes of references. However, there is often
additional relational information in the data. Specifically,
references to different entities may co-occur. In these cases,
collective entity resolution, in which entities for co-occurring
references are determined jointly, rather than independently, can
improve entity resolution accuracy. We propose a novel relational
clustering algorithm that uses both attribute and relational
information for determining the underlying domain entities, and we
give an efficient implementation. We investigate the impact that
different relational similarity measures have on entity resolution
quality. We evaluate our collective entity resolution algorithm on
multiple real-world databases. We show that it improves entity
resolution performance over both attribute-based baselines and over
algorithms that consider relational information but do not resolve
entities collectively. In addition, we perform detailed experiments on
synthetically generated data to identify data characteristics that
favor collective relational resolution over purely attribute-based
algorithms.
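
To make the idea of combining attribute and relational evidence concrete, here is a minimal Python sketch of a greedy agglomerative clustering over references. It is an illustration only, not the algorithm or implementation from the paper. The names `attr_sim` and `resolve`, the weight `alpha`, the merge `threshold`, the toy name-matching rule, the use of Jaccard overlap of cluster neighborhoods as the relational measure, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def attr_sim(a, b):
    """Toy attribute similarity (an assumption): 1.0 for identical names,
    0.5 when last names match and one first name abbreviates the other."""
    if a == b:
        return 1.0
    fa, la = a.split()[0].rstrip("."), a.split()[-1]
    fb, lb = b.split()[0].rstrip("."), b.split()[-1]
    if la == lb and (fa.startswith(fb) or fb.startswith(fa)):
        return 0.5
    return 0.0


def resolve(names, hyperedges, alpha=0.5, threshold=0.4):
    """names: ref id -> name; hyperedges: sets of ref ids that co-occur."""
    cluster_of = {r: r for r in names}   # ref id -> cluster label
    members = {r: {r} for r in names}    # cluster label -> ref ids

    def neighborhood(c):
        # Labels of the clusters that co-occur with any reference in c.
        nbrs = set()
        for edge in hyperedges:
            if edge & members[c]:
                nbrs |= {cluster_of[r] for r in edge}
        nbrs.discard(c)
        return nbrs

    def sim(c1, c2):
        # Attribute part: weakest pairwise match between the two clusters.
        a = min(attr_sim(names[x], names[y])
                for x in members[c1] for y in members[c2])
        # Relational part: Jaccard overlap of the cluster neighborhoods.
        n1, n2 = neighborhood(c1), neighborhood(c2)
        n1.discard(c2)
        n2.discard(c1)
        union = n1 | n2
        r = len(n1 & n2) / len(union) if union else 0.0
        return (1 - alpha) * a + alpha * r

    while True:
        # Greedily merge the most similar pair until none clears the threshold.
        best = max(combinations(sorted(members), 2),
                   key=lambda p: sim(*p), default=None)
        if best is None or sim(*best) < threshold:
            break
        keep, drop = best
        for r in members[drop]:
            cluster_of[r] = keep
        members[keep] |= members.pop(drop)
    return sorted(sorted(m) for m in members.values())


# Made-up toy data: three papers, each with two author references.
names = {1: "John Smith", 2: "Ann Lee", 3: "Jane Smith",
         4: "Bob Ray", 5: "J. Smith", 6: "Ann Lee"}
hyperedges = [{1, 2}, {3, 4}, {5, 6}]
result = resolve(names, hyperedges)
print(result)
```

In this toy data, "J. Smith" matches "John Smith" and "Jane Smith" equally well on attributes alone. Once the two "Ann Lee" references are merged, "J. Smith" and "John Smith" share a co-author cluster, so the relational term separates the two candidates. This is the sense in which decisions for co-occurring references inform each other.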
BibTeX reference
@Article{bhattacharya:tkdd07,
author = "Bhattacharya, Indrajit and Getoor, Lise",
title = "Collective Entity Resolution In Relational Data",
journal = "ACM Transactions on Knowledge Discovery from Data",
number = "1",
volume = "1",
pages = "1--36",
month = "March",
year = "2007",
}

