Record linkage

Record linkage refers to the task of finding entries that refer to the same entity in two or more files. The initial idea goes back to Halbert L. Dunn ("Record Linkage", American Journal of Public Health, Vol. 36 (1946), pp. 1412-1416). In the 1950s, Howard Borden Newcombe laid the probabilistic foundations of modern record linkage theory.

In 1969, Fellegi and Sunter formalized these ideas. Their pioneering paper "A Theory for Record Linkage" remains the mathematical foundation for many record linkage applications.

Mathematical Model

In an application with two files, A and B, denote the rows (records) by $\alpha(a)$ in file A and $\beta(b)$ in file B. Assign K characteristics to each record. The set of record pairs that represent identical entities is defined by

$$M = \left\{ (a,b);\ a=b;\ a \in A;\ b \in B \right\}$$

and the complement of M, the set U of record pairs that represent different entities, is defined as

$$U = \left\{ (a,b);\ a \neq b;\ a \in A,\ b \in B \right\}.$$
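As an illustration of the two sets, the following minimal Python sketch splits all record pairs from two toy files into M and U. The record identifiers, the entity labels, and the helper `split_pairs` are invented for this example. In a real application the true entity behind each record is unknown, which is exactly why a probabilistic model is needed.

```python
from itertools import product

# Hypothetical toy files: record id -> true entity (assumed known here
# only for illustration; in practice this label is not available).
file_a = {"a1": "alice", "a2": "bob"}
file_b = {"b1": "bob", "b2": "carol", "b3": "alice"}


def split_pairs(file_a, file_b):
    """Partition the cross product A x B into matched and unmatched pairs."""
    matched, unmatched = set(), set()
    for (a, entity_a), (b, entity_b) in product(file_a.items(), file_b.items()):
        if entity_a == entity_b:
            matched.add((a, b))      # pair belongs to M
        else:
            unmatched.add((a, b))    # pair belongs to U
    return matched, unmatched


M, U = split_pairs(file_a, file_b)
print(sorted(M))
print(len(U))
```

Every pair in the cross product of the two files lands in exactly one of the two sets, so M and U are disjoint and together cover all pairs.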

A vector $\gamma$ is defined that contains the coded agreements and disagreements on each characteristic:

$$\gamma\left[\alpha(a), \beta(b)\right] = \left\{ \gamma^{1}\left[\alpha(a), \beta(b)\right], \ldots, \gamma^{K}\left[\alpha(a), \beta(b)\right] \right\}$$
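One simple way to code such a vector is to record 1 for agreement and 0 for disagreement on each characteristic. The sketch below assumes exact-match comparison and uses invented field names and records. Real applications often use finer codings (for example, partial agreement on names), which this example does not cover.

```python
# Hypothetical characteristics compared for every record pair (K = 3).
FIELDS = ("sex", "age", "marital_status")


def comparison_vector(rec_a, rec_b, fields=FIELDS):
    """Return a tuple with 1 where the records agree on a field, else 0."""
    return tuple(int(rec_a[f] == rec_b[f]) for f in fields)


# Invented example records from file A and file B.
rec_a = {"sex": "f", "age": 34, "marital_status": "married"}
rec_b = {"sex": "f", "age": 35, "marital_status": "married"}

gamma = comparison_vector(rec_a, rec_b)
print(gamma)
```

Here the two records agree on sex and marital status but disagree on age, so the second component of the vector is 0.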

where the superscripts 1, ..., K index the characteristics (sex, age, marital status, etc.) in the files. The conditional probabilities of observing a specific vector $\gamma$ given $(a, b) \in M$ and given $(a, b) \in U$ are defined as

$$m(\gamma) = P\left\{ \gamma\left[\alpha(a), \beta(b)\right] \mid (a,b) \in M \right\} = \sum_{(a,b) \in M} P\left\{ \gamma\left[\alpha(a), \beta(b)\right] \right\} \cdot P\left[ (a,b) \mid M \right]$$

and

$$u(\gamma) = P\left\{ \gamma\left[\alpha(a), \beta(b)\right] \mid (a,b) \in U \right\} = \sum_{(a,b) \in U} P\left\{ \gamma\left[\alpha(a), \beta(b)\right] \right\} \cdot P\left[ (a,b) \mid U \right],$$

respectively.
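To make the two conditional probabilities concrete, the sketch below estimates them as relative frequencies of comparison vectors. It rests on two simplifying assumptions that are not part of the text above: the match status of every pair is already known, and every pair within M (or within U) is equally likely. The vectors listed are invented, and `conditional_distribution` is a hypothetical helper name.

```python
from collections import Counter


def conditional_distribution(vectors):
    """Relative frequency of each comparison vector in a list of pairs."""
    counts = Counter(vectors)
    total = len(vectors)
    return {gamma: count / total for gamma, count in counts.items()}


# Invented comparison vectors for pairs known to be in M and in U.
gammas_m = [(1, 1, 1), (1, 0, 1), (1, 1, 1), (1, 1, 0)]
gammas_u = [(0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 0, 0), (1, 0, 0)]

m = conditional_distribution(gammas_m)  # estimate of m(gamma)
u = conditional_distribution(gammas_u)  # estimate of u(gamma)

print(m[(1, 1, 1)])
print(u[(0, 0, 0)])
```

In this toy data, full agreement is common among matched pairs and never occurs among unmatched pairs, which is the kind of contrast between the two distributions that the model relies on.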