Interface | Description |
---|---|
SimilarityCalculator |
Class | Description |
---|---|
AsymmetricJaccardSimilarity | |
AvgInformationContentBestMatchesSimilarity | Deprecated |
AvgInformationContentLCSSimilarity | Deprecated |
CombinedJaccardConjunctiveSetSimilarity |
filters first based on SimJ, if passes threshold performs full ConjunctiveSetSimilarity
|
CombinedSimilarity | |
CombinedSimilarity2 | |
ConjunctiveSetInformationContentRatioSimilarity |
score is the IC of the intersection of all attributes divided my min IC of a or b
it's recommended this used as a sub-method of a MultiSimilarity check
|
ConjunctiveSetSimilarity |
ConjunctiveSetSimilarity(a,b) = IC( LCA_1(a,b) ^ LCA_2(a,b) ...
|
DescriptionTreeSimilarity |
A similarity between two entities constructed by threading two description trees together.
|
DisjunctiveSetSimilarity | |
EnrichmentEngine | |
JaccardBloomSimilarity | |
JaccardSimilarity | |
MaximumInformationContentSimilarity |
MaximumInformationContentSimilarity(a,b) = max { IC(LCA_1(a,b)) ...
|
MultiSimilarity |
This is the standard method to use when comparing entities with multiple attributes.
|
OldSimpleOwlSim |
Inputs
|
OldSimpleOwlSim.EnrichmentConfig | |
OverlapSimilarity | |
OWLClassExpressionPair | |
OWLObjectPair | |
OWLResultsManager | |
Reporter | |
SimEngine |
SimEngine provides access to multiple similarity calculation
procedures (see
Similarity ). |
Similarity |
Represents a pairwise similarity between two OWLObjects.
|
SimSearch |
quick search for best matches
|
Enum | Description |
---|---|
OldSimpleOwlSim.Stage |
Exception | Description |
---|
owltools file:test_resources/pizza.owl --sim-method DescriptionTreeSimilarity --sim http://owl.cs.manchester.ac.uk/2009/07/sssw/pizza#MeatFishAndVegetarianPizza-Closed http://owl.cs.manchester.ac.uk/2009/07/sssw/pizza#VegetarianToppingsPizza-Closed
One drawback with most existing bioinformatics ontology-based similarity analyses is the lack of OWL semantics. For example, most existing semantic similarity analyses treat the ontology as a simple graph. The subset of analyses that incorporate edge labels do so in ad-hoc ways, rather than taking advantage of the full semantics of the relationship.
These analyses are also dependent on all potential useful grouping classes being specified in advance, in a pre-coordinated ontology. This results in a lack of precision, as it is all but impossible to anticipate all potential grouping classes in advance.
Here we define a small number of methods for scoring similarity of two entities based on shared properties. The methods are completely neutral w.r.t the nature of these entities - they could be two individuals, two classes, or an individual and a class. They can be any type of individual or class, and can be compared with respect to any attribute. For example, organisms can be compared by phenotype or pizzas can be compared by toppings. The only requirement is that all the information is specified in OWL. The more heavily axiomatized the OWL is, the more informative the results will be.
The primary innovation is the use of a novel algorithm for computing the least common subsuming description. Unlike naive LCA computations, this also includes anonymous classes in the result. These are constructed on the fly. This has utility in a number of situations; for example, when comparing two complex descriptions, one using a human anatomy ontology and another using a zebrafish anatomy ontology, if we include a bridging ontology such as uberon then novel descriptions are generated that subsume both.
A subsumed by B if SubClassOf(A
B)
. This can be either asserted or inferred. Subsumption is
transitive, reflexive and anti-symmetric. The inverses relation is
subsumes.
A is a descendant of B if there is an OWL Graph
Edge in which A is the source and B is the target. See owltools.graph
for a description of OWL Graphs. The inverse relation
is is ancestor of. Descent is a super-property of subsumption
(i.e. if A subsumed by B, the A is a descendant of B).
A is a common subsumer of B and C if A subsumes B and A subsumes C
As we count owl:Thing in the universe of objects, if B and C are satisfiable then it follows that they must have at least one CS, owl:Thing
A is a Least Common Subsumer (LCS) of B and C if:
A is a common ancestor of B and C if A is a ancestor of B and A is a ancestor of C
A is a Least Common Ancestor (LCP) of B and C if:
note that the lack of anti-symmetric characteristic on the ancestor relation necessitates the final clause to ensure that objects related via cycles are included in the LCS set. For example:
SubClassOf(wheel SomeValuesFrom(partOf car))
SubClassOf(car SomeValuesFrom(hasPart wheel))
SubClassOf(audi car)
SubClassOf(bmw car)
P(audi) = { audi, car, wheel}
P(bmw) = { bmw, car, wheel}
LCP(audi, bmw) = { car, wheel }
A is a named subsumer of B if A is a subsumer of B, and A is in the set of named objects.
Example:
SubClassOf(car SomeValuesFrom(hasPart wheel))
SubClassOf(audi car)
SNamed(audi) = { audi, car, owl:Thing}
A is an anonymous subsumer of B if A is a subsumer of B, and A is in the set of class expressions.
With many ontologies, the number of anonymous subsumers will be larger than the set of named subsumers, because there are many combinations of expressions that can be made that subsume the source object.
Example:
SubClassOf(car SomeValuesFrom(hasPart wheel))
SubClassOf(wheel SomeValuesFrom(hasQuality round))
SubClassOf(audi car)
SAnon(audi) = {
SomeValuesFrom(hasPart,wheel)
SomeValuesFrom(hasPart,owl:Thing)
SomeValuesFrom(hasPart,SomeValuesFrom(hasQuality,round))
SomeValuesFrom(hasPart,SomeValuesFrom(hasQuality,Thing))
}
X is a referenced entity if it is in the signature of the ontology, or set of ontologies. All named objected are referenced
SubClassOf(wheel SomeValuesFrom(partOf car))
SubClassOf(car vehicle)
ObjectProperty(partOf)
R = { partOf car vehicle wheel SomeValuesFrom(partOf car) }
Note that in the above example, SomeValuesFrom(partOf
vehicle)
is not in the set of referenced entities, even
though it is a satisfiable class expression.
A is a Common Referenced Ancestor (CRP) of B and C if:
SubClassOf(car SomeValuesFrom(hasPart wheel))
SubClassOf(wheel SomeValuesFrom(hasQuality round))
SubClassOf(audi car)
SubClassOf(bmw car)
CRP(audi bmw) = { car wheel round SomeValuesFrom(hasPart wheel) SomeValuesFrom(hasQuality round) }
A is a Least Common Named Subsumer (LCNS) of B and C if:
if A ∈ LCNS(B,C)
then A ∈ LCS(B,C)
Note that calculation of the LCNS is relatively efficient, as the set is derived from the universe of named objects
A is a Least Common Anonymous Subsumer (LCAS) of B and C if:
if A ∈ LCAS(B,C)
then A ∈ LCS(B,C)
-- ie LCS includes anonymous classes
Calculation of LCAS is a non-trivial problem, because the universe of possible anonymous class expressions is very large and under certain models, infinite.
For any two classes B and C, under OWL-DL there is always at least one trivial LCAS: UnionOf(B C)
A is a Least Common Meaningful Subsumer (LCMS) of B and C if:
EquivalentClasses(car IntersectionOf(vehicle motorized ObjectExactCardinality(hasPart car_wheel 4)))
EquivalentClasses(bicycle IntersectionOf(vehicle ComplementOf(motorized) ObjectExactCardinality(hasPart bicycle_wheel 2)))
SubClassOf(car_wheel wheel)
SubClassOf(bike_wheel wheel)
# subsumers:
LCMAS(car,bicycle) = { IntersectionOf(vehicle ObjectMinCardinality(hasPart wheel 2) ObjectMaxCardinality(hasPart wheel 4) }
LCAS(car,bicycle) = LCMAS ∪ { UnionOf(car bicycle) }
LCMS(car,bicycle) = LCMAS ∪
LCS(car,bicycle) = LCAS
LCNS(car,bicycle) = { vehicle }
LCNP(car,bicycle) = { vehicle, wheel }
LCRP(car,bicycle) = { vehicle, wheel }
The LCMAS is the most informative of the measures, as it contains the most specific non-trivial description of what is shared
between cars and bicycles based on the knowledge available (they are vehicles with between 2 and 4 wheels)
(note: def of M needs modified to eliminate the description "vehicle with 2 wheels or with 4 wheels" which has a smaller number of members, but does not generalize as well)
If we add an axiom:
SubClassOf(vehicle SomeValuesFrom(hasPart wheel))
Then this adds a reference to a new class expression changes the results as follows:
LCNS(car,bicycle) = { vehicle }
LCRS(car,bicycle) = { vehicle SomeValuesFrom(hasPart wheel) }
LCNP(car,bicycle) = { vehicle, wheel }
LCRP(car,bicycle) = { vehicle, SomeValuesFrom(hasPart wheel) }
CRP(a,b) :
P = {}
for x in ancestor_of(a)
add x to P
for x in ancestor_of(b)
add x to P
RETURN P
LCRP(a,b) :
P = CRP(a,b)
for x in P:
IF (exists y in P AND
x ancestor_of y AND
NOT(y ancestor_of x))
THEN remove x from P
RETURN P
This algorithm finds an approximation of the meaningful least common subsumers (including both anonymous and named). Recall that using standard LCA algorithms will only give us the least common referenced subsumer, which misses informative answers from the much wider set of possible anonymous expressions.
The basic idea can be informally sketched as follows. The descriptions of a and b can be conceived of as description trees, with each branchpoint representing set of sub-expressions join by the IntersectionOf construct. For example, the description motorized vehicle with 4 red wheels is a tree with 4 leaves and 3 branch points:
We attempt to find a maximally informative tree that subsumes two given trees by aligning the branchpoints. We start at the roots of both trees, and traverse downwards, finding the best match for each node in the opposing tree. We calculate the LCRP at each point; if there are multiple non-redundant ancestors then we create an IntersectionOf expression. Where there is one, we use the LCRP.
LCS(a,b) :
X = {}
EA = all edges emanating from a
for ea in EA:
# find best matching edge for ea
eb = null
score = 0
for bx in LCRP(b):
for eb' in findEdge(b,bx)
score' = scoreMatch(ea,eb')
if score' > score:
score = score'
eb = eb'
x = mkUnionExpression(ea, eb, LCS(ea.tgt. eb.tgt))
add x to X
# combine
if |X| = 0 : return owl:Thing
= 1 : return only element of X
> 1 : return IntersectionOf(X)
Given the following ontology:
Ontology(test)
axon_terminals_degenerated_in_CA4 EquivalentTo phenotypeOf some hasPart some (CA4 and hasPart some (axon_terminal and hasQuality some degenerated))
dendrites_degenerated_in_CA3 EquivalentTo phenotypeOf some hasPart some (CA3 and hasPart some (dendrite and hasQuality some degenerated))
axon_terminal SubClassOf cell_process and partOf some neuron
dendrite SubClassOf cell_process and partOf some neuron
CA4 SubClassOf partOf some hippocampus
CA3 SubClassOf partOf some hippocampus
org1 instanceOf human and hasPhenotype some axon_terminals_degenerated_in_CA4
org2 instanceOf human and hasPhenotype some dendrites_degenerated_in_CA3
The answer to LCS(org1,org2)
is:
human and hasPhenotype some (partOf some hippocampus) and hasPart some ((cell_process and partOf some neuron) and hasQuality some degenerated)
Jaccard Similarity is the number of common attributes divided by the union of attributes. We define SimJ-RP using the set of referenced ancestors:
SimJ-RP(x,y) :
return | CRP(x,y) | / | RP(x) ∪ RP(y) |
One limitation of this technique is that it does not take into account
anonymous expressions.
We do not want to perform SimJ over all anonymous expressions, as there are potentially massive numbers of such expressions. Instead we can first compute the LCS (which includes anonymous expressions) using the above algorith, and then calculate the SimJ using only referenced ancestors.
SimJ-LCS-RP(x,y) :
z = LCS(x,y)
return | RP(z) | / | RP(x) ∪ RP(y) |
We illustrate these using the following ontology:
EquivalentClasses(car IntersectionOf(vehicle motorized ObjectExactCardinality(hasPart car_wheel 4)))
EquivalentClasses(bicycle IntersectionOf(vehicle ComplementOf(motorized) ObjectExactCardinality(hasPart bicycle_wheel 2)))
SubClassOf(car_wheel wheel)
SubClassOf(bike_wheel wheel)
# subsumers:
SimJ-RP(car,bicycle) =
| {vehicle, wheel |
-------------------
| { car, bicycle, vehicle, bicycle_wheel, car_wheel, wheel, motorized, complementOf(motorized),
IntersectionOf(vehicle motorized ObjectExactCardinality(hasPart car_wheel 4)),
ObjectExactCardinality(hasPart car_wheel 4),
IntersectionOf(vehicle ComplementOf(motorized) ObjectExactCardinality(hasPart bicycle_wheel 2))
ObjectExactCardinality(hasPart bicycle_wheel 2)
|
= 1/6
SimJ-LCS-RP(car,bicycle) =
| {vehicle, wheel, IntersectionOf(vehicle ObjectCardinalityBetween(hasPart wheel 2 4)) ObjectCardinalityBetween(hasPart wheel 2 4) |
-------------------
| ... |
= 1/3
The methods above work for any OWL individuals or classes - they are
completely generic. We can achieve higher precision if we compare with
respect to a particular attribute. The choice of attribute is
application-dependent. For example, with phenotype comparison, we may
wish to compare two entities with respect to the hasPhenotype object
property. We could specify this using a DL-like query atts.x = {
p : p ∈ SomeValuesFrom(phenotypeOf x) }
The Average Best Match is the average of all best matches between each attribute in x and the set of attributes in y, averaged for both x to y and y to x.
ABM(x,y) :
score = 0;
n = 0;
for a in x.atts:
score += bestMatch(a, y.atts)
n++
for a in y.atts:
score += bestMatch(a, x.atts)
n++
RETURN score / n
bestMatch(a1, atts) :
score = 0;
for a2 in atts:
score' = getScore(a1,a2)
score = max(score, score')
return score
ABM can be used with a number of different scoring schemes to score
attribute pairs. For example, ABM with Jaccard similarity of the LCS
would be called ABM-J-LCS.
SimJ-LCS(x,y) :
z = LCS(x,y)
return | LCRP(z) | / | RP(x) ∪ RP(y) |
owltools file:imports.owl --sim-method MultiSimilarity --sim -p http://purl.obolibrary.org/obo/RO_0002200 http://purl.obolibrary.org/obo/MGI_101759 http://purl.obolibrary.org/obo/MGI_101761
owltools file:imports.owl --sim-method MultiSimilarity --sim -p 'has phenotype' http://purl.obolibrary.org/obo/MGI_101759 http://purl.obolibrary.org/obo/MGI_101761
Copyright © 2010–2015. All rights reserved.