Profile-based Routing
From Maay
The idea behind the profile-based routing is to select, among the numerous acquaintances, a small set of nodes having both similarities with the requester and interest for the queried words.
The following is also a collection of intuitive ideas. Feel free to give your opinion on the mailing list (https://lists.berlios.de/mailman/listinfo/maay-dev) !
| Table of contents |
Semantic Profile Approximation
A node a has four ways to notice that a neighbor b exhibited kind of interest for a word w:
- b sends a search request on w,
- b sends back a search response on w,
- b is a document provider for a search on w
- a provides to b a document on w.
We propose to count the number of times a node expressed an interest for a word. We use it to determine the semantic profile of this node. Yet, note that nodes do not treat the same messages, so two nodes might have a totally different opinion about the interest for a word expressed by a common neighbor.
Let W(a) be the set of words known by the node a.
This set contains the words occurring in the documents known by a and the words for which a has a knowledge through previous incoming messages.
The node a approximates the semantic profile of a node b by counting, for each word w in W(a), the number of times c expressed an interest for w.
We present the semantic profile of c from a's point of view as a vector
.
This vector has n dimensions where n is the number of words in W(a). A node a measures the profile similarities between two nodes e and c by computing the affinity based on the cosine similarity between their vectors:

Node-Word Interest Approximation
The profile-based routing should also take into account node interests for the queried words. We distinguish two notions:
- from a’s point of view, the specialization of b for a word w quantifies its part of interest for w compared to its interest for the other words in W(a)
![SP_a(b,w) = \frac{\overrightarrow{b_a[w]}}{\sum_{x \in \mathcal W(a)}{\overrightarrow{b_a[x]}}}](/wiki/images/math/e88b0219133ef0b9bd8230fba1203bbf.png)
- the expertise of b for w measures the part of claims expressed by b for w compared to the other neighbors of a
![XP_a(b,w) = \frac{\overrightarrow{b_a[w]}}{\sum_{n \in \mathcal N(a)}{\overrightarrow{n_a[w]}}}](/wiki/images/math/7fa5160714db7f7507360f4a81c55d37.png)
Lexical Approximation
When a node joins the system or has a brutal interest for a new topic, it might not know document matching its queries nor neighbors interested by the queried words. A solution consists of routing queries toward neighbors interested with words that are lexically close to the queried words. The lexical proximity of w0 and w1 is usually measured by the Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein) d(w0,w1) i.e. by counting the number of insertions, deletions or replacements of characters to change w0 into w1. Thus, the distance between a singular word and its plural form or words with the same etymology root is small. Therefore, nodes tend to specialize on words that are lexically close to their most interesting words. It results that these nodes acts as relationship enablers between nodes that did not previously know each other.
Routing Score
It is now possible to define the routing score of a node b for a request from z containing the word w. This routing score depends on the node a computing it :

The task of computing the NRS and the affinity may be very costly. However, this can be reduced using three optimizations:
- we can only consider words having high value of specialization for the neighbor or the node
- by periodically evicting neighbors which have less affinity with the node and/or using a Least Recently used policy
- by only considering words close to w for the computation of NRS.
