The idea behind the profile-based routing is to select, among the numerous acquaintances, a small set of nodes having both similarities with the requester and interest for the queried words.
The following is also a collection of intuitive ideas. Feel free to give your opinion on the mailing list (https://lists.berlios.de/mailman/listinfo/maay-dev) !
| Table of contents |
A node a has four ways to notice that a neighbor b exhibited kind of interest for a word w:
We propose to count the number of times a node expressed an interest for a word. We use it to determine the semantic profile of this node. Yet, note that nodes do not treat the same messages, so two nodes might have a totally different opinion about the interest for a word expressed by a common neighbor.
Let W(a) be the set of words known by the node a.
This set contains the words occurring in the documents known by a and the words for which a has a knowledge through previous incoming messages.
The node a approximates the semantic profile of a node b by counting, for each word w in W(a), the number of times c expressed an interest for w.
We present the semantic profile of c from a's point of view as a vector
.
This vector has n dimensions where n is the number of words in W(a). A node a measures the profile similarities between two nodes e and c by computing the affinity based on the cosine similarity between their vectors:

The profile-based routing should also take into account node interests for the queried words. We distinguish two notions:
![SP_a(b,w) = \frac{\overrightarrow{b_a[w]}}{\sum_{x \in \mathcal W(a)}{\overrightarrow{b_a[x]}}}](Routing_Requests_fichiers/e88b0219133ef0b9bd8230fba1203bbf.png)
![XP_a(b,w) = \frac{\overrightarrow{b_a[w]}}{\sum_{n \in \mathcal N(a)}{\overrightarrow{n_a[w]}}}](Routing_Requests_fichiers/7fa5160714db7f7507360f4a81c55d37.png)
When a node joins the system or has a brutal interest for a new topic, it might not know document matching its queries nor neighbors interested by the queried words. A solution consists of routing queries toward neighbors interested with words that are lexically close to the queried words. The lexical proximity of w0 and w1 is usually measured by the Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein) d(w0,w1) i.e. by counting the number of insertions, deletions or replacements of characters to change w0 into w1. Thus, the distance between a singular word and its plural form or words with the same etymology root is small. Therefore, nodes tend to specialize on words that are lexically close to their most interesting words. It results that these nodes acts as relationship enablers between nodes that did not previously know each other.
It is now possible to define the routing score of a node b for a request from z containing the word w. This routing score depends on the node a computing it :

The task of computing the NRS and the affinity may be very costly. However, this can be reduced using three optimizations: