Personalized Relevance

From Maay

How can we determine that a document d is more relevant than a document d' for a user u issuing a request on a word w at time t?

The following is purely intuitive and open to discussion on the mailing list (https://lists.berlios.de/mailman/listinfo/maay-dev).

Approach

The approach is based on a tripartite graph user-document-word: users own documents and documents contain words (or documents are "tagged" by words).

image:Bipartite1.jpg

In this figure, the user U1 owns three documents D1, D2 and D4. The documents D1 and D3 contain the word W2. The document D2 is owned by two users U1 and U2.

From this tripartite graph, we can build two bipartite graphs:

  • the semantic document graph GW linking documents and words.
  • the social document graph GU linking documents and users.
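As a minimal sketch of this split, the two bipartite graphs can be stored as plain adjacency maps. The helper name `build_bipartite` and the variable names are illustrative assumptions, and the edge lists contain only the edges stated in the text about the figure; the figure itself may show more.

```python
from collections import defaultdict

def build_bipartite(pairs):
    """Return an undirected adjacency map built from (left, right) pairs."""
    graph = defaultdict(set)
    for left, right in pairs:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)

# Only the edges named in the text; the real figure may contain others.
ownership = [("U1", "D1"), ("U1", "D2"), ("U1", "D4"), ("U2", "D2")]
containment = [("D1", "W2"), ("D3", "W2")]

social_graph = build_bipartite(ownership)      # documents and users
semantic_graph = build_bipartite(containment)  # documents and words
```

Keeping the two graphs separate means the same path-counting routine can be run on either one without filtering node types.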

The distance between two documents can be measured in both GU and GW. We say that the distance between two documents gets higher if there are numerous paths between these documents in the graph and if these paths are short; a higher value therefore means that the documents are closer. The number of paths of length l between two documents d and d' in a graph G is denoted p(d \rightarrow d',l,G).
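One possible reading of p(d \rightarrow d',l,G) is the number of walks of exactly l edges from d to d', where a walk may revisit nodes. This is an assumption, since the text does not say whether paths must be simple. Under that reading, a naive recursive count looks like the sketch below; the function name `count_walks` and the toy graph are illustrative.

```python
def count_walks(graph, source, target, length):
    """Count walks of exactly `length` edges from source to target."""
    if length == 0:
        return 1 if source == target else 0
    return sum(
        count_walks(graph, neighbour, target, length - 1)
        for neighbour in graph.get(source, ())
    )

# Toy document/user graph using the ownership edges stated for the figure.
graph = {
    "U1": {"D1", "D2", "D4"},
    "U2": {"D2"},
    "D1": {"U1"},
    "D2": {"U1", "U2"},
    "D4": {"U1"},
}

print(count_walks(graph, "D1", "D2", 1))  # 0
print(count_walks(graph, "D1", "D2", 2))  # 1
print(count_walks(graph, "D1", "D2", 4))  # 4
```

Because the graph is bipartite, two documents are only ever joined by walks of even length, so the odd-length counts are always zero.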

Semantic Distance

We naturally obtain a semantic distance SOR computed on GW. The idea is to capture documents that share some words or that, indirectly, use the same vocabulary, without any of the usual "semantic" processing. This is a straightforward brute-force formula that could be substantially improved.

SOR(d,d') =  \sum_{l=1}^{k}{\frac{p(d \rightarrow d',l,GW)}{\sum_{e \in GU}{p(d \rightarrow e,l,GW)}}}
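The formula above can be sketched as follows. Several choices here are assumptions rather than part of the original proposal: paths are counted as walks that may revisit nodes, the denominator sums over every node reachable in l steps, and a term whose denominator is zero is skipped. The names `walk_counts` and `proximity` and the toy graph (including the word W1) are hypothetical.

```python
def walk_counts(graph, source, length):
    """Map each node to the number of walks of `length` edges from source."""
    counts = {source: 1}
    for _ in range(length):
        following = {}
        for node, n in counts.items():
            for neighbour in graph.get(node, ()):
                following[neighbour] = following.get(neighbour, 0) + n
        counts = following
    return counts

def proximity(graph, d, d_prime, k):
    """Sum, for l = 1..k, the share of length-l walks from d that end at d_prime."""
    score = 0.0
    for length in range(1, k + 1):
        counts = walk_counts(graph, d, length)
        total = sum(counts.values())
        if total:  # skip lengths at which nothing is reachable
            score += counts.get(d_prime, 0) / total
    return score

# Toy document/word graph: D1 contains W1 and W2, D2 contains W1, D3 contains W2.
semantic_graph = {
    "D1": {"W1", "W2"},
    "D2": {"W1"},
    "D3": {"W2"},
    "W1": {"D1", "D2"},
    "W2": {"D1", "D3"},
}

print(proximity(semantic_graph, "D1", "D3", 2))  # 0.25
```

The same function applies unchanged to the document/user graph, since only the adjacency map differs.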

Social Distance

The social distance SER is computed like SOR, except that the computation is applied to GU. Briefly, two users who share many documents are, in some sense, close. If two documents are owned by the same user, or by a user and one of her closest "friends", then we may assume that these documents are close too.

SER(d,d') = \sum_{l=1}^{k}{\frac{p(d \rightarrow d', l,GU)}{\sum_{e \in GW}{p(d \rightarrow e,l,GU)}}}

Relevance of a Document for a Request

We assume that a user is interested in a document d if d is close to documents that the user owns and that contain the requested word. This is a very strong assumption that may be challenged. Still, the distance from a document to the related documents owned by the requester is worth computing, and is given by:

PR(u,d,w) = \sum_{d' \in D(u): w \in d'} \left( SOR(d,d')+SER(d,d') \right)
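A minimal sketch of this sum, following the prose above: it iterates over the requester's documents that contain the requested word and adds the semantic and social scores of each against d. The two scores are passed in as functions so that any implementation can be plugged in. All names and the numeric values in the lookup tables are made up for illustration.

```python
def personalized_relevance(d, w, owned_docs, words_in, semantic_score, social_score):
    """Sum both scores between d and each owned document that contains w."""
    return sum(
        semantic_score(d, d_prime) + social_score(d, d_prime)
        for d_prime in owned_docs
        if w in words_in.get(d_prime, ())
    )

# Hypothetical data: the requester owns D1, D2 and D4; D1 and D4 contain W2.
owned_docs = ["D1", "D2", "D4"]
words_in = {"D1": {"W2"}, "D2": {"W1"}, "D4": {"W2"}}

# Hypothetical precomputed scores, standing in for real graph computations.
semantic_table = {("D3", "D1"): 0.25, ("D3", "D4"): 0.5}
social_table = {("D3", "D1"): 0.0, ("D3", "D4"): 0.125}

relevance = personalized_relevance(
    "D3",
    "W2",
    owned_docs,
    words_in,
    lambda a, b: semantic_table.get((a, b), 0.0),
    lambda a, b: social_table.get((a, b), 0.0),
)
print(relevance)  # 0.875
```

D2 is skipped because it does not contain W2, so only D1 and D4 contribute to the total.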

