WikiTrust computes an index of text trust,
which is a measure of how much the text has been revised by reliable users. The goal of this text trust is to alert visitors to recent changes, and possibly vandalism, that have been introduced into a page. This trust is encoded via background text coloring. Text with a white background is trusted: visitors can assume that at least a few high-reputation authors have seen the page since the text was inserted, and have left the text unchanged. Text with an orange background is untrusted (the darker the orange, the lower the trust): visitors can assume that it has been inserted, or modified, recently.
Note that WikiTrust uses the term "trust" in a mathematical sense: trust is a number meant to encode a measure. This mirrors the literature on reputation systems, where the "reputation" of a user is not really meant to capture the real-life reputation of a person, but is simply a mathematical quantity computed from the person's activity and other inputs.
Content-driven user reputation
WikiTrust aims to compute how much the text has been revised by reliable users. Thus, at the basis of WikiTrust is a reputation system for users. The reputation system is content-driven: it measures the reputation of users on the basis of how their contributions fare, rather than on the basis of user-to-user feedback. The main idea is as follows:
- Authors whose contributions are preserved in full or in part gain reputation. In particular, users gain reputation when their contributions are preserved unchanged, but also (to a slightly lesser degree) when they are adapted, reworded, or re-formatted.
- Authors whose contributions are reverted, or almost entirely undone, lose reputation.
We took great care to ensure that authors gain reputation even from contributions that are later revised or reworded. Here are some details on the computation of reputation.
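The content-driven idea above can be illustrated with a minimal sketch. The function, its parameters, and the numeric constants below are hypothetical stand-ins for illustration; the actual WikiTrust algorithm judges contributions via edit distances between revisions, not a single preserved fraction.

```python
# Hypothetical sketch of a content-driven reputation update.
# The constants gain_rate and loss_rate are illustrative, not
# the values used by WikiTrust.

def update_reputation(reputation, preserved_fraction,
                      gain_rate=0.1, loss_rate=0.5):
    """Update an author's reputation after a contribution is judged.

    preserved_fraction: share of the contribution that survived
    subsequent revisions (1.0 = kept in full, 0.0 = fully reverted).
    """
    if preserved_fraction > 0.5:
        # Contribution largely survived: the author gains reputation.
        # Adapted or reworded text (fraction < 1.0) earns slightly less
        # than text preserved unchanged.
        reputation += gain_rate * preserved_fraction
    else:
        # Contribution was reverted or almost entirely undone:
        # the author loses reputation.
        reputation -= loss_rate * (1.0 - preserved_fraction)
    return max(reputation, 0.0)
```

Note how a partially preserved contribution still yields a gain, just a smaller one, matching the design goal that rewording a contribution should not erase its author's credit.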
To validate our notion of user reputation, we have measured how well the reputation of a user predicts how long that user's future contributions will persist. We have found a very strong correlation: high-reputation users are much more likely to make long-lasting contributions than low-reputation users. For the details, see our talks and papers.
From user reputation to text trust
The text "trust" is computed as a measure of how much the text has been revised by high-reputation authors. Thus, the text trust is computed using author reputations, which are rescaled to be in the interval [0, 10]. The goals of the algorithm are as follows:
- Text should gain full trust only when revised by multiple high-reputation authors. This prevents anyone from single-handedly inserting fully-trusted content: full trust requires consensus.
- Every modification to the text (insertions, deletions, text moves) should leave some trace in the form of lower text trust where the modification occurred. This ensures that every attempt to tamper with a page leaves some mark that can be noticed by subsequent visitors.
- Contributions by high-reputation authors should be trusted more than contributions by low-reputation authors.
To achieve these goals, our trust algorithm works as follows (see the talks and papers
for the details):
- Inserted text. When text is first inserted by a user U, it has a trust that is equal to about 30-40% of the reputation of U. Thus, the text created by novice users, or by anonymous users, will initially have trust 0; the text created by top-reputation users will have trust around 3-4 (we sometimes modify these values slightly in the code, so the above values are only indicative).
- Deleted text. When text is deleted, the "margins" of the cut where the text has been deleted are marked with a trust equal to 30-40% of the reputation of the user performing the deletion. That is, the margins of the cut have the same trust as text that is inserted anew. The motivation is that, by cutting portions of text, one can drastically alter its meaning, so the cut points should receive the same low trust as newly inserted text.
- Displaced text. When text is moved, the margins of the cuts delimiting the block moves are marked with the same low trust as new text. Again, the idea is that since the meaning can be drastically altered by moving text around, the cut points need to be marked with low trust.
- Unchanged text. When a user edits a page, the text that is left unchanged by the user gains a little bit of trust, up to the reputation of the user. For instance, a user of reputation 7 can cause text on the page that has trust less than 7 to become a little bit more trusted. The idea is that the user, by leaving the text unchanged, has implicitly provided at least a weak approval of the text. The increment in trust is a function of three factors:
- The higher the reputation of the user, the greater the increment.
- The closer the text to the point where the edit occurred, the greater the increment (as we assume that users pay more attention to text closer to the point where the edit occurs).
- A user cannot raise the trust of text twice in a row: once a user has raised the trust of a piece of text, several different users must edit the same page before the user can again raise its trust.
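The rules above for inserted text and for unchanged text can be sketched as follows. The function names, the 0.35 insertion factor, and the increment formula are illustrative assumptions, not WikiTrust's actual parameters (the real system also tracks which users have already raised a given piece of text, omitted here for brevity).

```python
# Hypothetical sketch of the text-trust rules: new text starts at a
# fraction of the author's reputation, and unchanged text is slowly
# raised toward (but never above) the editing author's reputation.

MAX_TRUST = 10.0  # author reputations are rescaled to [0, 10]

def trust_of_new_text(author_reputation, insert_factor=0.35):
    """Inserted text (and the margins of cuts and moves) starts at
    roughly 30-40% of the editing author's reputation."""
    return insert_factor * author_reputation

def raise_trust(current_trust, author_reputation, distance,
                rate=0.3, locality=0.01):
    """Unchanged text gains a little trust, capped at the author's
    reputation; the increment shrinks with distance from the edit."""
    if current_trust >= author_reputation:
        # Text already at or above the author's reputation is unaffected.
        return current_trust
    increment = rate * (author_reputation - current_trust)
    # Text closer to the edit point gains more trust.
    increment *= 1.0 / (1.0 + locality * distance)
    return min(current_trust + increment, author_reputation)
```

With these illustrative constants, an anonymous user (reputation 0) inserts text of trust 0, a top-reputation user (reputation 10) inserts text of trust 3.5, and full trust can only be reached through repeated revisions by multiple high-reputation authors.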
To validate our notion of text trust, we have conducted experiments that show that high trust is correlated with high future longevity: text that has high trust in a revision lasts longer. While this correlation provides a quantitative validation of our algorithm, the goal of our notion of text trust is not limited to predicting the future longevity of text. The main goal is to alert visitors to recent, unrevised text changes.