We use a reputation system for two reasons:
You give reputation to authors on the basis of how long their contributions last. Won't authors who contribute to controversial topics, where reversions are more common, fail to gain the reputation they deserve?
We took great care when devising the specific algorithms that assign reputation to ensure that authors also gain reputation when their contributions are preserved only in part. Also, users typically contribute to a range of pages; very controversial pages are a minority, and even on those pages, outright revert battles between established users are rare. For these reasons, we believe that even users who contribute to controversial pages gain the reputation they deserve.
If I get reverted by a vandal, will my reputation suffer?
Not much at all. When a user A reverts a user B, the reputation of B suffers in proportion to the reputation of A. Vandals usually have no reputation (if they are anonymous) or a very low one, so the reputation of B suffers only minimally.
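As an illustration, a revert penalty proportional to the reverter's reputation could be sketched as follows. The function name, the `scale` factor, and the exact formula are hypothetical assumptions, not WikiTrust's actual algorithm; the only property taken from the text is that the penalty scales with the reverter's reputation.

```python
def apply_revert_penalty(victim_rep: float, reverter_rep: float,
                         scale: float = 0.1) -> float:
    """Decrease the victim's reputation in proportion to the
    reverter's reputation (illustrative formula, not WikiTrust's)."""
    penalty = scale * reverter_rep
    return max(0.0, victim_rep - penalty)

# An anonymous vandal (reputation 0) reverting an established author:
print(apply_revert_penalty(8.0, 0.0))   # 8.0: no loss at all
# A high-reputation user performing the same revert:
print(apply_revert_penalty(8.0, 9.0))   # 7.1: a real, but bounded, loss
```

Under such a rule, a revert by a zero-reputation vandal costs the victim nothing, matching the answer above.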
If I contribute a paragraph, and someone later improves the wording, what happens to my reputation?
Your reputation would still increase. The text analysis in WikiTrust is able to distinguish between contributions that are undone, and contributions that are reworded, adapted, reformatted, or improved.
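A minimal sketch of how rewording can be told apart from outright deletion, using Python's difflib as a crude stand-in for WikiTrust's text analysis (the function and the sample texts are invented for illustration):

```python
import difflib

def survival_ratio(contribution: str, later_text: str) -> float:
    """Fraction of the original words still present, in order, in the
    later revision -- a rough proxy for how much of a contribution
    survives rewording versus deletion."""
    a = contribution.split()
    b = later_text.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(a) if a else 0.0

original = "the quick brown fox jumps over the lazy dog"
reworded = "the quick brown fox leaps over a lazy dog"
deleted = "completely different text now"
print(survival_ratio(original, reworded))  # high (about 0.78): most words survive
print(survival_ratio(original, deleted))   # 0.0: nothing survives
```

A rewording leaves most of the original words in place, so the contributing author still gets credit; a deletion leaves almost nothing.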
I think an automated system that computes user reputations is evil!
We don't think so, but we understand that this is a controversial topic. For this reason, in the spirit of friendliness and collaboration of wikis, WikiTrust does not explicitly display author reputation. Author reputation is only used internally, in the computation of text trust.
Will there be some human control on reputations?
Author reputation values are stored in a database table, and so they can be inspected and modified, if desired. However, our point of view is that the author reputation we compute is best viewed as an internal, purely mathematical quantity that helps us achieve a better text coloring.
Can users collude and raise their reputation without doing any good work?
Our algorithms make this very difficult; for the details, see our talks and papers.
WikiTrust computes a quantity (which it calls "trust") that measures the extent to which the text has been revised, and left unchanged, by high-reputation authors.
So why do you still call it trust?
We like short, concise terms for our mathematical notions. This is common in science. Author "reputation" is a similar term: it has nothing to do with the reputation of a person in real life; it is a mathematical concept. We are aware that many people are sensitive to the word "trust", and at some future point we may start calling it "text reputation" rather than "text trust", but again, we consider these names as labels for mathematical notions.
How would this be useful to me?
When you look at a page, the text coloring tells you which pieces have changed recently and have not been subsequently revised. By clicking on those and other portions of text, you can figure out who inserted the text, and in which context.
Can a vandal insert false information, then revise it until the text background becomes white?
No. A user can only increase the trust of text up to their current reputation value. So a novice, or an anonymous user, can increase the trust of text only slightly. A reputable user can increase it more, but only revision by multiple, distinct, high-reputation users can lead to fully trusted text.
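The cap can be sketched as follows. The update rule and the `step` factor are invented for illustration; the only property taken from the answer above is that a user can never raise trust above their own reputation, so full trust requires several distinct high-reputation users.

```python
def raise_trust(word_trust: float, user_rep: float,
                step: float = 0.5) -> float:
    """A user raises a word's trust part of the way toward their own
    reputation, and never above it (illustrative rule)."""
    if user_rep <= word_trust:
        return word_trust  # no effect: word is already as trusted as the user
    return word_trust + step * (user_rep - word_trust)

trust = 0.0
# One novice (reputation 1), then three distinct experts (reputation 9):
for rep in (1.0, 9.0, 9.0, 9.0):
    trust = raise_trust(trust, rep)
print(trust)   # 7.9375: approaches 9.0 but never exceeds any raiser's reputation
```

Each revision moves the word's trust only part of the way, so no single user, however reputable, can make text fully trusted on their own.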
Can an author increase the trust of the same portion of text multiple times?
No. Once a user has caused the trust of a word to increase, several distinct users must raise the trust of that word before the same user can raise it again. And since only users with reputation higher than the word's trust can raise it, this is not easy to achieve. Internally, this safeguard is implemented by associating with each word the list of users who have recently increased its trust. This list is maintained using compression and hashing techniques, so that in practice very little storage is required.
Why do contiguous words occasionally have very slightly different colors for no apparent reason?
The hashing algorithm we use to compress the list of authors who increased the trust of a word (see above) has a collision probability of about one in a thousand. The hash is computed from both the author and the word. So, for about one word in a thousand, we falsely believe that the author has already increased the word's trust. This causes occasional small differences in text color. Since text color is only a rough visual hint, we do not believe these variations are problematic.
What? Hash collisions? Randomization? How can trust depend on randomization, of all things?
Life is random. A little bit of randomness never hurt anybody. Look up Heisenberg's principle. Relax :-)
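Joking aside, the scheme described above can be sketched as follows. The bucket count, the use of SHA-256, and all names are illustrative assumptions, not WikiTrust's actual hashing; the point is that a compact fingerprint of (author, word) occasionally collides, which is the harmless source of the color differences.

```python
import hashlib

BUCKETS = 1000   # illustrative size: distinct pairs collide ~1 in 1000 times

def raiser_fingerprint(author: str, word: str) -> int:
    """Compress an (author, word) pair into a small bucket index, a sketch
    of how a per-word list of trust-raisers can be stored compactly."""
    digest = hashlib.sha256(f"{author}\x00{word}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % BUCKETS

raised = set()   # fingerprints of past trust raises for one word

def already_raised(author: str, word: str) -> bool:
    """A false positive here is the rare, harmless color glitch."""
    return raiser_fingerprint(author, word) in raised

raised.add(raiser_fingerprint("alice", "hello"))
print(already_raised("alice", "hello"))  # True: alice already raised this word
print(already_raised("bob", "hello"))    # False unless the fingerprints collide (~1 in 1000)
```

Storing a small integer per raiser, instead of the full user name, is what keeps the per-word storage tiny at the cost of rare collisions.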
These standard text diff systems lack several features we thought were important:
What happens when pages are merged, or split?
When pages are merged or split, our author and origin attribution works in a reasonable way, thanks to the way we track deleted text, and our trust system copes as well. In fact, when we analyze a revision, we consider not only the immediately preceding revision, but also the most similar revision among the last 10 or so. Hence, when a merge or split occurs, many pages are likely to be analyzed correctly. We could devise better algorithms if the Mediawiki API offered a way to notify us of merge/split events.
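The idea of matching against the most similar of the recent revisions can be sketched with Python's difflib (a stand-in for WikiTrust's own diff; the helper function and the sample history are invented):

```python
import difflib

def closest_prior_revision(current, prior_revisions):
    """Return the index of the revision most similar to the current text,
    a sketch of matching against the best of the last ~10 revisions
    rather than only the immediate predecessor."""
    ratios = [difflib.SequenceMatcher(None, old, current).ratio()
              for old in prior_revisions]
    return max(range(len(ratios)), key=ratios.__getitem__)

history = [
    "history of rome",                   # old stub
    "history of rome and its empire",    # before the split
    "list of roman emperors",            # unrelated split-off content
]
current = "history of rome and its early empire"
print(closest_prior_revision(current, history))  # 1: the pre-split revision wins
```

After a split, the new revision of the surviving page still matches one of the recent revisions well, so attribution stays reasonable even though the immediate predecessor may be the split-off text.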
A single CPU core of a modern PC can process several revisions per second. How many depends on the size of the revisions, but typical speeds range between 2 and 40 revisions per second. The code is fully parallel, so the more CPU cores, the faster the analysis. Thus, keeping up with new revisions is normally not an issue. To analyze an existing wiki, you have two alternatives:
How much extra storage is required?
As an example, the Italian Wikipedia, as of October 2009, requires 110 GB of disk space for the storage of the revisions, and a few GB of space in the database for the storage of the metadata.
Where is the additional information used by WikiTrust stored?
WikiTrust stores various types of additional information: metadata on pages and revisions, the reputation of users, and the analyzed revisions, to name a few. All the metadata is stored in database tables, in the same database as the Mediawiki tables; the tables used by WikiTrust have the prefix "wikitrust_". The revision text is stored in compressed blobs: every blob consists of a number of consecutive revisions of a page, so that it compresses well. For small wikis, the blobs can be stored in the database. For large wikis, we recommend storing the blobs in the filesystem (this can be done with a simple configuration option).
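Why consecutive revisions compress well when grouped into one blob can be demonstrated with zlib (the sample revisions and the separator byte are illustrative, not WikiTrust's actual storage format):

```python
import zlib

# Three consecutive revisions of the same page, differing only slightly:
revisions = [
    "The cat sat on the mat.",
    "The cat sat on the mat. It purred.",
    "The cat sat on the red mat. It purred.",
]

# Compressing each revision separately pays header overhead three times
# and cannot exploit the text shared between revisions.
separate = sum(len(zlib.compress(r.encode())) for r in revisions)

# One blob of consecutive revisions lets the compressor reuse the
# repeated text, which is why grouping them saves space.
blob = len(zlib.compress("\x00".join(revisions).encode()))

print(separate, blob)   # the shared blob is smaller
```

The more revisions a page has, and the smaller the edits, the larger the savings from grouping them.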
I am interested in using WikiTrust on a Wikipedia. Can I do it? Can you do it? What does it take?
Using WikiTrust on a Wikipedia (for instance, the Wikipedia for a given language) is a little more involved than using it on your own wiki, since it requires a special setup (people edit the Wikipedia, which resides on Wikimedia Foundation servers, while WikiTrust just serves the analyzed text). However, if you would like to experiment with WikiTrust on some Wikipedia, let us know, and we may set it up for you. In the long run, we hope the Wikimedia Foundation will make the process easier (see below).
Will WikiTrust be installed at the Wikimedia Foundation?
We hope so, but it has not been decided. The goal would be to make WikiTrust available as an optional extension, which users can activate in their profile. Users who activate the option would see the extra "wikitrust" tab (or menu item, or whatever the Foundation deems appropriate) and be able to access the information computed by WikiTrust. Note that if that happens, the Foundation will obviously dictate which wording, access method, etc., is used, not us. This is obvious, but worth remembering. We just produce open-source code, and we are willing to assist the Foundation in using it, but any decision is theirs to take.
Why is most of WikiTrust written in Ocaml?
Because Luca thinks that Ocaml is a great language. Yes, really.
More in detail, WikiTrust started its existence in Python. This made it very easy to get started, but soon two problems emerged:
So, one Winter, we took the Python code and completely rewrote it in Ocaml, and we have been happy ever since (well, especially Luca). Now, when we make a change, we just need to test the thing we changed: we know it is extremely unlikely that we have broken something else, thanks to the strength of the type system. Memory management is also superb.
But I don't know Ocaml! Could you not have written it in Perl?
Perl! Aaack!! (says Luca; Bo and Ian managed to sneak in some Perl while Luca was distracted). But jokes aside, we needed a language that is:
How do I contribute to WikiTrust?
Finally someone asks this question! Welcome!!
You can contribute in several ways:
We needed to test our code, and we wished to show that it can do something useful.
Why is the demo so slow?
The demo requires a lot of back-and-forth between your Firefox browser, the servers at UCSC, and the servers at the Wikimedia Foundation. When you ask for a page, this is what happens:
Please contribute to this FAQ by sending your questions to firstname.lastname@example.org!