Wikipedia and Google

We all know the famous saying attributed to Jimmy Wales: “If it isn’t on Google, it doesn’t exist”. It originates in a New Yorker article on Wikipedia. What is less well known is that the Wikipedia founder was apparently not arguing that online sources are the only ones that matter, but rather something along the lines of: online references “probably” serve as a good starting point for determining whether something is “notable”.

The relationship between Google and Wikipedia is central to the online encyclopedia’s success. Why did Google start returning Wikipedia articles among the top five results for searches? Does it have something to do with Wikipedia’s internal link structure? That should not count: only links from other sites are supposed to be measured by PageRank. In any case, it’s safe to say that if Google searches had not raised the profile of Wikipedia, the free encyclopedia would not have developed in the same way.

The famous quote mentioned above does raise the issue of Wikipedia editors possibly relying too heavily on online sources, particularly when deciding whether a given topic is notable, or during deletion debates. Following the publication of my article on expertise and Wikipedia, I was contacted in early May 2009 by Sage Ross, who offered some comments, and we discussed various issues relating to the article, one of which was precisely the question of online sourcing. Rather than rehash the discussion, I asked Sage whether I could post the relevant excerpts, and he agreed, first on a Wikipedia discussion page and now here.

[Sage wrote:] On an unrelated note, you write later in the article that “marginal cultures which have not been digitised and uploaded run the risk of becoming invisible.” I have two comments. First, Jimmy Wales’ facetious comment that “If it isn’t on Google, it doesn’t exist” is not, and has never been, normative on Wikipedia; rather, it seems to me like a statement of the zeitgeist of the Internet age. Second, if marginal cultures are not covered in any digitized content, they don’t run the risk of becoming invisible… they already are invisible. If they aren’t on Google, that means that there are essentially no books or scholarly articles about them and that publishing institutions (including, in recent years, the Internet-connected public) have been ignoring them since the rise of electronic publishing. That’s not to say that Wikipedia doesn’t play a role in reinforcing patterns of marginalization; through the “Reliable sources” guideline, in particular, it does. But I think it’s unfair to lay that marginalization at the feet of Wikipedia, since that only causes problems when marginal cultures have already been made invisible (or rather, have never been made visible) by the forms of media that Wikipedia builds on and is built from. I would argue that Wikipedia actually levels the playing field for the unjustly marginalized, who are normally crowded out by the popular. There were thousands upon thousands of newspaper articles and television stories about Anna Nicole Smith; there are just a few dozen* corresponding Wikipedia articles. Conversely, there are Wikipedia articles for small villages with no particular claim to fame, for which the only sources are census and geographical data… the invisible and marginal made visible and human-readable. In a paper encyclopedia, editors would have to find content to remove for every bit that got added, so that encyclopedia sets would not grow without bound and could still be sold to suburban families by door-to-door encyclopedia salesmen.

*This is a wild estimate.

[Mathieu responded:] Regarding your second, related point: I did not say that J. Wales’ comment was normative, but it does encapsulate a certain reliance on a handy web page to link to in order to buttress one’s point. I think there might be a “loop effect” deriving from Google’s ranking of WP as well. This is a complicated issue, which also depends on whether one believes that everything which could have been digitised has been (I don’t), on where one stands on the notability question, or perhaps even on what constitutes original research in an encyclopedia which has no space limitations. So, while I appreciate your point that WP may level the playing field in some respects, I would have to reserve judgement until some more definite form of empirical evidence has been produced.

[Sage responded:] Following up a bit on marginalization and sources…

You say:

> So, while I appreciate your point that WP may level the playing field in some
> respects, I would have to reserve judgement until some more definite form of
> empirical evidence has been produced.

Fair enough. But you seem willing to judge Wikipedia for marginalizing topics that don’t have digital sources, without presenting definite empirical evidence and despite the fact that, in both policy and practice, Wikipedia encourages the use of print sources (including ones that are not available online). Online sources are treated as a convenience for readers, but there is no hesitation to use offline sources when they are superior.

Do you have particular cultures in mind that you see as being marginalized by Wikipedia’s Verifiability requirements?

[Mathieu responded:] In general, I would venture that marginal / underground / subcultural / counter-cultural events, people and artefacts that existed before 1995 would not necessarily have been comprehensively digitised. I had some anecdotal evidence of this when I created an article for a by no means insignificant art / cultural group, active from 1988 to 1993, which was judged not notable because no online sources were available. The point being, the group in question definitely made an impact on the cultural scene at the time, but the sources which document this (art or music magazines, exhibition catalogues, concert flyers, fanzines, radio shows, etc.) are not online. Now, I didn’t even know about the whole AfD (Articles for Deletion) process then; and the art / cultural group was active in Paris, while I created an entry on WP-en. So things might have turned out differently on the French WP… people might have known about it or been more receptive or whatever…

[Sage responded:] Thanks. It’s true that Wikipedia creates a threshold for inclusion that reinforces existing patterns of marginalization. It sounds like what you’ve run up against has more to do with Wikipedia’s definition of Original Research than with Verifiability; offline magazines, and possibly exhibition catalogs as well, would be considered Reliable Sources on Wikipedia that could be used to establish a topic’s Notability. Of course, Wikipedia is only the most prominent example of a whole class of wiki venues that follow the model Wikipedia created but often have different social structures; some of these are open to a wider array of content, and in both a cultural and a technical sense they exist because of Wikipedia. So I think I understand where you’re coming from now, but I still think it’s misleading to blame Wikipedia for the shortcomings of cultural institutions whose role it is to do the kinds of things that on Wikipedia are called “Original Research”. (The problems with that definition are interesting; for some areas of knowledge there is a gap between what is allowed on Wikipedia and what is considered original enough to merit publication elsewhere, e.g., in the analysis of literature.)

The other problem you ran into, perhaps, is the opaque complexity of the way Wikipedia works, so that your work was pushed out because it didn’t conform to Wikipedians’ expectations even though it might, in principle, have been made into something stable. That kind of thing is a big problem, and one that the community is constantly struggling with.

[Discussion ended]

Comments on “Wikipedia and Google”

  1. Sage Ross

    PageRank, at least in its basic concept, pays no attention to whether a link comes from within the same website or from an unaffiliated website. It treats the individual webpage as the unit of analysis, so each article on Wikipedia has its own PageRank. Because most of the links from any given article point to other Wikipedia pages (nearly all of them since Wikipedia went “nofollow”, which happened well after it rose to search engine dominance), Wikipedia is very efficient at distributing ‘Google juice’ among its articles. But that juice comes ultimately from the very high number of links pointing to Wikipedia from outside; in large part, Wikipedia’s search engine dominance can be traced back to bloggers. (It’s not just Google, either. Wikipedia has similar prominence in the results of other search engines.)

    Much of search engine optimization consists of more deliberate efforts at apportioning the ‘Google juice’ for a set of webpages (whether in a single site, or across multiple sites controlled by the same entity). For example, SEOs deliberately make sure that many pages link to the specific pages whose PageRank they want to boost, while limiting the number of other links so that the influence of the remaining links is more concentrated.

    Wikipedia, in contrast, spreads its links out and has a “natural” (if that word can apply here) link structure that takes all the incoming ‘Google juice’ that accrues to externally popular articles (the ones bloggers and others link to) and reapportions it according to internal popularity (the articles that other articles link to).

    You can see cases where PageRank and Wikipedia’s internal linking structure complement each other to produce atypically low search rankings for bad articles or for ones tucked away in isolated corners of Wikipedia. For example, do a Google search for “arts and letters”, go through the results until you find the Wikipedia result, and then take a look at the article to see why.

  2. Mathieu O'Neil

    OK… you seem to be using the example of a famous racehorse as an illustration of the fact that Wikipedia pages with lots of internal links (including links to pages such as “California”) will garner index authority… well, maybe. But you ignore one all-important fact here, I think. We are dealing with the ponies, racing, bets. Ring a bell? That’s right: it’s obvious the Camorra, the Russian Mafiya and the Hong Kong Triads are lubricating the web with Google Juice in order to, ah, manipulate something or other. Seriously, though, I did think that PageRank was supposed to measure links between sites, not pages… I’ll have to check, I guess.

  3. Pingback: How The Associated Press will try to rival Wikipedia in search results » Nieman Journalism Lab
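Sage Ross’s first comment above argues that basic PageRank scores individual pages rather than whole sites, so a site’s internal links redistribute whatever rank arrives from outside, and a page’s rank is split among its outgoing links. The following is a minimal Python sketch of the textbook power-iteration form of PageRank, run on a small hypothetical link graph. The page names, the damping factor of 0.85 and the iteration count are illustrative assumptions; this is not Google’s actual ranking system, which uses many other signals, and it does not model “nofollow”.

```python
# Minimal textbook PageRank by power iteration. In this basic formulation every
# link counts the same, whether or not it crosses a site boundary.
# The link graph below is a hypothetical toy example, not real data.

def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the baseline "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's rank is split evenly among its outgoing links,
                # so fewer links means each link passes on more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: external blogs link into one popular wiki article,
# which passes rank on through internal wiki links.
links = {
    "blog1": ["wiki/Popular"],
    "blog2": ["wiki/Popular"],
    "blog3": ["wiki/Popular", "wiki/Hub"],
    "wiki/Popular": ["wiki/Hub", "wiki/Minor"],
    "wiki/Minor": ["wiki/Hub"],
    "wiki/Hub": ["wiki/Popular", "wiki/Minor"],
    "wiki/Isolated": ["wiki/Hub"],  # nothing links *to* this article
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:15} {score:.4f}")
```

In this toy graph, “wiki/Isolated” receives no links at all, so it keeps only the baseline (1 − d)/n share, while “wiki/Hub”, which only one external page links to, gains most of its rank through links from other wiki pages. That loosely mirrors both points in the comment: rank flowing in from outside gets reapportioned internally, and articles in isolated corners end up ranked low.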
