MediaWiki and OmahaWiki.org

Mar 2, 2007

A while back I had a MediaWiki install at WikiOmaha.org, in the hope that a wiki could be formed for the Omaha community, by the Omaha community (sound familiar?). I never really did much with it, and a few days ago a professor from Creighton contacted me about my domain and about pooling resources.

He has created OmahaWiki.org, which WikiOmaha.org now redirects to. He is having students flesh it out. I came into the picture to help set up some bots to manage the content.

It turns out there is a cool framework for MediaWiki sites called the "Python Wikipedia Robot Framework", which is written in Python. I got the scripts working on my machine and then turned my attention to writing a bot that would do a word count on every page and add a stub to that page if it was under a given threshold.

I had forgotten how awesome Python is. It really is a good language; I just wish I had call to use it every once in a while. Anyway, here is my Python bot for that framework. You can grab a file version here.

#!/usr/bin/python
# -*- coding: utf-8  -*-
"""
-----// Stub Adder //------------------------------------------------------
File: jmh_addstubs.py
Version: 1.0
Author: John Hobbs
Contact: [email protected]

This bot will iterate through all pages of the wiki and append a generic
stub ('') to them if they do not have one already and have under
a given number of "words" in them.  Words, here, are counted as _any_ series
of characters separated by a space.  The default maximum number of words
that the bot will work on is 5, so it is recommended that you pass it a more
realistic value.

Call

python jmh_addstubs.py

to run the check on all pages of the wiki. If that takes too
long to work in one stroke, run:

python jmh_addstubs.py Pagename

to do all pages starting at Pagename.

There are two command line options:

-dryrun
    This will check and notify you but will not actually change anything.

-words=XX
    This is the word threshold. Replace XX with the biggest wordcount that you
    want the bot to append stubs to.

"""
import wikipedia
import pagegenerators
import sys

def workon(page):
    try:
        text = page.get()
    except wikipedia.IsRedirectPage:
        return

    # Count "words" as anything separated by a single space.
    jmh_tokens = text.split(' ')
    # Only touch short pages that do not already carry a stub template.
    if len(jmh_tokens) <= jmh_count and 'Stub}}' not in text:
        text += ''
        if jmh_dryrun:
            print '--// MATCH: [['+page.title()+']] -> Dry Run, No Change //--'
        else:
            print '--// MATCH: [['+page.title()+']] -> Stub Added //--'
            page.put(text)

try:
    start = []
    jmh_dryrun = False
    jmh_count = 5  # default word threshold; override with -words=XX
    for arg in wikipedia.handleArgs():
        if arg.startswith("-words="):
            temp = arg.split('=')
            jmh_count = int(temp[1])
        elif arg.startswith("-dryrun"):
            jmh_dryrun = True
        else:
            start.append(arg)
    if start:
        start = " ".join(start)
    else:
        start = "!"
    mysite = wikipedia.getSite()
    basicgenerator = pagegenerators.AllpagesPageGenerator(start=start)
    generator = pagegenerators.PreloadingGenerator(basicgenerator)
    for page in generator:
        workon(page)

finally:
    wikipedia.stopme()
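
If you want to play with the threshold check without pointing the bot at a live wiki, the decision inside workon() can be pulled out into a couple of plain functions. This is only a rough sketch in modern Python, not part of the bot itself: the names count_words and needs_stub are made up for illustration, and '{{Stub}}' is an assumed template name (the bot only looks for the 'Stub}}' ending).

```python
def count_words(text):
    """Count 'words' the way the bot does: split on single spaces only."""
    return len(text.split(' '))


def needs_stub(text, max_words=5, marker='Stub}}'):
    """Return True if the page is short enough and has no stub marker yet.

    max_words mirrors the bot's -words=XX option; marker mirrors the
    'Stub}}' check. Both names are hypothetical, chosen for this sketch.
    """
    return count_words(text) <= max_words and marker not in text


if __name__ == '__main__':
    # A short page without a stub template qualifies.
    print(needs_stub('Omaha is a city.'))
    # The same page with an (assumed) stub template is skipped.
    print(needs_stub('Omaha is a city. {{Stub}}'))
    # Splitting on ' ' counts the empty string between double spaces...
    print(count_words('two  spaces'))
    # ...and does not split on newlines, so this is 3 tokens, not 4.
    print(count_words('line one\nline two'))
```

The last two calls show why the count is only approximate: splitting on a single space counts empty strings between repeated spaces and treats words joined by a newline as one. Calling split() with no argument would split on any run of whitespace, which is probably closer to what most people mean by a word count.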