New Wiki and EC2 Small Instance Shortcomings

by Matt Cholick

For quite a few years I've used a personal wiki, and I can't imagine tracking things without one. Wikis are amazingly useful for documentation. Other online solutions like Google Docs, Google Sites, or similar offerings just don't cut it. Rich text editors are always either sluggish or a fight to use: overlapping styles, masked properties, or some stupid auto-formatting triggered by a carriage return always get in the way. Editing raw text with simple markup is a much better way to create a document. There's no ambiguity; the author can see the document's structure exactly.

I've had trouble finding personal wiki software I really like, though. This is mostly the fault of Confluence (which we use at work): Confluence is such great software that few alternatives measure up. I've tried DokuWiki, JSPWiki, and JAMWiki recently, but none of them is tolerable next to Confluence. The trouble with a lot of wiki software is that it doesn't expose a document tree structure well. Most wiki software authors take their cues from Wikipedia, where a tree structure doesn't make much sense. I mentally organize things hierarchically, though, and that's how I want my documentation organized. Confluence also has several useful macros for neat things like generating a table of contents and creating other dynamic content based on metadata.

After a lot of test installs I've finally found XWiki. XWiki matches (and in some ways even exceeds) the Confluence feature set. It has hierarchical page structuring, macros, spaces, a rich permission matrix, and theming: everything I could want. It even allows Groovy scripting on pages, which is icing on the cake. I didn't think to search for Groovy scripting because I couldn't imagine it existed, but it's a perfect feature.

XWiki's one drawback is that it is definitely heavy software. That level of scripting and dynamic page generation takes a lot of resources. Like a lot of Java projects, it also pulls in every open source library on the planet; the distribution includes a whopping 99.8 MB of third-party jar files. Together, the dynamic features and dependencies make for a pretty demanding piece of software.

That's where I ran into trouble with my EC2 micro instance. The site stopped responding after I clicked around and then visited a specific page. That document is expensive because it has several blocks of code in different languages, and the software performs source syntax highlighting on each of them. I was tired at that point, so instead of killing the server and tracking down the problem I just went to bed. When I woke up, to my surprise, the site was still unresponsive. Here's a CPU graph of last night:

That's a 12-hour window of 100% CPU utilization. Clearly something was going on. After digging a bit, I found that micro instances use CPU throttling. The VM gets a finite amount of computing power per unit of time; once the instance has consumed it, its CPU is throttled back to something like a few percent of what it could use before. This is in the documentation, but it's a bit buried; using the term "throttling" on the instance type summary page would have made it clearer. Understanding this pattern also explains earlier behavior I'd seen with the instance. For example, Tomcat restarts would go from taking seconds to taking minutes when I bounced the server too often.
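To make the burst-then-throttle pattern concrete, here's a toy model in Python. It's a hypothetical sketch, not Amazon's actual algorithm: the class and method names (`ThrottledCpu`, `tick`, `time_to_finish`), the full-speed budget, the accounting window, and the 5% throttled share are all assumptions chosen for illustration. The model runs at a full core until it has spent a fixed budget of CPU-seconds within a window, then drops to a small fraction of a core until the next window starts.

```python
class ThrottledCpu:
    """Toy burst-then-throttle model of a micro instance's CPU.

    Hypothetical: the parameters and accounting scheme are assumptions
    for illustration, not Amazon's documented algorithm.
    """

    def __init__(self, budget: float, window: int, throttled_share: float):
        self.budget = budget                    # full-speed CPU-seconds allowed per window
        self.window = window                    # accounting window length, in seconds
        self.throttled_share = throttled_share  # fraction of a core once the budget is spent
        self.clock = 0                          # seconds since the model started
        self.used = 0.0                         # CPU-seconds consumed in the current window

    def tick(self, demand: float) -> float:
        """Advance one second and return the share of a core actually delivered."""
        if self.clock % self.window == 0:
            self.used = 0.0  # a new window begins, so the budget refills
        cap = 1.0 if self.used < self.budget else self.throttled_share
        share = min(demand, cap)
        self.used += share
        self.clock += 1
        return share

    def time_to_finish(self, work: float) -> int:
        """Seconds needed to complete `work` CPU-seconds while demanding a full core."""
        done, start = 0.0, self.clock
        while done < work - 1e-9:  # small tolerance absorbs floating-point rounding
            done += self.tick(1.0)
        return self.clock - start


if __name__ == "__main__":
    cpu = ThrottledCpu(budget=30, window=600, throttled_share=0.05)
    print(cpu.time_to_finish(20))  # 20: the budget covers the whole first job
    print(cpu.time_to_finish(20))  # 210: the budget runs out after 10 seconds
```

With these made-up numbers, a first 20-CPU-second job (think of a server restart) finishes in 20 seconds, but an immediate second one takes 210 seconds because the budget runs out partway through. That's qualitatively the same jump from seconds to minutes I saw when bouncing Tomcat too often, though the real thresholds will differ.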

I'm definitely still a fan of Amazon's micro instances, but they do come with a bit of fine print to be aware of.