If hits were monkeys, I’d have a lot of monkeys.

I don’t pay too much attention to my site’s traffic logs and, when I do, I satisfy my interest with ugly queries like this:

tail -15000 waldo_jaquith_org.log |cut -d " " -f 11 |grep -v waldo.jaquith.org |sort |uniq -c |less
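
For anyone who wants to see what that pipeline produces, here is a minimal sketch run against a tiny made-up log. It assumes Apache's combined log format (the layout quoted in the replies below); the temporary file, the addresses, and the four entries are all invented for illustration.

```shell
# Build a tiny, made-up access log in Apache "combined" format.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [05/Dec/2005:10:47:55 -0500] "GET / HTTP/1.1" 200 56564 "http://www.cnn.com/" "Mozilla/4.0"
192.0.2.2 - - [05/Dec/2005:10:48:01 -0500] "GET /about/ HTTP/1.1" 200 1200 "http://waldo.jaquith.org/" "Mozilla/4.0"
192.0.2.3 - - [05/Dec/2005:10:49:12 -0500] "GET / HTTP/1.1" 200 56564 "http://www.cnn.com/" "Mozilla/4.0"
192.0.2.4 - - [06/Dec/2005:09:00:00 -0500] "GET / HTTP/1.1" 200 56564 "-" "Mozilla/4.0"
EOF

# Field 11 (splitting on single spaces) is the quoted referrer.
# Drop referrals from the site itself, then count each remaining referrer.
referrers=$(tail -n 15000 "$log" | cut -d " " -f 11 | grep -v waldo.jaquith.org | sort | uniq -c)

# cnn.com shows up twice and the empty referrer "-" once.
echo "$referrers"

rm -f "$log"
```

The counting depends on `sort` coming before `uniq -c`, since `uniq` only collapses adjacent identical lines.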

I just got done running some site traffic through some nasty little shell scripts and realized that, last month, this site reached the three-million-hit mark since I last rotated out my logs, in August of 2003. I’ve tried to rotate out the logs every few million entries over the years. I figure I’m hovering around the ten-million-hit mark since January 1998, when I started this site. It doesn’t seem like much, since I clear 10M on nancies.org in a slow month, but I guess it’s a lot compared to most sites.

Anyhow, 3,000,000. That’s gonna take a lot of candles to celebrate.

Published by Waldo Jaquith

Waldo Jaquith (JAKE-with) is an open government technologist who lives near Charlottesville, VA, USA.

7 replies on “If hits were monkeys, I’d have a lot of monkeys.”

  1. That is not an ugly query. That is a beautiful little tail pipe (and, no, I don’t mean anything filthy). How else would you turn the last 15,000 lines of a log file into something enjoyable? Now what the heck are you logging that has over 11 fields?

  2. “Now what the heck are you logging that has over 11 fields?”

    Apache logs. Each hit looks like such:

    192.168.0.1 - - [05/Dec/2005:10:47:55 -0500] "GET / HTTP/1.1" 200 56564 "http://www.cnn.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"

    So if you want to get a unique listing of all referrers from the last 15,000 hits, you’d want to use that particular command. It would show that the above fictional hit came from the front page of CNN’s website.

    “Do hits really mean anything at all?”

    You’re absolutely right — of all the stupid metrics that are used to track website usage, hits are the dumbest. Unique daily visits are pretty lousy, too, but “hits” has been so misused that it’s definitely lost all meaning.

    But, for my purposes, “hits” are fine, because they’re measured by the number of entries in a log file. Each entry represents a single hit. To count the number of hits in a day, I just determine how many lines in the log file have that date. There’s really no way to track unique visits in any meaningful way using Apache log files, and certainly not in a manner that’s conducive to analysis with sed, awk, grep, cut, and uniq. :)

  3. Suggestion:

    tail -15000 waldo_jaquith_org.log |awk '{ print $11 }' | grep -v waldo.jaquith.org |sort |uniq -c |less

    On my systems (IRIX, HP-UX, and Mac OS X), awk is a lot faster than cut.

  4. I’m confused. A “hit” is not a visit by one person, is it? I thought a “hit” was the downloading of one segment of a website (and each visit by one person can generate 10 or 20 hits, depending on the complexity of the site).

    So “hits” can be misleading, maybe by a factor of 10 or so, to the person who thinks each one represents one human visit.

    Or do I totally misunderstand all this?
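
The per-day counting described in reply 2, and the gap between hits and visitors raised in reply 4, can be sketched roughly as follows. The log entries here are invented (two addresses, each loading a page plus its supporting files), and counting distinct client addresses is only a crude stand-in for visitors: shared addresses and proxies blur it, which fits the point above that Apache logs can't track unique visits in any meaningful way.

```shell
# A made-up log: two clients on 5 December, each fetching a page and its assets.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [05/Dec/2005:10:47:55 -0500] "GET / HTTP/1.1" 200 56564 "-" "Mozilla/4.0"
192.0.2.1 - - [05/Dec/2005:10:47:56 -0500] "GET /style.css HTTP/1.1" 200 2048 "http://waldo.jaquith.org/" "Mozilla/4.0"
192.0.2.1 - - [05/Dec/2005:10:47:56 -0500] "GET /logo.png HTTP/1.1" 200 4096 "http://waldo.jaquith.org/" "Mozilla/4.0"
192.0.2.2 - - [05/Dec/2005:11:02:10 -0500] "GET / HTTP/1.1" 200 56564 "-" "Mozilla/4.0"
192.0.2.2 - - [05/Dec/2005:11:02:11 -0500] "GET /style.css HTTP/1.1" 200 2048 "http://waldo.jaquith.org/" "Mozilla/4.0"
192.0.2.1 - - [06/Dec/2005:09:00:00 -0500] "GET / HTTP/1.1" 200 56564 "-" "Mozilla/4.0"
EOF

# Hits on one day: count the lines whose timestamp carries that date.
hits_dec5=$(grep -c '\[05/Dec/2005:' "$log")

# Distinct client addresses on the same day (field 1), as a rough proxy only.
addrs_dec5=$(grep '\[05/Dec/2005:' "$log" | cut -d " " -f 1 | sort -u | wc -l | tr -d ' ')

echo "hits: $hits_dec5, distinct addresses: $addrs_dec5"

rm -f "$log"
```

In this sample, five hits come from just two addresses, which is the kind of inflation reply 4 is describing.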

Comments are closed.