Darn all spammers to heck!

Several of you have emailed me about a spam message you’ve received with the subject line:
Fw: http://www.leoville.com/mt/archives/

I have also received that spam message. The company sending it out, www.trafficbbs.com, is spamming – they have nothing to do with me. Apparently they’re harvesting addresses from blog comments. I will attempt to get them to stop but I don’t have high hopes. The company slogan is:

Offer you great data of 50,000+ search engines & 120,000+ BBS!

Present you to a magic world of instant & effective online communication!

And there’s no phone number on the web page. Just a fax number.

These things happen all the time. The best defense is to not use your email address anywhere on the web. As long as a page is publicly accessible, a spammer can harvest the addresses.

Since most of the time they use automated programs to do the harvesting, it’s possible to use a human readable address that confounds the robots. Something like:

leo at (die spammers die) leoville.com

I’m sorry that that’s necessary, but that’s the way of the web, alas.
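
If you want to produce that kind of address automatically, here is a minimal Perl sketch. The helper name obfuscate_address and the filler text are my own inventions for illustration, and it only guards against harvesters that look for a literal "user@domain" pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite "user@domain" into a form people can read
# but that a naive address-harvesting pattern will not match.
sub obfuscate_address {
    my ($address, $filler) = @_;
    my ($user, $domain) = split /@/, $address, 2;
    return $address unless defined $domain;    # no "@" found; leave it alone
    return "$user at ($filler) $domain";
}

print obfuscate_address('leo@leoville.com', 'die spammers die'), "\n";
```

A smarter robot could still be taught to undo this, so treat it as a speed bump rather than a wall.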

Referring to Referers

Leoville went down earlier today for a few hours. It was out yesterday for about an hour, too. I contacted my excellent web host, Nacio, and they brought it back. Here’s what they said…


We have noticed that 2 IP addresses are consistently opening and not closing connections to your website: 12.xx.xx.xxx and 172.xx.xx.xxx. On average, each IP will have 60 open TCP sessions on a 24/7 basis. Are you familiar with these IPs? If not, we may look into blocking them, as their irregular activity may be part of the problem.

Also, we have been monitoring the disk usage on your site. Currently you are using 1.5GB total–roughly 1.4GB being your discussion board. As you are on a shared webserver–geared toward smaller 40 – 80MB sites–we were hoping that you could remove some content from your site. Would 500MB be sufficient, or do you need more?

I asked them to block the two IP addresses and that seems to have helped. I’m going to have to cut back on the disk usage, too. Obviously the boards have gotten way out of control. I’m pruning messages older than 90 days and I’ll probably cut the max file size to 50KB. Sorry to have to do that, but I really would like to keep this site running!

On another note, I’ve been playing with Dave Winer’s new Radio blogging software and I have to say it’s wonderful. I’m sticking with Movable Type, but for anyone who wants to create their own web site without having to struggle with the tech, this is it. It finally fulfills the web’s promise to be the people’s publishing platform. You can read the temporary blog I set up to play with it at weblogs.com. It literally took me 10 minutes to get it up and running. And it’s free for 30 days – $40 for a year including the web hosting. If you’ve been thinking about blogging, try Radio.

While playing with Radio, I noticed that the link to referrers is misspelled on the admin page. No blame to Dave Winer for this. The misspelling dates back to the original HTTP spec which also misspells referrers as “referers.”

This ancient error caused me endless confusion when I was writing my own Perl referers routine. (It’s running on Leoville now; to see the 20 most recent referring pages, click here.) The program failed at first because I kept spelling referrers correctly. It took me a while to figure out where I was going wrong.
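
To show how the spelling bites, here is a small sketch (not my actual routine). CGI hands the HTTP "Referer" header to a program as the environment variable HTTP_REFERER, with one "r" in the middle. The helper name referring_page and the example URL are made up for illustration, and a simulated environment stands in for the real %ENV:

```perl
use strict;
use warnings;

# Hypothetical helper: return the referring page for a CGI request.
# CGI exposes the HTTP "Referer" header as HTTP_REFERER (single "r"),
# so looking up the correctly spelled HTTP_REFERRER finds nothing.
sub referring_page {
    my ($env) = @_;
    return defined $env->{HTTP_REFERER} ? $env->{HTTP_REFERER} : '(none)';
}

# Simulated environment; under a real web server you would pass \%ENV.
my %fake_env = (HTTP_REFERER => 'http://www.example.com/links.html');

print referring_page(\%fake_env), "\n";
```

Spell the key the dictionary way and the lookup quietly comes back empty, which is exactly the kind of failure that takes a while to spot.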

But the misspelling poses an interesting problem. Do you perpetuate it, as Dave has done, in public, or do you continue to spell it correctly while using the non-traditional spelling inside your programs? I chose the latter route on the front page of Leoville, but I might be in the minority.

In fact, this is exactly how a language evolves. I suppose, in time, “referers” will become the correct spelling, all thanks to a small spelling error in the HTTP spec. Even though programmers are notoriously bad spellers, I can’t think of another instance where a misspelling has become enshrined in a spec. Can you?

And you can bet I spell checked this post before submitting it.

Perls of Wisdom (not)

I used to program for fun, but I don’t have time (or enough concentration) to write anything substantial any more. It’s still fun to hack out one-liners from time to time though.
The other day I decided to keep track of my Amazon.com sales rank on the front page of Leoville. To do this I’d have to write a program in perl that the web server could call using CGI, the common gateway interface. The program would return the rank which the web server would embed into my page.

The first iteration of the program was pretty simple, thanks to a perl library called LWP. The library provides built-in routines to access web pages. Using the LWP routine “get” I can fetch the contents of the Amazon.com page, then use Perl’s built-in text search features to extract the ranking.

I wrote the program in a few minutes:


use LWP::Simple;
my $webpagetext;

# access Amazon web page
$webpagetext=get("http://www.amazon.com/exec/obidos/ASIN/0789726912/qid=1007181368/sr=1-6/ref=sr_1_74_6/104-8979567-7976756");

# find sales rank
$webpagetext =~ /(Sales Rank: )(\d+)/;

# output sales rank
print "Content-type: text/html\n\n"; # this header and blank line are required for CGI output

print $2;

If you’re not familiar with perl a few things might need explanation. All the real work is done in the line…

$webpagetext =~ /(Sales Rank: )(\d+)/;

In English this would read something like: search the contents of $webpagetext for the text “Sales Rank: ” followed by one or more digits.

The parentheses in the phrase (Sales Rank: )(\d+) tell Perl to group the results. Perl assigns the value in the first group to the variable $1, the second group to $2, etc. I’ll use $2 later to output the rank.
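
Here is a small standalone sketch of the grouping, run against a made-up sample string instead of the real Amazon page. The helper name rank_groups and the sample text are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: return both captured groups, or nothing if no match.
sub rank_groups {
    my ($text) = @_;
    if ($text =~ /(Sales Rank: )(\d+)/) {
        return ($1, $2);    # $1 is the label, $2 is the run of digits
    }
    return;
}

# Made-up sample text standing in for the fetched page.
my $sample = "Amazon.com Sales Rank: 742 in Books";

my ($label, $rank) = rank_groups($sample);
print "group 1: '$label'\n";
print "group 2: '$rank'\n";
```

Checking that the match succeeded before touching $1 and $2 matters, because after a failed match those variables still hold whatever the last successful match left in them.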

Finally I print the results to the console. CGI routes the output back to the web server which inserts it into the web page that called it.

I use Apache’s server-side includes (SSI) to call the perl program and embed the results of the program. On my system that means putting the line:

<!--#include virtual="/cgi-bin/ranking.pl"-->

into the web page. When the web server sees it, it calls ranking.pl and sticks the result into the page at that point.

So far, so good. I could run the program locally and it worked fine, but it wouldn’t work on my server. Turns out the LWP module was never installed. I wasn’t sure how to get around that until I installed Movable Type. This blog software uses several modules that aren’t part of my web host’s perl installation. But I learned I could put the needed modules in a directory on the web host and tell my program to look for them there. Thus adding the line:

use lib "/cgi-bin/mt/extlib/";

at the beginning of the program, and storing the LWP::Simple module in the extlib directory, fixed the problem. Version 1.0 of my program was up and running. Worked great, too, until my book’s rank slipped past 999. Amazon displays larger ranks with commas, and my program didn’t consider that. I changed the search to include commas by substituting the regular expression [0-9,] for \d:

$webpagetext =~ /(Sales Rank: )([0-9,]+)/;

and it was working again.
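
One wrinkle: with the new pattern, the captured rank still contains the commas. That’s fine for display, but if you ever want to compare ranks as numbers you’d need to strip them. A minimal sketch, assuming a hypothetical helper called parse_rank and made-up sample strings:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the rank out of the page text and drop the
# commas so the result can be treated as a plain number.
sub parse_rank {
    my ($text) = @_;
    return unless $text =~ /Sales Rank: ([0-9,]+)/;
    (my $rank = $1) =~ tr/,//d;    # "1,234" becomes "1234"
    return $rank;
}

print parse_rank("Sales Rank: 1,234"), "\n";
print parse_rank("Sales Rank: 742"), "\n";
```

The tr operator with the d flag just deletes every comma, so ranks with or without commas come out as digits only.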

Incidentally, I work in perl on both Windows and Macintosh. On Windows, I use an excellent shareware editor from DZSoft. On Mac OS X I use BBEdit from Bare Bones Software. Both really speed up the development cycle by letting you run the program from within the editor, and both include built-in FTP uploading and a perl reference.

No program is ever done, and neither was this one. Next time, how I extended it to keep track of the peak scores. (And maybe one of you perl experts can help me with a bug that’s really been buggin me.)

Those Curtains Have to Go!

Well I couldn’t resist and spent the last two days redesigning the Blog. I manhandled the elegant little templates Movable Type comes with until they looked just like the old blog. The voting has disappeared, but it’s nicer in other ways.
The archives link is back at the bottom, with month by month and day by day views. I’d like to pretty that up when I get a chance. The calendar feature in the next version of MT should do the trick.

I’ve added a single-entry view that has back and next buttons, so you can go entry by entry reading my original post and the comments that followed. Get there by clicking any post’s title or the Comments link.

I am also categorizing the entries now. I’ll archive by category at some point, as well. For now, there are no categories for the old Greymatter and Blogger posts. I just lumped them all in the 2001 category. At some point I’ll categorize them, too.

There’s an e-mail notification feature, too, but I can’t figure out how to make it available to you on the page. It’s not documented anywhere, but as soon as I can I’ll make it possible for you to receive either new posts in their entirety or an e-mail notification.

As usual, I welcome any input or requests. Thanks for your patience during the transition. Now I know what vacations are for.

Viva Movable Type!

Moving Day

OK, I’ve moved the blog over to the new software, Movable Type. I was able to import all the previous entries through May. Still working on the rest. (UPDATE: got everything imported, including the original Blogger entries – all the entries from the beginning are now in here. NICE!)
There are some differences. The main thing we’re losing is the voting on each entry. It’s fun, but it’s not something I’ll die without.

We gain a bunch more. Most importantly, I hope, more reliable and secure software. There are some other very nice features, including an improved user interface. I’ll be tweaking it over time. The biggest problem so far is that if your page is too narrow you lose the stuff in the gutter to the right (widen the window if you don’t see my picture to the right of this text). I’m going to tweak the templates to get them to fit Leoville’s dull but functional design better.

Comments are still enabled. I hope you’ll pass along your suggestions. Thanks for your patience during the transition.

Incidentally, the old Greymatter blog files will stick around indefinitely, so if you linked to them, the links will still work. I will remove the Greymatter software, however, since it poses a security risk.

New software

OK. I’m trying out some new software. This blog is running under Movable Type (http://www.movabletype.org).
It’s a little slicker and, best of all, still under development. There are some kinks to work out, but I think I’ll be able to import all the old blog entries. Stay tuned!