dspam can blow me (You get pain for wanting to filter your mail) - Paint It Black
Living the American dream one heartbreaking piece at a time
The wildly popular anti-spam system dspam went into production on panther about 4 days ago. The tipping point came after I restored proper routing from "role accounts" (think webmaster, postmaster, etc.) to the inbox of a Real Human (read: mine). It's astounding just how much junkmail these role accounts were getting. Since I was frustrated by getting an average of about 30 spam messages every two hours via the role accounts, and since I'd had a few other users on the system complaining that their mailboxes were out of control too, I decided to take action. I took a cue from roho and tossed dspam onto the box, thinking "Hey... a bunch of people use this, it's trainable, and I'm not submitting to the horror that is the DNS RBL system (where collateral damage due to crazy people at the helm is a way of life and is somehow viewed as completely acceptable)."

Chose poorly or am dirt-stupid, those are my options. Maybe it's a little from column A and a little from column B.

My first mistake was trying to install from packages. The lone package I found and tried to install was several minor revisions behind, but that was better than all the rest, which were at least one major revision (2.x versus the current 3.x train) behind. I could get the library package to install, but the main binary package refused to install because it insisted I was missing several key Perl modules. I wasn't missing them; they were properly installed and working ... but try as I might I couldn't convince the damn RPM manager otherwise. So I ripped out the RPM and grabbed the source code.

Compiling was easy. You really can't get much easier than ./configure && make && make install. Of course, that's where everything went horribly, shockingly wrong. To say that documentation on this thing is suboptimal is putting it nicely. Sure, all the collected knowledge the authors saw fit to include is sitting in the README, but it's very poorly documented knowledge that wanders all over the place and doesn't actually impart any wisdom about what you're trying to set up.

For example, Section 3 of the README is titled PERMISSIONS. You might think this would suggest what files need what permissions. I certainly did. Of course, I was wrong. In this section I'm told I need to worry about what user/group permissions the Apache binary is running in. Which is great, but what specific files need what specific permissions and ownerships to be accessed by that user/process? All of them? Some? None? It's a total mystery! "Ability to execute the dspam binary" is technically accurate but unless you know what exact file that refers to (dspam.cgi? The dspam file installed in /usr/local/bin? Both? Neither?) it's absolutely useless. Eventually I managed to sort of fake my way through it. Mail was being delivered with !DSPAM tags but that really seemed to be about it.

Hmm, okay, says here there's a web-based UI that's used to tune the system for the individual, so let's get that set up. Hey, a section in the README titled THE WEB UI. Rocking! Wait... what? Why is it talking about Procmail in here? Shouldn't that have been in an earlier section? Huh? That's it, and now I'm in "TESTING?" But you didn't tell me how to secure this interface or make it so users have to authenticate?!?

Okay, fine, turn to Google for help. Hey, a couple of HOWTOs! And they're all... like... equally cryptic and inappropriate for my blindingly simple installation. Am I the only guy in the world who just has Postfix and wants this to go? Does everyone else have seventeen different MTAs running along with virus scanners and virtual mail accounts stored in MySQL?

More muddling, more cursing under breath. Read numerous "HOWTO" and "FAQ" documents on the web, cobble together something that looks to be sort of working. Mail is being delivered and the UI is now asking me for a username and password but it's all broken-looking and doesn't seem to be letting me flag messages as spam. Oh hey, okay, these files need to go in my website's document root. Hey, the UI is looking better now! Stats still all show N/A% though. Hmm, that's probably not good. A day or so later I made a breakthrough on the appropriate permissions so that statistics would report right.

Cool, great, now I'll add in the nightly cron maintenance routine. Oh wow, segfault and the stack is trashed. That's awesome! Hunt through Google. One other person reports this problem. Author says "Your hardware or your GLIBC is broken, ask yourself why thousands of other people don't have any issue." That's a neat attitude! Too bad that's not the right answer... turns out, because I and this other poor schlub used the default storage system (hash_drv), the nightly cron routine that's documented as mandatory in the README (section 6, NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS) is totally the wrong one, and the "normal" response of the dspam_clean program when faced with this storage method is to utterly shit its pants and die. Who the fuck writes software this way? Were they playing with the puzzle box from Hellraiser when they coded this? Oh wait, this is Open Source. Half-completed, painful, insane implementations are the norm around this group.

Okay, so now that I've determined it's dumping core because it's the wrong cleanup program for my foolishly-chosen default storage algorithm I ask myself: Self, what is the right program to run? Unsurprisingly enough, that's not documented at all in the README. I can only assume this is because nobody would ever be so foolish as to use the defaults on a first-time installation. I mean seriously, what was I smoking that made me think this would be written down somewhere? The "documentation" is useless. Neat. Faboo. (You can tell my higher brain functions have eroded in the face of this insanity because I'm using words like "faboo.") More searching in local files and on the website proves the appropriate maintenance routine for the default storage driver is not documented ANYWHERE! Turn to Google to find the appropriate solution.

Around this time it hits me: Clearly I AM IN SOME SORT OF SPAM-FILTERING HELL. If this is not hell, then it must be some level of purgatory where I can atone for the sin of wanting my inbox to have some value again. I give up on the maintenance for the time being. Every time a user's directory and files are created by the system, it's giving the files and directories some wacky user:group memberships that prevent it from working right for that user. I go back to trying to make it so that everything is owned by the right user ID and group ID, along with having the right permissions. I think I've finally achieved balance, because as near as I can tell it's working: mail is being delivered, spam is being classified, it's making charts and calculating percentages.
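For anyone stuck in the same ownership chase, the check can at least be scripted instead of eyeballed. What follows is a minimal sketch in Python, not anything from the dspam README: it assumes all the per-user data sits under one directory and that you already know which numeric user and group ought to own it. The function name is made up for illustration.

```python
from pathlib import Path


def find_mismatched(root, uid, gid):
    """Return paths under root (root included) that are not owned by uid:gid."""
    root = Path(root)
    mismatched = []
    # Check the top-level directory too: a wrong owner there blocks writes
    # even when every file below it looks fine.
    for path in [root, *sorted(root.rglob("*"))]:
        st = path.lstat()  # lstat reports on a symlink itself, not its target
        if st.st_uid != uid or st.st_gid != gid:
            mismatched.append(path)
    return mismatched
```

Called with the data directory and the IDs of whatever account the filter runs as on your box, it returns the list of offenders; an empty list means ownership, at least, is not the problem this time. It only reads metadata and changes nothing.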

Last night I decided to go back to the maintenance thing, because I'm thinking that running the system for three days without doing a "required nightly cleanup" is probably Bad For It. I feed my situation into Google. Google can't give me any answers either. I'm guessing that since I have some .css files in each user's data directory (not the type of CSS used for websites either, but if you thought that too don't feel bad: that threw me for a loop as well) and Google made mention in one result of "cssclean" I should probably try that. Dig through the source tree... victory! cssclean! Try running that. Hey, it didn't blow up! It didn't run, but it didn't blow up! Let's see if it has a man page. Okay, like everything else in this forsaken system, it has no man page. But it does understand "-h" on the command line! WOOHOO! I'm finally able to clean up the .css files. Most of them shrink in size, which I'm led to believe is a good thing. I go to bed with a feeling of accomplishment. The page on Google I found says I'll have to hack together some sort of script with "find" involved in order to automate this, but hey... I made it go and that's good enough for me to sleep on.
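The "script with find involved" could look something like the sketch below. It is written in Python rather than shell, and it rests on assumptions: that the cleaner is invoked as `cssclean <file>` (check what "-h" tells you on your build before trusting that), and that every .css file under the data directory is fair game. The function names and the dry-run default are mine.

```python
import subprocess
from pathlib import Path


def find_css_files(data_root):
    """Return every regular *.css file below data_root, sorted for stable runs."""
    return sorted(p for p in Path(data_root).rglob("*.css") if p.is_file())


def clean_all(data_root, cleaner="cssclean", dry_run=True):
    """Build one cleaner command per file; run them only when dry_run is False."""
    # ASSUMPTION: the cleaner takes the file path as its only argument.
    commands = [[cleaner, str(path)] for path in find_css_files(data_root)]
    if not dry_run:
        for argv in commands:
            # check=True stops at the first failure instead of ploughing on.
            subprocess.run(argv, check=True)
    return commands
```

With the default `dry_run=True` it only returns the commands it would run, which is a cheap way to see what a nightly job is about to touch before letting it loose from cron.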

This morning I got up, checked the dspam UI and immediately discovered my history tab hasn't updated at all. There's no record of the 8 mails that were delivered to my inbox and the 19 that were sent to quarantine. The analysis is broken, all the catch/miss stats are back to N/A%. This brokenness leads me to believe something is really hosed. Again. Two steps forward, three steps back. I've sort of gotten used to this tune with this system. I begin digging around, starting immediately with the dspam user data directory since that's where 85%-95% of my problems have hidden in the past. It isn't long before I find that, in my zeal to do "cssclean" on all the .css files last night, I failed to notice that cssclean has a wonderfully undocumented (is anybody SURPRISED by this?) feature. The quirk? Once cssclean has been run on the user's .css file, it will be owned by root, which ... isn't what the system needs in order to record data. Alright, go back and fix the ownerships. Check in again. Nope, still broken. Oh, crap, somehow permissions got broken. Fix the permissions recursively. Discover that said recursive change of ownership has somehow screwed up the directory the userdirs reside in, so despite having the appropriate permissions on the files the programs still can't write to them. Fix that, reload the UI. Okay, data's being written again, the UI seems to be working again. I guess some sort of statistics file got exploded too, because the percentages being reported don't make any sense.
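One way to dodge the owned-by-root surprise is to record a file's owner and mode before a root-run tool touches it and put them back afterwards. The following is a minimal sketch of that idea in Python; the helper name is hypothetical, and it assumes the tool leaves a file at the same path when it finishes.

```python
import os
import stat
from contextlib import contextmanager


@contextmanager
def preserve_owner_and_mode(path):
    """Restore path's owner, group and permission bits after the block runs."""
    before = os.stat(path)
    try:
        yield
    finally:
        after = os.stat(path)
        if (after.st_uid, after.st_gid) != (before.st_uid, before.st_gid):
            # Handing a file back to another user generally requires root.
            os.chown(path, before.st_uid, before.st_gid)
        # chmod comes last because chown can clear some mode bits.
        if stat.S_IMODE(after.st_mode) != stat.S_IMODE(before.st_mode):
            os.chmod(path, stat.S_IMODE(before.st_mode))
```

Wrapping each cleanup call in `with preserve_owner_and_mode(path):` means the file comes out the other side owned by whoever owned it going in. It does nothing for the parent directories, so those still need their own check.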

People will tell me that every new implementation requires a shake-down period and that annoyances like this are bound to be encountered. True enough! My problem, and what is driving me out of my skull at this point, is knowing that a majority of these issues could be avoided if they'd just take the time to actually write documentation that didn't suck. When one understands what the guts are actually doing and knows what is expected of them as the system administrator, they can head a number of problems off at the pass. Screw it, I don't care, I'll hit the big jolly "Stats Reset" link and just put all the counters back to zero. At this point I'm going to call it a moral victory that things seem to be playing nicely again at all.

Until the next time I run the nightly maintenance program.

Cause the thing that's in effect

Current Mood: frustrated
Current Music: Neil Young - Computer Cowboy [Aka Syscrusher]

9 thoughts or Leave a thought
From: haikujaguar Date: November 27th, 2005 07:34 pm (UTC)
I remember handing off my somewhat peculiar implementation of a web-server to my next-in-line before heading on for greener pastures in another department. I wrote 20 pages (single-spaced) of documentation for it.

Someone should have kissed my stockinged toes for it, I swear.

Your icon is so perfectly appropriate for this entry. O_O
From: feren Date: November 28th, 2005 01:45 pm (UTC)
[Your icon is so perfectly appropriate for this entry]

My keyword for it is Technology makes me punchy.
From: yotogi Date: November 27th, 2005 08:17 pm (UTC)
From: hightensile Date: November 27th, 2005 10:30 pm (UTC)
From: points Date: November 27th, 2005 08:43 pm (UTC)
Obviously, you selected a package written by the SpamLords. ;)
From: yakko Date: November 28th, 2005 05:40 am (UTC)
While I'm not sorry I stopped using RBLs, I kinda have to take this as a cautionary tale when considering other alternatives. Fortunately, the problem (wow, 10 spams every week, only a couple of which make it to my inbox) isn't out of hand just yet. In my case, upgrading the mail server (which then had a sendmail that supports greet_pause) solved most of the spam problem here. For now.

Deficient documentation sucks. I've dealt with that this past weekend, too, but for something that only affects me. It didn't suck any less, that's for sure.
From: fiskblack Date: November 28th, 2005 06:33 am (UTC)
I told those bastards I like Michael Bolton's music.
From: ronbar Date: November 28th, 2005 07:43 pm (UTC)
I was going to suggest that you improve their documentation now that you've gone through the initiation rites. Then I thought the resulting stream of invectives against the developers and maintainers might be counterproductive. But it would be fun-- just imagine the changelog!
From: captain18 Date: November 28th, 2005 10:52 pm (UTC)
I hope it works for you from here forward. I swapped sendmail for exim with spamassassin (including the teergrube function) and while installation wasn't painful for it, I have seen my mailbox drop from 150+ mails a day down to about 25.