Geek: Web logs
Jul. 5th, 2006 03:57 pm
So it comes to pass (thick with irony) that I'm involved in the organisation's web logs and all that jazz. These logs are currently dumping out as text files from the proxy servers, three in all, so each day I get about 1.5GB (no, really) of logs.
Currently I'm manually importing them into a MSSQL database and, for reasons of management's own, each day's logfile ends up in a separate table, ideal for difficult and tedious analysis. Clearly, I'll be automating that in just a few days' time.
But I mean, a gig and a half daily? That's half a terabyte in a year! I know we're enterprise-class, but that's a stuposterously humungous wodge of data. It's particularly unwieldy when (as has happened) I'm asked to mine it for, say, J Random User's access to see if he's been doing "anything naughty".
It strikes me that there's a trick we're missing. We need historical logs, because evidence of naughtiness is a long-term thing. But a terabyte database in a thousand tables is daft. How do real organisations handle this?
no subject
Date: 2006-07-05 03:20 pm (UTC)
I'm going to go out on a limb and propose "not using MSSQL." Now, I don't do this sort of thing (and hope I never have to), but I would think that this kind of data would be more suitable to a database that can put all the log data in a single database table (or smallish set of tables -- not one table per day, which is clearly insane), which was designed such that pulling a single user's traffic for a week some time last year out of a couple of terabytes of table data wouldn't be no thang. "Old" data (defined as "data which we don't need to be readily available but which can't be discarded yet") could be moved out of the "current" tables and into some data warehouse arrangement (which I say in such a vague way because I, personally, know nothing about this "data warehousing" thing).
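For what it's worth, here is a rough sketch of the single-table idea, using Python's built-in sqlite3 purely as a stand-in for whatever database actually gets chosen. The log line layout, the table name `proxy_log`, the column names and the helper functions are all made up for illustration; the real proxy logs will certainly look different. The point is only that one table with an index on (username, timestamp) lets you pull one user's week directly, with no per-day tables to stitch together.

```python
import sqlite3

# Hypothetical log line layout (an assumption, not the real proxy format):
#   <ISO timestamp> <username> <url> <bytes>
SAMPLE_LINES = [
    "2006-07-03T09:15:00 jrandom http://example.com/ 5120",
    "2006-07-04T10:00:00 asmith http://example.org/news 2048",
    "2006-07-05T14:30:00 jrandom http://example.net/page 1024",
    "2006-07-12T08:00:00 jrandom http://example.com/late 4096",
]


def open_store(path=":memory:"):
    """Create one table for all days, indexed for per-user date-range lookups."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy_log ("
        " ts TEXT NOT NULL,"
        " username TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " bytes INTEGER NOT NULL)"
    )
    # The composite index is what keeps "one user, one week" cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_user_ts ON proxy_log (username, ts)"
    )
    return conn


def load_lines(conn, lines):
    """Parse whitespace-separated log lines and bulk-insert them.

    Lines that do not have exactly four fields are skipped.
    Returns the number of rows inserted.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        ts, user, url, size = parts
        rows.append((ts, user, url, int(size)))
    conn.executemany("INSERT INTO proxy_log VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def user_traffic(conn, username, start, end):
    """Return (ts, url, bytes) for one user with start <= ts < end."""
    cur = conn.execute(
        "SELECT ts, url, bytes FROM proxy_log"
        " WHERE username = ? AND ts >= ? AND ts < ?"
        " ORDER BY ts",
        (username, start, end),
    )
    return cur.fetchall()
```

Something like `user_traffic(conn, "jrandom", "2006-07-03", "2006-07-10")` would then fetch that user's week in one query. ISO-style timestamps are used here because they sort correctly as plain text; a real setup would more likely use the database's native datetime type.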
no subject
Date: 2006-07-05 03:31 pm (UTC)
no subject
Date: 2006-07-05 03:43 pm (UTC)
no subject
Date: 2006-07-05 04:08 pm (UTC)
"Anything dodgy", incidentally, burns the eyes and soul. I have to surf so much porn to check that it *is* porn that I may as well be at home :)
no subject
Date: 2006-07-05 07:03 pm (UTC)
I am, of course, a hand-wringing liberal. But I think the minuscule chance that I die in a tube explosion at the hands of lunatics is a risk worth taking to keep our society from becoming one in which our every move is recorded for later perusal to "find anything incriminating". I'm reminded of the hatchet job that is done on anyone the police shoot by accident. Or the similar thing Blunkett got the boys of the Home Office to do on Maxine Carr.
no subject
Date: 2006-07-05 07:06 pm (UTC)
no subject
Date: 2006-07-06 08:09 am (UTC)