
Mailman Patch #1483446 Daily mbox files for list mbox archives

Description

The daily mbox patch changes the way Mailman does its archive-to-mbox archiving, if that is enabled.

First, it is important to understand that using this patch means you will be messing with your archived message data, so IT IS VERY IMPORTANT TO BACK UP YOUR DATA before taking any irrevocable steps.

In the standard Mailman system, a single UNIX mbox file called <listname>.mbox is maintained for each list in the directory $prefix/archives/private/<listname>.mbox, and each message is appended to that file as it is archived. For lists carrying a large amount of traffic, these files can become very unwieldy over time, presenting problems for disk space management.

The mailman daily mbox patch modifies Mailman's behaviour so that a sparse series of daily mbox files is used for archiving rather than a single mbox file. Each archived message is normally appended to a daily mbox file for the UTC date when the message is first archived.

The daily mbox files are named YYYYMMDD-<listname>.mbox, where YYYY is the year, MM is the month (01-12) and DD is the day (01-31), and are stored in the $prefix/archives/private/<listname>.mbox directory.
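As an illustration of the naming scheme, the following minimal Python sketch builds the path of the daily mbox file for a given list and time. It is not taken from the patch: the function name daily_mbox_path and its parameters are hypothetical, and it only assumes the layout and UTC dating described above.

```python
import os
import time


def daily_mbox_path(prefix, listname, when=None):
    """Return the daily mbox path for `listname` at time `when` (epoch seconds).

    Hypothetical helper, not part of the patch. It follows the documented
    layout: $prefix/archives/private/<listname>.mbox/YYYYMMDD-<listname>.mbox
    """
    if when is None:
        when = time.time()
    # The daily file is selected by the UTC date, so use gmtime(), not localtime().
    stamp = time.strftime("%Y%m%d", time.gmtime(when))
    return os.path.join(
        prefix, "archives", "private",
        listname + ".mbox",
        "%s-%s.mbox" % (stamp, listname),
    )
```

Because the date is taken in UTC, a message archived late in the evening in a timezone west of UTC will land in the file for the following calendar day.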

The daily mbox files are sparse because there will only be mbox files for those dates, UTC, when messages are written to the archive.

Splitting the mbox archive into daily mbox files is intended to make the management of the disk space used for the mbox files easier. For instance, past daily files can be gzip'ed individually to save storage space. Policies to limit the time for which archive material is retained or held online can also be implemented more easily.
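A retention policy of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the function gzip_old_daily_mboxes and its keep_days parameter are hypothetical, and the compressed files are assumed to keep the daily name with .gz appended, as the gzip command would produce. Back up the directory before trying anything like this on real data.

```python
import gzip
import os
import re
import shutil
import time

# Daily mbox names as described above: YYYYMMDD-<listname>.mbox
DAILY_RE = re.compile(r"^(\d{8})-(.+)\.mbox$")


def gzip_old_daily_mboxes(mbox_dir, keep_days=7, now=None):
    """Compress daily mbox files dated more than `keep_days` days ago (UTC).

    Hypothetical helper. Returns the names of the files it compressed.
    """
    if now is None:
        now = time.time()
    # YYYYMMDD strings sort chronologically, so a string comparison is enough.
    cutoff = time.strftime("%Y%m%d", time.gmtime(now - keep_days * 86400))
    compressed = []
    for name in sorted(os.listdir(mbox_dir)):
        match = DAILY_RE.match(name)
        if not match or match.group(1) >= cutoff:
            continue  # not a daily mbox, or still recent enough to leave alone
        src = os.path.join(mbox_dir, name)
        with open(src, "rb") as fin, gzip.open(src + ".gz", "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.remove(src)  # remove the original only after the .gz is written
        compressed.append(name)
    return compressed
```

Keeping keep_days at 1 or more avoids touching the file for the current UTC date, which Mailman may still be appending to.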

Amongst other changes, the patch modifies the arch utility. After patching, arch by default only processes the daily mbox files in $prefix/archives/private/<listname>.mbox, as determined by pattern matching against the names of files in that directory. Gzip'ed daily mbox files are also recognized and processed, although arch runs somewhat slower when handling them. arch can process a combination of daily mbox files and other mbox files: see the script usage by running arch with the -h option.
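The kind of file name matching described here can be sketched as below. This is not the code from the patched arch script; the helper names are hypothetical, and the sketch assumes that gzip'ed daily files carry the usual .gz suffix after the daily name.

```python
import gzip
import re


def daily_mbox_pattern(listname):
    """Regex for YYYYMMDD-<listname>.mbox, optionally gzip'ed (assumed .gz suffix)."""
    return re.compile(r"^\d{8}-%s\.mbox(\.gz)?$" % re.escape(listname))


def select_daily_mboxes(filenames, listname):
    """Return the daily mbox file names for `listname`, oldest first."""
    pattern = daily_mbox_pattern(listname)
    # The YYYYMMDD prefix makes a plain sort chronological.
    return sorted(name for name in filenames if pattern.match(name))


def open_mbox(path):
    """Open a plain or gzip'ed mbox file for reading bytes."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")  # decompresses on the fly, hence slower
    return open(path, "rb")
```

For example, given the directory listing ["mylist.mbox", "20090101-mylist.mbox", "20090102-mylist.mbox.gz"], select_daily_mboxes(..., "mylist") picks the two daily files and leaves the old single mbox alone.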

A new utility, $prefix/bin/split_old_mbox, is provided for splitting a list's existing mbox file(s) into daily mbox files. There is no magic in the way this works, and if you are turning to the patch because you are short of archive disk space you will still have to manage that problem. Splitting a very large mbox file takes a fair amount of run time and initially doubles the amount of disk space needed for the mbox - the space for the original file plus the space for the daily mboxes generated from it. However, once you have split the original file and deleted it to recover the space it occupied, you can recover more space by gzip'ing all but the most recent daily mbox files.

Note that split_old_mbox generates new daily mbox files and allocates messages to them based on the date in the UNIX From line immediately preceding the message in the mbox being split. It is thus important that your input mbox files are syntactically correct and can be parsed by an instance of Python's mailbox.UnixMailbox class. mbox files produced by Mailman as mail archives should be OK, but if you are trying to split mbox files from some other source you may need to run the cleanarch script or use other techniques to get a properly constructed UNIX mbox for input. For suggestions, see the Mailman FAQ or search the mailman-user archive.
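To make the idea of allocating by the UNIX From line concrete, here is a simplified Python sketch. It is not the split_old_mbox implementation: both function names are hypothetical, it scans lines directly rather than using a mailbox class, and it assumes From lines of the form "From sender Www Mmm dd hh:mm:ss yyyy" with body lines that begin with "From " already quoted as ">From ". The date is used as written, with no timezone conversion.

```python
import time


def from_line_date(line):
    """Return YYYYMMDD for a 'From sender Www Mmm dd hh:mm:ss yyyy' line."""
    if not line.startswith("From "):
        raise ValueError("not a UNIX From line: %r" % line)
    parts = line.split(None, 2)  # "From", sender, asctime-style date
    if len(parts) < 3:
        raise ValueError("UNIX From line has no date: %r" % line)
    # Collapse runs of whitespace, e.g. the padding before single-digit days.
    date_text = " ".join(parts[2].split())
    parsed = time.strptime(date_text, "%a %b %d %H:%M:%S %Y")
    return time.strftime("%Y%m%d", parsed)


def split_by_day(lines):
    """Group mbox lines into {YYYYMMDD: [lines]} using each From line's date."""
    buckets = {}
    current = None
    for line in lines:
        if line.startswith("From "):
            current = from_line_date(line)  # a new message starts here
        if current is None:
            continue  # ignore anything before the first From line
        buckets.setdefault(current, []).append(line)
    return buckets
```

A malformed or unquoted "From " line makes from_line_date raise ValueError, which illustrates why a syntactically clean input mbox matters.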

Because of the way split_old_mbox allocates messages to the daily mbox files it produces, its allocation may differ subtly from that of daily mbox files produced by Mailman itself when initially archiving incoming messages. During initial archiving, a message is allocated to the current daily mbox regardless of the date in the UNIX From line written immediately before the message in the archive. When split_old_mbox is run, the UNIX From line controls the allocation, which may not give the same result.

You do not have to split a list's old mbox file; it can stay in the same directory as the newer daily mbox files as they are created. Splitting a list's old mbox file is probably worth doing in those cases where it is too large and unwieldy for your disk space management policy.

Applicability

This patch is applicable to Mailman 2.1.8 and later.

Necessary Precursors

None

Changes Made

This patch modifies the Mailman/Archiver/Archiver.py file to alter the way messages are filed in list mbox archives to use daily mbox files instead of a single mbox file. This is discussed above under the Description heading.

The bin/arch script is modified to accommodate the changes in the list mbox archive files.

A new script, bin/split_old_mbox, is added which can be used to split existing mbox archives into a series of daily mbox files.

The file README.dailymbox is added to the Mailman build directory containing the information given beneath the Description heading above.

Applying the patch

Apply the patch from within the Mailman build directory using the command:

    patch -p1 < path-to-patch-file

Download Patch File

MM Version    Download
2.1.12        Download
2.1.11        Download (uses the same patch as MM 2.1.10)
2.1.10        Download
2.1.9         Download (uses the same patch as MM 2.1.8)
2.1.8         Download


Last updated: 09/07/2009 13:21
