
Mailman Patch #835332: Stops bloat in pipermail article databases


The standard pipermail archiving code saves the body text of every article, in HTML format, in the -article database of each archived list. This can substantially bloat these databases. Because they are pickled data structures that are loaded into memory in their entirety whenever archiving operations for a list are handled, the bloat can seriously degrade archiver performance. In the limit, for lists carrying heavy traffic and/or receiving large text postings, it can bring archiving to a grinding halt.


This patch is available for Mailman 2.1.3.

The changes made by this patch have been incorporated into Mailman 2.1.4, so the patch is not required for that release or any later one.

Necessary Precursors


Changes Made

This patch changes HyperArch.py and pipermail.py so that the data stored in the pipermail database file $archives/private/<listname>/database/<period>-article no longer includes the HTML-formatted body text of each article. This reduces the size of the -article database for each list. The benefits are most pronounced for high-traffic lists and for lists that receive large text postings.
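The size effect described above can be illustrated with a minimal sketch. It does not use Mailman's own classes: the record layout, the `html_body` key, and the helper names `make_article` and `strip_body` are assumptions made purely for illustration.

```python
import pickle


def make_article(msgid, subject, body_html):
    # Hypothetical record layout; this is NOT Mailman's Article class.
    return {"msgid": msgid, "subject": subject, "html_body": body_html}


def strip_body(article):
    """Return a copy of the record without the HTML body field."""
    slim = dict(article)
    slim.pop("html_body", None)
    return slim


# Build 100 illustrative records, each carrying roughly 5 KB of HTML body.
articles = [
    make_article(
        "<%d@example.org>" % i,
        "Subject %d" % i,
        "<p>%s %d</p>" % ("x" * 5000, i),
    )
    for i in range(100)
]

# Compare the pickled size with and without the body text.
full_size = len(pickle.dumps(articles))
slim_size = len(pickle.dumps([strip_body(a) for a in articles]))
print("with bodies:    %d bytes" % full_size)
print("without bodies: %d bytes" % slim_size)
```

Because the whole pickle is loaded on every archiving operation, the difference in size translates directly into memory use and load time.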

The patch also adds a script, $prefix/bin/rb-arch, which removes any HTML-formatted body text from existing -article databases. Once the patch is applied, this junk HTML is no longer stored when new articles are added to the databases, but junk HTML already present is not deleted unless this script is run. The alternative is to run $prefix/bin/arch for a list.
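The kind of one-off cleanup that rb-arch performs might look roughly like the sketch below. This is not the rb-arch source and does not read the real pipermail file format: it assumes a hypothetical database that is a single pickled dictionary mapping message IDs to record dictionaries, with the body held under an assumed `html_body` key. The function name `strip_bodies` is likewise illustrative.

```python
import os
import pickle
import tempfile


def strip_bodies(db_path, body_key="html_body"):
    """Rewrite a pickled {msgid: record} file without the body field.

    Hypothetical file layout, not the real pipermail format.
    Returns the number of records that were changed.
    """
    with open(db_path, "rb") as fp:
        records = pickle.load(fp)

    changed = 0
    for record in records.values():
        if record.pop(body_key, None) is not None:
            changed += 1

    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run does not leave a truncated database behind.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(db_path) or ".")
    with os.fdopen(fd, "wb") as fp:
        pickle.dump(records, fp)
    os.replace(tmp_path, db_path)
    return changed
```

Running such a cleanup a second time should report zero changed records, which makes it safe to repeat. On a live installation the real script would also need to coordinate with the archiver so that both do not write the same database at once; that is outside the scope of this sketch.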

Applying the patch

Apply the patch from within the Mailman build directory using the command:

    patch -p1 < path-to-patch-file

Download Patch File

MM Version    Download
2.1.4         Patch incorporated into Mailman source and no longer required.
2.1.3         Download

Last updated: 1-Jan-04 9:12 am
