
Mailman patch #820723: Mailman/pipermail/MHonArc integration patch

Objectives and Description

This patch tightly integrates the MHonArc mail-to-HTML converter (Earl Hood's description, not mine) with Mailman and its internal pipermail archiving code. The purpose of the patch is to produce a fusion of (hopefully) the best features of pipermail and MHonArc for handling Mailman mailing list archives.

Although pipermail has a number of weaknesses, it has some good features, including structuring mail archives by per-list configurable periods and supporting private and public archive access.

MHonArc is represented by its proponents as being superior to pipermail in handling attachments and multipart MIME messages, and being able to build/rebuild archives from large UNIX mboxes.

In contrast, pipermail's "database" creaks and groans with large mboxes and lists with high levels of traffic, and it collapses into a heap when it is asked to rebuild from very large mboxes. The latest pipermail handling of attachments and MIME is OK'ish but appears to be weaker than MHonArc.

With MHonArc, I could not quickly find an obvious way to get the HTML archives it is generating into optional Yearly/Monthly/Weekly/Daily periods as pipermail does. It is clearly possible (for instance, see http://www.mail-archive.com/mailman-users%40python.org/) but how ...

List archive privacy can be important for some user communities and the Mailman/pipermail solution works well.

I also wanted searchable archives and saw no reason not to continue using my Mailman/HTdig integration patch for per-list archive searching.

Finally, I thought it would be neat to make choosing whether MHonArc or pipermail should generate a list's archive pages a per-list configuration option and have Mailman's archive builder $prefix/bin/arch work whichever choice was made for any list.

The upshot was this patch, which is a hack, but one which works. The code is not pretty but I defy even the best cosmetic surgeon to produce movie star looks when he is grafting a wart onto a boil. The code works and can fill my needs until Mailman version 3 comes along incorporating a wizzy new archiver and archive search capability ...

Alternative ways of using MHonArc and HTdig in conjunction with Mailman exist and doubtless some would argue are superior. There is no compulsion to use this patch; it is just another option you can choose from.

Implementation Details

The implementation operates with pipermail in charge of archiving. That is, the pipermail code generates the top level archive TOC pages for each list, organises each list's archive directory structure and sorts out the archiving period stuff. But when it comes to generating the message and index pages of the HTML archives, the use of pipermail or MHonArc depends on the option set for any list via the Archiving Options page of the admin web GUI.

For lists set for MHonArc archiving, pipermail uses an instance of MHonArc instead of its own code to generate the HTML message and index pages. For such lists, pipermail maintains only a vestigial database so that the problems of large pipermail databases are avoided.

The organization of the archives on disk is pretty much the same for both pipermail'ed and MHonArc'ed lists. The top level list TOC is the same as for normal Mailman, as is the per-period sub-directory structure and per-period text (mbox) archive. The naming of message files is different for MHonArc, as is the storage of extracted attachment files. For extracted attachments pipermail uses a separate directory structure while MHonArc puts them in the same directory as the messages. Actually, this is a win for MHonArc because the URLs it generates on message pages to link to extracted attachments are relative URLs. In contrast, pipermail generates absolute URLs. This means that, for pipermail generated message pages, if a list archive is changed from private to public or vice versa, the links on message pages are wrong and can only be corrected by rebuilding the entire list archive. With MHonArc'ed archives list privacy changes are a non-event, with Mailman's creation and deletion of the symlink to the list archives in $prefix/archives/public/ doing all that is needed.
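The difference between relative and absolute attachment links can be illustrated with a small sketch. Every URL and file name below is hypothetical (the host, the list name "mylist", and the attachment names are made up for illustration); the point is only how a browser resolves each kind of link once the same page is served from a different location.

```python
from urllib.parse import urljoin

# Hypothetical URLs of the same message page, before and after a
# private -> public change for a list called "mylist".
PRIVATE_PAGE = "http://lists.example.com/mailman/private/mylist/2009-July/msg00001.html"
PUBLIC_PAGE = "http://lists.example.com/pipermail/mylist/2009-July/msg00001.html"

# Relative link: the attachment sits next to the message page
# (hypothetical file name), so it follows the page wherever it is served.
RELATIVE_LINK = "bin00001.bin"

# Absolute link: fixed when the page was generated, while the list
# was still private (hypothetical path).
ABSOLUTE_LINK = "http://lists.example.com/mailman/private/mylist/attachments/file.bin"


def resolve(page_url, link):
    """Resolve a link found on page_url the way a browser would."""
    return urljoin(page_url, link)


if __name__ == "__main__":
    for page in (PRIVATE_PAGE, PUBLIC_PAGE):
        print(resolve(page, RELATIVE_LINK))
        print(resolve(page, ABSOLUTE_LINK))
```

The relative link resolves under whichever location serves the page, while the absolute link keeps pointing at the private location even when the page is read from the public one.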

$prefix/bin/arch works as normal, regardless of whether pipermail or MHonArc is generating message and index pages. Indeed, having changed the archiver option for a list, $prefix/bin/arch --wipe must be run to have the change take effect; this is because of the incompatibility between the per-list database, message file naming and attachment storage schemes used by the two options.

Because I like the date/thread/subject/author indexes produced by pipermail, MHonArc (as used by this patch) does the same thing. The layout of message and index pages generated by MHonArc is controlled using three MHonArc resource configuration files (MRCFs), which must reside in Mailman's template directory structure. The MRCFs are selected for a list in the same way that any other template file is associated with that list, that is, the template hierarchy is searched. Before pipermail invokes MHonArc to handle a message or a group of messages, it selects which MRCFs apply and passes these as parameters to MHonArc. The default MRCFs reside in $prefix/templates/en as mhonarc.mrc, author.mrc and subject.mrc. The look and feel the default MRCFs produce is similar to that of the archive and index pages for regular pipermail'ed archive pages. The only thing that is different with the operation of the templates for MHonArc is that there is no variable substitution performed by Mailman; instead some useful values (such as list name and archive name) are passed to MHonArc using environment variables which can be referred to in the MRCFs to characterize the pages being generated.
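As a rough illustration of the selection step, the sketch below searches a template hierarchy for each of the three MRCFs, most specific directory first, and builds an environment for the MHonArc sub-process. It is not the patch's code. The directory order is my understanding of how Mailman 2.1 searches for templates (list, virtual host, site, default), and the function names and the environment variable names MM_LISTNAME and MM_ARCHIVE_NAME are hypothetical.

```python
import os

MRCF_NAMES = ("mhonarc.mrc", "author.mrc", "subject.mrc")


def candidate_dirs(prefix, listname, host_name, lang="en"):
    """Directories to search, most specific first (assumed order)."""
    return [
        os.path.join(prefix, "lists", listname, lang),      # list specific
        os.path.join(prefix, "templates", host_name, lang),  # virtual host
        os.path.join(prefix, "templates", "site", lang),     # site wide
        os.path.join(prefix, "templates", lang),             # defaults
    ]


def find_mrcfs(prefix, listname, host_name, lang="en"):
    """Return the first match in the hierarchy for each MRCF name."""
    found = []
    for name in MRCF_NAMES:
        for directory in candidate_dirs(prefix, listname, host_name, lang):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                found.append(path)
                break
    return found


def mhonarc_environment(listname, archive_name, base_env=None):
    """Copy the environment and add values for the MRCFs to refer to."""
    env = dict(os.environ if base_env is None else base_env)
    env["MM_LISTNAME"] = listname          # hypothetical variable name
    env["MM_ARCHIVE_NAME"] = archive_name  # hypothetical variable name
    return env
```

With a layout like this, dropping a modified mhonarc.mrc into a list specific directory overrides only that file; author.mrc and subject.mrc still come from the defaults.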

Whether the MRCFs are passed to MHonArc on the command line when it is invoked depends on:

  1. Whether the message (or group of messages) being passed to MHonArc is the first for a new period; if it is, the MRCFs are passed regardless of any other factors.
  2. The value of the MHONARC_SAVE_RESOURCES MM config variable (default value is True). This config variable tells pipermail to pass MHonArc either the -saveresources or the -nosaveresources command line option. If MHONARC_SAVE_RESOURCES is True then the MRCFs are not passed as command line options (excluding the situation in [1]), because MHonArc will already have the template information from the MRCFs in its database.
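The two rules above can be sketched as a small function. This is an illustration, not the code in the patch: the function and parameter names are hypothetical, and passing each MRCF with its own -rcfile option is an assumption about how the files are handed to MHonArc.

```python
def mhonarc_resource_args(first_in_period, save_resources, mrcf_paths):
    """Build the resource-related part of a MHonArc command line.

    first_in_period: True when this is the first message for a new period.
    save_resources:  the value of MHONARC_SAVE_RESOURCES.
    mrcf_paths:      the MRCFs selected for the list.
    """
    args = ["-saveresources" if save_resources else "-nosaveresources"]
    # Rule 1: a new period always gets the MRCFs.
    # Rule 2: otherwise they are only needed when MHonArc is not
    # keeping the resource settings in its own database.
    if first_in_period or not save_resources:
        for path in mrcf_paths:
            args.extend(["-rcfile", path])  # assumed: one -rcfile per MRCF
    return args
```

For example, with save_resources set to True and a message that is not the first of its period, the result is just ["-saveresources"], which is why later template changes are not picked up for that period.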

The downside to MHONARC_SAVE_RESOURCES being True is that new messages for an existing archive period will continue to use the existing templates despite changes of/to the mhonarc.mrc, author.mrc or subject.mrc applicable to the list concerned, even though mailmanctl restart may have been run. To get the revised templates into use, run bin/arch --wipe for the list.

Following installation of this patch, the availability of the features it provides depends on MHonArc being installed (see Necessary Precursors below) and on MHONARC_ARCHIVER_PATH being set to a non-empty string.

MHonArc has to be installed on the Mailman server. I found installing it into $prefix/mhonarc worked for me. There is a new Mailman configuration variable, added to $prefix/Defaults.py by the patch, which tells Mailman where MHonArc is installed. This was the sum total of setup I had to do for the MHonArc installation.

MHONARC_ARCHIVER_PATH = os.path.join(PREFIX, 'mhonarc', 'bin', 'mhonarc')

The patch adds an option, with radio buttons to choose pipermail or MHonArc as archiver for a list, to the Archiving Options page of the web admin GUI. This option is only displayed after a non blank value has been assigned to MHONARC_ARCHIVER_PATH. The default value for which archiver to use is set by a new Mailman configuration variable added to $prefix/Defaults.py by the patch:

# Which archiver to use by default to generate archive pages: 
# 0 - pipermail 
# 1 - mhonarc
DEFAULT_WHICH_ARCHIVER = 0

When a new list is created and when/if archiving is enabled for it, it will use the archiver specified by DEFAULT_WHICH_ARCHIVER at the time of the list's creation. Lists with existing archives that pre-date adoption of this patch will continue to be pipermail archives unless their choice of archiver is changed via the web admin GUI.
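Putting the configuration variables together, a site that wants new lists to use MHonArc might override the defaults in $prefix/Mailman/mm_cfg.py along the following lines. This is a sketch under assumptions: the install prefix is hypothetical, and in a real mm_cfg.py the value of PREFIX would come from the Defaults import rather than being defined locally.

```python
import os

# In a real mm_cfg.py, PREFIX is supplied by "from Defaults import *".
# It is defined here only so the fragment can be read on its own.
PREFIX = "/usr/local/mailman"  # hypothetical install prefix

# A non-empty path makes the archiver choice appear on the
# Archiving Options page of the web admin GUI.
MHONARC_ARCHIVER_PATH = os.path.join(PREFIX, "mhonarc", "bin", "mhonarc")

# Archiver for lists created from now on: 0 - pipermail, 1 - mhonarc.
DEFAULT_WHICH_ARCHIVER = 1

# Let MHonArc keep the template information in its own database.
MHONARC_SAVE_RESOURCES = True
```

Existing lists are unaffected by DEFAULT_WHICH_ARCHIVER; they keep whatever archiver was set when they were created.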

As noted above, when the archiver nominated for a given list is changed the change will not take effect until $prefix/bin/arch --wipe is run for the list. This is not done automatically at the time the option is changed using the web admin GUI for a good reason: if the list has a large mbox then the amount of processing involved in rebuilding the archive is inappropriate for a process being run "in-line" as part of a web server transaction.

With pipermail in charge, MHonArc only gets to see the archive a period at a time. In practice, each period of a list's archive is what MHonArc sees and it maintains a database and index pages for each of them quite separately. This means that thread and date links generated by MHonArc terminate at the boundary of the archiving period. The only index linking all of the period archives for a list is the top level TOC page for a list's archives, which is generated by pipermail in the normal way. Thus far, this characteristic has not been a problem for me and I've not given much thought to changing it.

When $prefix/bin/arch is run on a list configured to use MHonArc, the period archive approach means that pipermail passes the messages for each period to MHonArc for processing in temporary mbox files. The way this is done means that the memory demand made by Mailman/pipermail in handling a large mbox is no bigger than the biggest message in the input file being processed. By contrast, when archiving a single message, for the ArchRunner say, it is passed via stdin to MHonArc.
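The idea of feeding MHonArc a period at a time can be sketched with Python's standard mailbox module. This is not the patch's code: the function names are hypothetical, the period is fixed to monthly, and the "YYYY-MM" key is a simplification rather than pipermail's real directory naming. Messages are copied one at a time, so only the current message is held in memory.

```python
import email.utils
import mailbox
import os


def period_key(msg):
    """Monthly period key such as '2009-07' (format is an assumption)."""
    date = email.utils.parsedate_tz(msg.get("Date", ""))
    if date is None:
        return "undated"
    return "%04d-%02d" % (date[0], date[1])


def split_by_period(mbox_path, out_dir):
    """Copy each message into a per-period mbox file under out_dir.

    Returns a dict mapping period key to the path of its mbox file.
    """
    source = mailbox.mbox(mbox_path, create=False)
    outputs = {}
    try:
        for msg in source:  # one message at a time
            key = period_key(msg)
            if key not in outputs:
                path = os.path.join(out_dir, key + ".mbox")
                outputs[key] = mailbox.mbox(path)
            outputs[key].add(msg)
    finally:
        for box in outputs.values():
            box.close()  # close() also flushes pending writes
        source.close()
    return {key: os.path.join(out_dir, key + ".mbox") for key in outputs}
```

Each resulting file could then be handed to the archiver in turn, which keeps the work per invocation bounded by the size of one period rather than the whole list mbox.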

My Mailman-HTdig integration patch works as normal with archives generated by both pipermail and MHonArc. Just install and use it in the same way as usual.

Mailman's pipermail features and configuration for obscuring mail addresses in the HTML archives to thwart email address harvesters are not automatically applied to MHonArc generated pages. Features provided by MHonArc can be used instead by modifying the MHonArc resource configuration files that control archive index and message page generation. You should take copies of the default MRCFs installed in $prefix/templates/en, modify them and add them to a site, virtual host or list specific sub-directory of the template hierarchy.

The default MRCFs have embedded in them default indexing control directives for HTdig, in anticipation of HTdig being used for archive search. With pipermail generated HTML pages the effect of changing the value of Mailman config variables ARCHIVE_INDEXING_ENABLE and ARCHIVE_INDEXING_DISABLE in $prefix/Mailman/mm_cfg.py is dynamically incorporated into the pages. With MHonArc generated pages this must be achieved by copying and modifying the default MRCFs installed in $prefix/templates/en and adding them to a site, virtual host or list specific sub-directory of the template hierarchy.

Things Still To Do (Maybe)

  1. Integrated support for different languages
  2. Obscuring mail addresses in the HTML archives

Applicability

This patch is applicable to Mailman (MM) 2.1.3 and later.

Necessary Precursors

The following must be in place before applying this patch:

  1. Mailman Patch #760567: both this patch and #760567 update the version number of the Mailman list database in order to add extra attributes to it.
    Note that this means that this MHonArc integration patch and #760567 will have to be updated the next time the standard DATA_FILE_VERSION value is updated in $prefix/Mailman/Version.py
  2. MHonArc has to be installed on the Mailman server machine. My development and testing has been done with MHonArc 2.6.8. When I installed MHonArc 2.6.8, I just followed the instructions and the initial terminal dialogue looked like this (where $prefix is the value of the Mailman --with-prefix ./configure option):
    > install.me
    Checking dependencies:
            Fcntl ......................... ok
            File::Basename ................ ok
            Getopt::Long .................. ok
            Symbol ........................ ok
            Time::Local ................... ok
    Pathname of perl executable: ("/usr/bin/perl") 
    Directory to install executables: ("/usr/bin") $prefix/mhonarc/bin
    Directory to install library files: ("/usr/lib/perl5/site_perl/5.6.1") $prefix/mhonarc/lib/perl5/site_perl/5.6.1
    Directory to install documentation: ("/usr/doc/MHonArc") $prefix/mhonarc/doc/MHonArc
    Directory to install manpages: ("/usr/share/man") $prefix/mhonarc/share/man
    You have specified the following:
            Perl path: /usr/bin/perl
            Bin directory: $prefix/mhonarc/bin
            Lib directory: $prefix/mhonarc/lib/perl5/site_perl/5.6.1
            Doc directory: $prefix/mhonarc/doc/MHonArc
            Man directory: $prefix/mhonarc/share/man
    Is this correct? ['y'] 
    ...
    

Changes Made

See the Description and Implementation Details above.

Applying the patch

Apply the patch from within the Mailman build directory using the command:

    patch -p1 < path-to-patch-file

Download Patch File

MM Version Download
2.1.12 Download
2.1.11 Download
2.1.10 Download
2.1.9 Download
2.1.8 Download (uses the same patch as MM 2.1.7)
2.1.7 Download
2.1.6 Download
2.1.4 Download
2.1.3 Download

History

Patch Version Changes
2.1.12-0.1
  1. Updated for MM 2.1.12 compatibility.
2.1.11-0.2
  1. Amends corruption in source of published 2.1.11-0.1 patch.
2.1.11-0.1
  1. Updated for MM 2.1.11 compatibility.
  2. Code that spawns MHonArc now sets umask prior to running the sub-process to try and ensure file permissions are generated correctly by MHonArc.
2.1.10-0.1
  1. Updated for MM 2.1.10 compatibility.
2.1.9-0.1
  1. Updated for MM 2.1.9 compatibility.
2.1.7-0.1
  1. Updated for MM 2.1.7 compatibility.
2.1.6-0.3
  1. Corrects a long-standing omission in the code of Mailman/Cgi/create.py which fails to get the initial setup of lists created through the web quite right. This leads to spurious errors being logged on message archiving until bin/arch --wipe is run for such a list. Lists created with bin/newlist did not have this problem.
2.1.6-0.2
  1. Corrects error in code of bin/arch [an omitted mlist.Save()] introduced in patch 2.1.6-0.1.
2.1.6-0.1
  1. Updated for MM 2.1.6 compatibility.
2.1.4-0.1
  1. Updated for MM 2.1.4 compatibility.
2.1.3-0.6
  1. Added MHONARC_SAVE_RESOURCES config variable to Defaults.py.
  2. Associated changes in Mailman/Archiver/pipermail.py
2.1.3-0.5
  1. Fixed minor HTML syntax error in mhonarc.mrc and author.mrc that affected date and author index pages.
2.1.3-0.4
  1. Changed default value of MHONARC_ARCHIVER_PATH in Defaults.py to the empty string ''
  2. Changed behaviour so that if MHONARC_ARCHIVER_PATH is the empty string, the ability to change which archiver to use is not displayed on the Archiving Options pages of the web admin GUI of lists. In effect, until the configuration variable is defined, the installation of this patch is not seen.
2.1.3-0.3
  1. Change to tolerate MHonArc prematurely (and validly) closing the pipe through which it is receiving a message from pipermail.
2.1.3-0.2 First 'official release'
2.1.3-0.1 Original 'unofficial release'


Last updated: 09/07/2009 13:29
