Mailman Patch #444879 README.NOINDEXtags

If you are defining values for the ARCHIVE_INDEXING_ENABLE and
ARCHIVE_INDEXING_DISABLE configuration attributes in mm_cfg.py, you
may want to control the indexing behaviour of several different search
engines that you allow to access your mail archives.

At the time of writing, the problem you face is that there is no standard tag
for exerting partial control over search engine indexing of a page, that is, a
way of telling the search engine to index only a specified part of the page
content. There is no formal or de facto standard that does for part of a page
what the robots property of the HTML 4.0 META tag does for a whole page. For
example,
    <META NAME=robots CONTENT="noindex,follow">
gives whole-page control with most search engines.

However, you should be able to put multiple start and stop indexing tags in the 
values you assign to the ARCHIVE_INDEXING_ENABLE/DISABLE strings in mm_cfg.py.

For example, some writers on the web suggest using <NOINDEX> and </NOINDEX> tags 
because they are recognised and honoured by a number of search engines.

The defaults recognised by htdig are not tags but HTML comments of the form
<!--htdig_noindex--> and <!--/htdig_noindex-->.

You could combine these in your mm_cfg.py file as follows:

ARCHIVE_INDEXING_ENABLE = '<!--/htdig_noindex-->\n</NOINDEX>'
ARCHIVE_INDEXING_DISABLE = '<NOINDEX>\n<!--htdig_noindex-->'

Most browsers and search engines should cope with the result because, in
general, they ignore tags they do not understand and act on those they do.
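
If you want to honour more than two marker styles, it may be easier to build
the two strings from a list of pairs than to type them by hand. The following
is a minimal sketch only: build_indexing_markers and TAG_PAIRS are hypothetical
names, not part of Mailman or of this patch. It assumes the stop-indexing
markers should be emitted in list order and the matching start-indexing
markers in reverse order, so that the markers nest in the same way as in the
example above.

```python
# Hypothetical helper; not part of Mailman or of patch #444879.
def build_indexing_markers(tag_pairs):
    """Build (enable, disable) strings from (stop_marker, start_marker) pairs.

    stop_marker tells a search engine to stop indexing;
    start_marker tells the same search engine to resume indexing.
    """
    # Markers that switch indexing off, in the order given.
    disable = "\n".join(stop for stop, _start in tag_pairs)
    # Markers that switch indexing back on, in reverse order so they nest.
    enable = "\n".join(start for _stop, start in reversed(tag_pairs))
    return enable, disable


# The two marker styles mentioned in this README.
TAG_PAIRS = [
    ("<NOINDEX>", "</NOINDEX>"),
    ("<!--htdig_noindex-->", "<!--/htdig_noindex-->"),
]

ARCHIVE_INDEXING_ENABLE, ARCHIVE_INDEXING_DISABLE = build_indexing_markers(TAG_PAIRS)
```

With the two pairs shown, this should produce the same values as the
hand-written assignments above. Adding another search engine's markers is then
a matter of appending one more pair to TAG_PAIRS.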


Last updated: 25-Aug-03 10:02 am