Wednesday, April 28, 2010

Filtering SMB "index pages" from GSA results

Recently, I set up a GSA to crawl SMB/CIFS filesystem content from a NetApp filer. One odd thing I hadn't seen before: search results included "index pages" corresponding to the NetApp directories, much like the listings a web server produces when automatic index pages are enabled. These SMB index pages have no actual content, and the links they present don't work.

But you can't simply exclude them from the crawl and index, because the GSA would then never discover the contents of those directories. So, to fix the problem, I left the crawl alone and added this pattern to the "Remove URLs" tab within each front end, which keeps matching URLs out of the search results:

regexpIgnoreCase:smb://.*/$

This, of course, assumes you're using the "smb://" prefix for your GSA, instead of one of the newer alternative syntaxes.
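To sanity-check the pattern before pasting it into a front end, here is a minimal sketch in Python. It only approximates the GSA's behavior: it assumes the appliance treats the expression as an unanchored, case-insensitive match against the full URL, and the host and share names are made up for illustration. The idea is that a URL ending in "/" is a directory listing, while a URL ending in a file name is real content.

```python
import re

# Rough Python equivalent of the GSA pattern regexpIgnoreCase:smb://.*/$
# Assumption: the GSA matches the expression anywhere in the URL and
# ignores case; Python's re module is only a stand-in for its engine.
INDEX_PAGE = re.compile(r"smb://.*/$", re.IGNORECASE)


def is_index_page(url: str) -> bool:
    """Return True when the URL looks like an SMB directory listing."""
    return INDEX_PAGE.search(url) is not None


# Hypothetical URLs, not taken from any real filer.
sample_urls = [
    "smb://filer.example.com/share/docs/",
    "smb://filer.example.com/share/docs/report.pdf",
    "SMB://FILER.EXAMPLE.COM/Share/",
]

for url in sample_urls:
    verdict = "removed from results" if is_index_page(url) else "kept"
    print(f"{url}: {verdict}")
```

Only the URLs with a trailing slash match, so the documents inside each directory stay searchable while the directory listings themselves are hidden.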
