"Match whole word only"?
Posted: Tue Sep 05, 2006 6:22 pm
Hi!
This goes mainly to Wladimir, but I'd like everybody to contribute their thoughts about this:
First of all, great work! I just read your article on http://adblockplus.org/blog/investigati ... algorithms
and here is my idea:
I'd like a way to prevent filters like "ad" from blocking "head" etc.
At the moment, the only way to do that seems to be using several filters like this:
Code: Select all
.ad.
/ad.
/ad/*
/ad-
.ad-
.ad_
/ad_
or a (slow) regular expression of the form
Code: Select all
/([^\w]|_)ad([^\w]|_)/
It should be possible to add a "match whole word only" checkbox that can be checked for each filter individually. Adblock Plus could then take the word "ad" and, if the checkbox is enabled, either convert it to a regular expression like the one above, or match "ad" with the Boyer-Moore algorithm and then check the characters to the left and to the right of each hit.
I know that this will slow down the algorithm, but I think it's worth it.
Another method could be: Split each URL into its components, i.e.
Code: Select all
http://ad.server.com/?showad.php&ad=test
becomes
Code: Select all
http
ad
server
com
showad
php
ad
test
and apply the Boyer-Moore algorithm to the above list for the filters that have "whole word only" enabled. This will roughly double the execution time, I think.
Still, it could be very useful.
What do you think?
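To make the two ideas a bit more concrete, here is a rough Python sketch. It is only an illustration, not how Adblock Plus works: the function names (`is_word_char`, `matches_whole_word`, `split_url`, `matches_component`) are made up for this post, and Python's built-in `str.find` stands in for a Boyer-Moore search. One assumption differs from the regular expression above: the start and end of the URL also count as word boundaries here.

```python
import re
from typing import List


def is_word_char(ch: str) -> bool:
    # ASCII letters and digits only; "_" is treated as a separator,
    # mirroring the ([^\w]|_) groups in the regular expression.
    return ch.isascii() and ch.isalnum()


def matches_whole_word(url: str, word: str) -> bool:
    """Idea 1: find `word`, then check the characters to its left and right."""
    start = url.find(word)  # stand-in for a Boyer-Moore search
    while start != -1:
        end = start + len(word)
        # Assumption: the string start/end also counts as a boundary.
        left_ok = start == 0 or not is_word_char(url[start - 1])
        right_ok = end == len(url) or not is_word_char(url[end])
        if left_ok and right_ok:
            return True
        start = url.find(word, start + 1)  # try the next occurrence
    return False


def split_url(url: str) -> List[str]:
    """Idea 2: split a URL into components on every non-alphanumeric run."""
    return [part for part in re.split(r"[^A-Za-z0-9]+", url) if part]


def matches_component(url: str, word: str) -> bool:
    # A "whole word" filter matches only if it equals a complete component.
    return word in split_url(url)
```

With this sketch, `split_url("http://ad.server.com/?showad.php&ad=test")` gives the same eight components as in the list above, and `matches_whole_word` accepts "/ad/" or "?ad=" but rejects "head" and "showad". The first variant only does extra work when the plain search finds a hit, while the second has to split every URL first, which is probably where the extra cost would come from.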