A question about regular expressions and the wildcard *

rick752
Posts: 2709
Joined: Fri Jun 09, 2006 7:59 pm
Location: New York USA

Post by rick752 »

I guess the question for Wladimir would be:

"Does a regexp have to read a blockable address in its entirety before it reaches a result, or does it stop processing as soon as it has read enough to make a determination?"

If it is the first, I would think a wildcard would be faster, because it would have to read the entire address in both cases but would not have to count characters.

If it is the second, then a more specific 'bridge' (like a "{2}") would end the matching at that point and would be faster.

Personally, in either instance, I'm sure that the difference would be negligible.

(Wladimir will probably yell at me for "seeing what I want to see" again :lol: )
Wladimir Palant

Post by Wladimir Palant »

The FAQ certainly needs updating to make this very clear. When Adblock Plus has a simple filter (not a regular expression) with at least 8 unbroken characters (e.g. "*adserver*" but not "*ad*server*" and not "*adserv*"), it can process this filter MUCH faster; it takes almost no time at all. For that reason, filters that cannot be processed this way should only be used when really necessary.
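To illustrate why an unbroken run of 8 characters matters, here is a minimal sketch of a "shortcut" index, assuming filters are stored in a hash table keyed on one 8-character substring that contains no wildcard. `findShortcut`, `wildcardToRegExp` and `FilterMatcher` are hypothetical names; this shows the general technique, not Adblock Plus's actual source.

```javascript
// Hypothetical sketch of a shortcut index for simple wildcard filters.
// Assumption: each filter is keyed on its first 8-character run without "*".
const SHORTCUT_LENGTH = 8;

// First 8-character run without a wildcard, or null if there is none
// (e.g. "*ad*server*" and "*adserv*" have no such run).
function findShortcut(filterText) {
  for (const part of filterText.toLowerCase().split("*")) {
    if (part.length >= SHORTCUT_LENGTH) return part.slice(0, SHORTCUT_LENGTH);
  }
  return null;
}

// Convert a simple wildcard filter to a case-insensitive regexp.
function wildcardToRegExp(filterText) {
  const source = filterText
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regexp metacharacters
    .replace(/\*/g, ".*");                  // "*" matches any run of characters
  return new RegExp(source, "i");
}

class FilterMatcher {
  constructor(filters) {
    this.byShortcut = new Map(); // shortcut -> regexps of filters with that key
    this.slow = [];              // filters without a shortcut: tested on every URL
    for (const text of filters) {
      const shortcut = findShortcut(text);
      const re = wildcardToRegExp(text);
      if (shortcut === null) {
        this.slow.push(re);
      } else {
        if (!this.byShortcut.has(shortcut)) this.byShortcut.set(shortcut, []);
        this.byShortcut.get(shortcut).push(re);
      }
    }
  }

  matches(url) {
    const lower = url.toLowerCase();
    // Only filters whose shortcut occurs somewhere in the URL are tested.
    for (let i = 0; i + SHORTCUT_LENGTH <= lower.length; i++) {
      const candidates = this.byShortcut.get(lower.slice(i, i + SHORTCUT_LENGTH));
      if (candidates && candidates.some((re) => re.test(url))) return true;
    }
    // Filters without a shortcut must each be tested in full.
    return this.slow.some((re) => re.test(url));
  }
}

const matcher = new FilterMatcher(["*adserver*", "*ad*server*"]);
console.log(matcher.matches("http://example.com/adserver/banner.gif")); // true
console.log(matcher.matches("http://example.com/images/logo.png"));     // false
```

With such an index a URL is checked only against the few filters whose key actually occurs in it, so adding more indexable filters costs almost nothing per URL, while a filter like "*ad*server*" lands in the slow list and is tested against every address.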

As for regular expressions, it is difficult to make any performance predictions here; the regexp engine is a complicated beast. It is safe to say that "/.{2}/" works a little faster than "/.*/", but anything more complicated has to be measured. For example, you can try these two addresses:

javascript:var start = new Date().getTime();for (var i = 0; i < 100000; i++) /d.*dd/.test("asdfasdfasdfasdfasdfasdf"); alert(new Date().getTime() - start)

javascript:var start = new Date().getTime();for (var i = 0; i < 100000; i++) /d.{2}dd/.test("asdfasdfasdfasdfasdfasdf"); alert(new Date().getTime() - start)

For me the first shows 2100 ms, the second 900 ms. This confirms the theoretical speculations, but note that there wouldn't be any difference at all if the tested string didn't contain the letter d (the beginning of our regexps): the engine only gets to the ".*" or ".{2}" part after it has found a d.
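For anyone who wants to rerun the comparison outside a javascript: address, here is a sketch that should run in Node or a browser console. It times the same two regexps on the string from the bookmarklets and on a variant without any d; `timeRegExp` is a hypothetical helper name. Absolute numbers depend on the engine and machine, and modern engines may shrink or even reverse the gap reported above.

```javascript
// Time `iterations` calls of re.test(input); returns elapsed ms and match count.
function timeRegExp(re, input, iterations = 100000) {
  let matches = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    if (re.test(input)) matches++; // use the result so the loop is not dead code
  }
  return { ms: Date.now() - start, matches };
}

const withD = "asdfasdfasdfasdfasdfasdf";    // contains "d": the engine tries the bridge
const withoutD = "asfasfasfasfasfasfasfasf"; // no "d": every attempt fails on the first char

for (const input of [withD, withoutD]) {
  const star = timeRegExp(/d.*dd/, input);
  const bounded = timeRegExp(/d.{2}dd/, input);
  console.log(`${input}: /d.*dd/ ${star.ms} ms, /d.{2}dd/ ${bounded.ms} ms`);
}

// The two patterns are not interchangeable: .{2} bridges exactly two characters.
console.log(/d.{2}dd/.test("dxydd"));  // true
console.log(/d.{2}dd/.test("dxyzdd")); // false
console.log(/d.*dd/.test("dxyzdd"));   // true
```

So replacing "*" (or ".*") with a fixed-width bridge only makes sense when the number of characters in between is really known.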
Lucas Malor
Posts: 72
Joined: Wed Aug 23, 2006 7:34 am

Post by Lucas Malor »

Thanks to your JS example code I've done some more tests, and the results look consistent. I think I'll convert my filters :)

Another question: can I write a filter like this?

filter_expression_1 AND filter_expression_2

For example, I would change this filter

|http://www.badongo.net/images/swf/badongo_banner_*.swf|$match-case
into a filter like this:
Wladimir Palant

Post by Wladimir Palant »

Short answer: no. Slightly longer answer: I don't see what this would be good for.

Post by Lucas Malor »

I thought I could tell Adblock to search only at the start and at the end of the URL in some filters. I don't know if a regular expression like this one: /^abc*efg$/ does the same thing. Anyway, I didn't find a way to run two search expressions in a single test() in JavaScript, and running two tests is slower.
Wladimir Palant

Post by Wladimir Palant »

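The regular expression would be /^abc.*efg$/ (note the dot: in a regular expression "*" repeats the preceding character, so /^abc*efg$/ would only match addresses like "abefg" or "abcccefg"), but it is better to use a simple filter here that does exactly the same thing: "|abc*efg|"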
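Here is a minimal sketch of how a simple filter with "|" anchors corresponds to a regexp, assuming only the syntax discussed in this thread ("|" at either end and "*" as wildcard). `simpleFilterToRegExp` is a hypothetical helper, not Adblock Plus's own parser. It also shows why the dot matters, and that one anchored regexp checks both the start and the end of the address in a single test() call.

```javascript
// Hypothetical converter for the simple-filter syntax used in this thread:
// a leading "|" anchors at the start of the address, a trailing "|" at the end,
// and "*" is a wildcard. Options like $match-case are NOT parsed here;
// pass matchCase = true instead.
function simpleFilterToRegExp(filterText, matchCase = false) {
  let text = filterText;
  const anchorStart = text.startsWith("|");
  if (anchorStart) text = text.slice(1);
  const anchorEnd = text.endsWith("|");
  if (anchorEnd) text = text.slice(0, -1);
  const body = text
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regexp metacharacters
    .replace(/\*/g, ".*");                  // "*" matches any run of characters
  const source = (anchorStart ? "^" : "") + body + (anchorEnd ? "$" : "");
  return new RegExp(source, matchCase ? "" : "i");
}

console.log(simpleFilterToRegExp("|abc*efg|").source); // ^abc.*efg$

// Without the dot, "*" only repeats the preceding "c":
console.log(/^abc.*efg$/.test("abcXYZefg")); // true
console.log(/^abc*efg$/.test("abcXYZefg"));  // false
console.log(/^abc*efg$/.test("abccccefg"));  // true

// The badongo filter from earlier in the thread, case-sensitive as with $match-case:
const badongo = simpleFilterToRegExp(
  "|http://www.badongo.net/images/swf/badongo_banner_*.swf|",
  true
);
console.log(badongo.test("http://www.badongo.net/images/swf/badongo_banner_1.swf")); // true
console.log(badongo.test("http://www.badongo.net/images/swf/BADONGO_BANNER_1.swf")); // false
```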