[Fixed] Parsing of filter|* seems broken

famlam · Post by **famlam** » Sun Dec 09, 2012 3:35 am

See first reply for a fix

At http://livemanager.eurovision.edgesuite ... index.html I was trying to whitelist the resource

Code: Select all

http://adserver.adtech.de/adserv|3.0|1215|4025198|0|1908|ADTECH;grp=%5Bgroup%5D

after receiving a FP report from a user (didn't find the issue though). I tried adding the following rule:

Code: Select all

@@||adserver.adtech.de/adserv|*

However, it doesn't match. It works fine if I change it into any of

Code: Select all

@@||adserver.adtech.de/adserv
@@||adserver.adtech.de/adserv|3

.

Since there is a * following the |, the | should be parsed as a literal |, and not as the end of an URL. Therefore I believe this is a bug.

https://reports.adblockplus.org/12928b4 ... f6c4e9c54b

sidenote: in a very short (and therefore unsuccessful, although the latter could also be because it's 3:30 AM) attempt to help find the issue, I noticed the following (filterclasses.js)

Code: Select all

// process separator placeholders (all ANSI charaters but alphanumeric characters and _%.-)
.replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x80]|$)")

\x80 (unicode 128) however isn't an ascii character anymore

(and a typo in charaters )

famlam · Post by **famlam** » Wed Jan 22, 2014 7:43 pm

I ran into this issue again, this time on a different site.
The cause is the following:

In /lib/filterClasses.js

Code: Select all

    // Remove leading wildcards
    if (source[0] == "*")
      source = source.substr(1);

    // Remove trailing wildcards
    let pos = source.length - 1;
    if (pos >= 0 && source[pos] == "*")
      source = source.substr(0, pos);

    source = source.replace(/\^\|$/, "^")       // remove anchors following separator placeholder
                   .replace(/\W/g, "\\$&")    // escape special symbols
                   .replace(/\\\*/g, ".*")      // replace wildcards by .*
                   // process separator placeholders (all ANSI charaters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x80]|$)")
                   .replace(/^\\\|\\\|/, "^[\\w\\-]+:\\/+(?!\\/)(?:[^.\\/]+\\.)*?") // process extended anchor at expression start
                   .replace(/^\\\|/, "^")       // process anchor at expression start
                   .replace(/\\\|$/, "$");      // process anchor at expression end

The first two if-statements will change

Code: Select all

foo|*
*|foo
*||foo

into

Code: Select all

foo|
|foo
||foo

which will then be converted to the regexes

Code: Select all

foo$
^foo
^[\\w\\-]+:\\/+(?!\\/)(?:[^.\\/]+\\.)*?foo

instead of the regexes

Code: Select all

foo\|
\|foo
\|\|foo

Can you please fix this? Thanks.

(And while you're at it, please replace

Code: Select all

                   // process separator placeholders (all ANSI charaters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x80]|$)")

by

Code: Select all

                   // process separator placeholders (all ANSI characters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x7F]|$)")

to fix the typo in characters as well as that character x80 (= 128 = €) isn't ANSI anymore)

So the fixed code would be:

Code: Select all

    // Remove leading wildcards
    if (source[0] == "*" && source[1] != "|")
      source = source.substr(1);

    // Remove trailing wildcards
    let pos = source.length - 1;
    if (pos >= 0 && source[pos] == "*" && source[pos-1] != "|")
      source = source.substr(0, pos);

    source = source.replace(/\^\|$/, "^")       // remove anchors following separator placeholder
                   .replace(/\W/g, "\\$&")    // escape special symbols
                   .replace(/\\\*/g, ".*")      // replace wildcards by .*
                   // process separator placeholders (all ANSI characters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x7F]|$)")
                   .replace(/^\\\|\\\|/, "^[\\w\\-]+:\\/+(?!\\/)(?:[^.\\/]+\\.)*?") // process extended anchor at expression start
                   .replace(/^\\\|/, "^")       // process anchor at expression start
                   .replace(/\\\|$/, "$");      // process anchor at expression end

greiner · Post by **greiner** » Thu Jan 23, 2014 5:46 pm

Thanks for the valuable help.

I'll have a look at it.

Update: It's under review: http://codereview.adblockplus.org/4922123285954560/

greiner · Post by **greiner** » Wed Jan 29, 2014 12:44 pm

Done: https://hg.adblockplus.org/adblockplus/rev/5268be5c2e85

Adblock Plus

[Fixed] Parsing of filter|* seems broken

[Fixed] Parsing of filter|* seems broken

Re: Parsing of filter|* seems broken

Re: Parsing of filter|* seems broken

Re: Parsing of filter|* seems broken