[Fixed] Parsing of filter|* seems broken

Posting here is no longer possible, please use the corresponding product forum.
Locked
famlam
Posts: 59
Joined: Sat Aug 07, 2010 2:06 pm

[Fixed] Parsing of filter|* seems broken

Post by famlam »

See first reply for a fix


At http://livemanager.eurovision.edgesuite ... index.html I was trying to whitelist the resource

Code: Select all

http://adserver.adtech.de/adserv|3.0|1215|4025198|0|1908|ADTECH;grp=%5Bgroup%5D
after receiving a FP report from a user (didn't find the issue though). I tried adding the following rule:

Code: Select all

@@||adserver.adtech.de/adserv|*
However, it doesn't match. It works fine if I change it into any of

Code: Select all

@@||adserver.adtech.de/adserv
@@||adserver.adtech.de/adserv|3
.

Since there is a * following the |, the | should be parsed as a literal |, and not as the end of an URL. Therefore I believe this is a bug.

https://reports.adblockplus.org/12928b4 ... f6c4e9c54b



sidenote: in a very short (and therefore unsuccessful, although the latter could also be because it's 3:30 AM) attempt to help find the issue, I noticed the following (filterclasses.js)

Code: Select all

// process separator placeholders (all ANSI charaters but alphanumeric characters and _%.-)
.replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x80]|$)")
\x80 (unicode 128) however isn't an ascii character anymore :) (and a typo in charaters )
Last edited by famlam on Wed Jan 22, 2014 8:04 pm, edited 2 times in total.
famlam
Posts: 59
Joined: Sat Aug 07, 2010 2:06 pm

Re: Parsing of filter|* seems broken

Post by famlam »

I ran into this issue again, this time on a different site.
The cause is the following:

In /lib/filterClasses.js

Code: Select all

    // Remove leading wildcards
    if (source[0] == "*")
      source = source.substr(1);

    // Remove trailing wildcards
    let pos = source.length - 1;
    if (pos >= 0 && source[pos] == "*")
      source = source.substr(0, pos);

    source = source.replace(/\^\|$/, "^")       // remove anchors following separator placeholder
                   .replace(/\W/g, "\\$&")    // escape special symbols
                   .replace(/\\\*/g, ".*")      // replace wildcards by .*
                   // process separator placeholders (all ANSI charaters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x80]|$)")
                   .replace(/^\\\|\\\|/, "^[\\w\\-]+:\\/+(?!\\/)(?:[^.\\/]+\\.)*?") // process extended anchor at expression start
                   .replace(/^\\\|/, "^")       // process anchor at expression start
                   .replace(/\\\|$/, "$");      // process anchor at expression end
The first two if-statements will change

Code: Select all

foo|*
*|foo
*||foo
into

Code: Select all

foo|
|foo
||foo
which will then be converted to the regexes

Code: Select all

foo$
^foo
^[\\w\\-]+:\\/+(?!\\/)(?:[^.\\/]+\\.)*?foo
instead of the regexes

Code: Select all

foo\|
\|foo
\|\|foo
Can you please fix this? Thanks.

(And while you're at it, please replace

Code: Select all

                   // process separator placeholders (all ANSI charaters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x80]|$)")
by

Code: Select all

                   // process separator placeholders (all ANSI characters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x7F]|$)")
to fix the typo in characters as well as that character x80 (= 128 = €) isn't ANSI anymore)


So the fixed code would be:

Code: Select all

    // Remove leading wildcards
    if (source[0] == "*" && source[1] != "|")
      source = source.substr(1);

    // Remove trailing wildcards
    let pos = source.length - 1;
    if (pos >= 0 && source[pos] == "*" && source[pos-1] != "|")
      source = source.substr(0, pos);

    source = source.replace(/\^\|$/, "^")       // remove anchors following separator placeholder
                   .replace(/\W/g, "\\$&")    // escape special symbols
                   .replace(/\\\*/g, ".*")      // replace wildcards by .*
                   // process separator placeholders (all ANSI characters but alphanumeric characters and _%.-)
                   .replace(/\\\^/g, "(?:[\\x00-\\x24\\x26-\\x2C\\x2F\\x3A-\\x40\\x5B-\\x5E\\x60\\x7B-\\x7F]|$)")
                   .replace(/^\\\|\\\|/, "^[\\w\\-]+:\\/+(?!\\/)(?:[^.\\/]+\\.)*?") // process extended anchor at expression start
                   .replace(/^\\\|/, "^")       // process anchor at expression start
                   .replace(/\\\|$/, "$");      // process anchor at expression end
User avatar
greiner
ABP Developer
Posts: 900
Joined: Mon Sep 03, 2012 5:29 pm
Location: Cologne, Germany

Re: Parsing of filter|* seems broken

Post by greiner »

Thanks for the valuable help. :) I'll have a look at it.

Update: It's under review: http://codereview.adblockplus.org/4922123285954560/
User avatar
greiner
ABP Developer
Posts: 900
Joined: Mon Sep 03, 2012 5:29 pm
Location: Cologne, Germany

Re: Parsing of filter|* seems broken

Post by greiner »

Locked