Blocking objects based on file size.

Post by **LAN_deRf_HA** » Mon Sep 29, 2008 3:01 pm

I couldn't find anything on this, but I probably just didn't search well enough... Basically I want to know if I can block an image that's repeating all over a page with different file names and dynamically generated URLs. I figure the only way to do it, short of image recognition software, is to block it based on its exact file size. Can this be done with ABP?

Post by **Hubird** » Mon Sep 29, 2008 3:31 pm

No, you can't block items based on file size, but if you post a link to the site, someone might be able to come up with something.

Post by **LAN_deRf_HA** » Mon Sep 29, 2008 11:14 pm

It's a rather specific case; I'm not sure you'll find ads doing anything like this. The site is Photobucket, via http://members.home.nl/bas.de.reuver/files/fusker.html. You use it to scan for photos in private buckets based on file naming schemes you've observed in photos the person has posted publicly. I've been running into very challenging naming schemes lately (the first three digits increase conventionally, but a randomized five-digit number follows) that require searches in the hundreds of thousands. It becomes impractical to scroll through all of it now that Photobucket throws up a small generic image when the file you've requested never existed. Scrolling through just 10 thousand of these dud images looking for one picture took me 10 minutes; any faster and my eyes wouldn't catch it flashing by. Having just the real images you're looking for and nothing else would be vastly more efficient.

Post by **mrbene** » Tue Sep 30, 2008 12:46 am

Nice hack!

('hack' indicating 'creative use of tools to bypass existing boundaries'). You can't know the size of a file from within the web browser before it's downloaded - though if you were to write your own client, you could probably inspect the HTTP headers and make a best guess based on the 'Content-Length' header. I can't remember the full parameters of cURL, but I don't think it exposes the ability to abort based on HTTP headers.
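To make the 'Content-Length' idea a bit more concrete, here's a rough Python sketch of the header check on its own. It assumes the generic placeholder always reports the same size; the 1234-byte figure and both function names are made up for illustration, and it only looks at headers you've already received.

```python
from typing import Mapping, Optional

# Hypothetical placeholder size in bytes; measure the real one yourself.
PLACEHOLDER_SIZE = 1234


def declared_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the size declared in Content-Length, or None if absent or malformed."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def looks_like_placeholder(headers: Mapping[str, str],
                           placeholder_size: int = PLACEHOLDER_SIZE) -> bool:
    """Best guess only: True when the declared size equals the placeholder size."""
    return declared_size(headers) == placeholder_size
```

It's still a guess: a server can leave the header out (chunked responses, for instance), and a real image could happen to be exactly the same size.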

What you can do is bypass the web browser completely and use cURL instead of the site you've linked, then inspect the file sizes on disk - since you're downloading them anyway in the web browser, you're simply removing the CPU cost of rendering all of 'em.
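For the on-disk inspection step, something like this Python sketch would do. It assumes the files are already sitting in one folder and that you've measured the exact byte size of the dud image yourself; the function name and parameters are hypothetical.

```python
from pathlib import Path
from typing import List


def files_not_matching_size(folder: str, dud_size: int) -> List[Path]:
    """List regular files in `folder` whose size differs from `dud_size` bytes."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.stat().st_size != dud_size
    )
```

It only lists what's left after ignoring the duds and doesn't delete anything, so a wrong size guess costs you nothing.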