"Martelada" is the Portuguese word for what browser engines have to do to deal with the Web. The Web is broken in ways no one can imagine. I recently got involved in writing a small program to clone web pages and had to do some rudimentary manual HTML parsing. Did you know that:
1. Yahoo!'s homepage writes URLs in anchors without quotation marks or apostrophes, just to spare a few bytes (while serving a huge 30k banner at the same time), in the form <a href=/images/ihavestandards.gif>?
2. Saw this on Slashdot: instead of http://poke.w3.org/ you can just link to //poke.w3.org and your browser will fill in the scheme of the current page, so on a page served over HTTP, // means http://. It works with quotation marks, apostrophes or nothing.
3. In JavaScript or CSS (a newly created standard) you can easily see url(", url(' or url(//this.works.com and they all work, apparently.
This may be common knowledge to any HTML parser developer or experienced web designer, but I was kind of shocked. And I'm sure there are hundreds more.
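For anyone writing a similar page cloner, here is a minimal sketch of how the three quirks above could be handled with Python's standard library instead of manual parsing. It is not the program described in this post: the names LinkCollector, CSS_URL and css_urls are made up for illustration, the example.com base URLs are placeholders, and the regular expression is a deliberate simplification of real CSS syntax.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src values and resolve them against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already accepts double-quoted, single-quoted and
        # unquoted attribute values, so all three forms arrive here.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin resolves "//host/path" with the base URL's scheme.
                self.links.append(urljoin(self.base_url, value))


# Matches url("x"), url('x') and url(x). Simplified: it does not handle
# escaped quotes or parentheses inside the URL.
CSS_URL = re.compile(r"""url\(\s*(["']?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css_text, base_url):
    """Return every url(...) target in css_text as an absolute URL."""
    return [urljoin(base_url, m.group(2)) for m in CSS_URL.finditer(css_text)]


if __name__ == "__main__":
    collector = LinkCollector("http://example.com/index.html")
    collector.feed("<a href=/images/ihavestandards.gif>a</a>"
                   "<a href='//poke.w3.org/'>b</a>")
    print(collector.links)
    print(css_urls("body { background: url(//this.works.com) }",
                   "http://example.com/css/site.css"))
```

The point of resolving everything through urljoin is that the // form never needs a special case: it is just another relative reference that borrows the scheme from whatever page it appears on.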
Obviously you didn't get to parse the JavaScript links.
Browsers' MIME type behaviour, especially in iframes, link hrefs, etc., is another interesting discussion topic. I count 3 if...elses in that layer in my code.
I guess Web 2.0 (which behaviour comes to mind) will make this even harder.
Was the doctype of the pages at least consistent with quirks mode?
Wednesday, January 25. 2006 at 09:25 (Reply)
And we also have the classic case of browsers ignoring the MIME type for images: you can serve a .gif file that is in fact a JPEG. Even if the server sends 'image/gif' and the file extension is .gif, the browser will still not trust you: it will read the first bytes of the image just to make sure, and will display the JPEG.
I wonder about the spaghetti code in the engines that deals with this and much more. Hope the browser industry has learned its lesson. Let's see how IE7 complies.
Wednesday, January 25. 2006 at 11:18 (Link) (Reply)
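The "read the first bytes" check described in the comment above can be sketched roughly like this. It is an illustration only, not the logic of any particular browser: the function names sniff_image_type and effective_type are hypothetical, and only the well-known JPEG, GIF and PNG signatures are checked.

```python
def sniff_image_type(data: bytes):
    """Guess an image MIME type from the leading bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return None


def effective_type(declared: str, data: bytes) -> str:
    """Prefer the sniffed type; fall back to what the server declared."""
    return sniff_image_type(data) or declared


if __name__ == "__main__":
    # A JPEG payload served as image/gif is still treated as a JPEG.
    print(effective_type("image/gif", b"\xff\xd8\xff\xe0" + b"\x00" * 16))
```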
http://code.google.com/webstats/index.html
Wednesday, January 25. 2006 at 11:25 (Link) (Reply)