WAMP wordpress permalink problem

2012 November 8
by anon

I just started working on my first mobile theme, using a new jumpmobi theme which seems the best I have ever seen. It will run alongside an existing desktop site, in a sub-directory of its own.

So as not to disturb the existing operational website, I am developing on WAMP, which I find really helpful and prefer to XAMPP. All went well until I tried to set up the permalinks, which just didn’t work and would only operate with the default setting http://localhost/mobile/?p=123.

After much searching around I came across the fix which worked for me, and which I should have realised from the start. Here it is…

In WAMP the rewrite_module needs to be on, and by default it’s off.

  1. Click the WAMP icon in your taskbar.
  2. Navigate to Apache > Apache modules.
  3. Scroll down to rewrite_module and click it.

All your pages and posts should work with your selected permalink setting.

 

Fatal error: Allowed memory size of 33554432 bytes exhausted

2012 August 17

I recently moved a WP site onto a good free hosting facility, as the owners really didn’t want to pay for hosting what was just a fun blog for them.

Once it was up and running without much trouble, I realized that the dashboard was full of error messages, all the same:

Fatal error: Allowed memory size of 33554432 bytes exhausted

Most of the information boxes in the dashboard had failed with this error, so I had to start searching around for what might have happened.

I finally managed to find the fix; the best answer came from a WP moderator, Sam Bolt.

The bottom line was to do one of the following, starting with the first:

  1. Find your wp-config.php file and add the following line to it: define('WP_MEMORY_LIMIT', '64M');
  2. If you have access to your php.ini file, then change the memory_limit line there
    (if your line shows 32M, try 64M):
    memory_limit = 64M ; Maximum amount of memory a script may consume (64MB)
  3. If you are unable to get to your php.ini file, then try adding this to an .htaccess file:
    php_value memory_limit 64M
  4. Finally, speak with your host.

Mine was fixed using the first option, define('WP_MEMORY_LIMIT', '64M'); let’s hope you are as lucky.
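
For reference, this is roughly what that first fix looks like in place. A minimal sketch; the surrounding lines of your wp-config.php will differ:

<?php
// wp-config.php (excerpt) - add the define above the "That's all, stop editing!" line.
// Raises the memory WordPress is allowed to use from the 32M default to 64M.
define('WP_MEMORY_LIMIT', '64M');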

Finally, you may also be able to fix this problem by increasing the memory limit in your wp-settings.php file; however, as that change will be lost with any new WP upgrade, I would only entertain it as a last-resort fix.

 

WP-DBManager Error

2011 September 7
by anon

I have just migrated a wp site to a new host and all went well.

  • I created a new database and imported the db backup from the old host.
  • I then uploaded the home directory files which were backed up from the old host.
  • I changed the “wp-config.php” file settings to reflect the new database, user ID and password for the new host.
  • I changed the DNS settings on my domain name provider to reflect my new host details.

All went really well; in fact it was the best migration to date, and the new DNS settings kicked in within an hour, which was great.

However, when I checked that all the plugins etc. were functioning correctly I got the following error message on all my admin pages

Error Warning: Your backup folder MIGHT be visible to the public

The suggested fix from the plugin stated as follows

To correct this issue, move the .htaccess file from wp-content/plugins/wp-dbmanager to /home/xxxxxxx/public_html/wp-content/backup-db

Well, I’d seen this message many times before and the move as requested had always worked, so I thought I must have dropped the “.htaccess” file during the move. Remember that the original file is “htaccess.txt”, so you also have to rename it to “.htaccess” when doing the move (or copy).

No, it was still there, so I went back to my original host, zipped up the whole “public_html/wp-content/backup-db” directory and replaced the new host’s “backup-db” directory with it, just in case something was corrupted in the migration.

No, the error message was still there so I realised I had to dig somewhat deeper.

Permissions on the directory? No, all OK.

I then looked at the actual setup of the plugin using the “DB Options” which you rarely visit once the plugin is configured.

That’s when I realised that there was an area still configured with data for the old host – my account name appearing as part of a directory path. This is found in the “Path To Backup:” setting.

Mine was set with the old path from my previous host:
/home/domain/public_html/wp-content/backup-db

I changed it to reflect my new host’s settings, like this:
/web/docroot/1111/domain.com/htdocs/wp-content/backup-db

After making the change and reloading the page, the message disappeared.

I tested the full functionality of the plugin without any problems, and I would add that this plugin from Lester ‘GaMerZ’ Chan is still great; I have been using it for many years now. Thank you, Lester.

Yoast WordPress SEO export settings problem

2011 August 11
by anon

I have found that one of the best SEO plugins for WordPress is WordPress SEO, which is yet another of Joost de Valk’s many good WP plugins.

I started to use it some months ago and, apart from the learning curve that always has to be worked through, I found it quite intuitive, with enough documentation to do the job. All in all, one of the best “free” SEO packages around.

Today I was faced with setting up two new WP sites, and as I’m designing the themes for each I just wanted a few plugins to get me going. I loaded WordPress SEO and started to configure the standard settings, which are pretty much the same for all my sites, and as I was tidying up I thought I may as well use the settings export facility so I could import them into the second site.
Unfortunately, when you try to export, it comes up with an error:

Warning: chdir() [function.chdir]: No such file or directory (errno 2) in /home/xxxxxxx/public_html/www.yyyyyyyyyyyy.com/wp-content/plugins/wordpress-seo/inc/wpseo-non-ajax-functions.php on line 90

This is when I remembered that I’d tried this some while ago with the same result – what a bummer. This time I resolved to try and find a fix, and not being a PHP expert that meant searching the internet.

This time I struck gold after about half an hour searching.

I take no credit for the work, but I did what the patch suggested and it works a treat. Just remember the author’s warning: “Try out at your own risk”.

The patch applies to the file in the plugin directory, “wordpress-seo/inc/wpseo-non-ajax-functions.php”.

This is the fix:

There is a subtle difference in that the dir must be an OS path when creating the file and a URL when returning the file for download.
There should also be a leading / on settings.ini and settings.zip.

Here is my workaround. Try out at your own risk.
NOTE: I have not tested the resulting import yet.

Additions marked with >, starting from line 80. Mods marked with |.

> $upload_dir = wp_upload_dir();
> $wpseo_upload_path = $upload_dir['path'];
> $wpseo_upload_url = $upload_dir['url'];
| if ( !$handle = fopen( $wpseo_upload_path . '/settings.ini', 'w' ) )
    die();

if ( !fwrite( $handle, $content ) )
    die();

fclose( $handle );

require_once( ABSPATH . 'wp-admin/includes/class-pclzip.php' );

| chdir( $wpseo_upload_path );
$zip = new PclZip( './settings.zip' );
if ( $zip->create( './settings.ini' ) == 0 )
    return false;

| return $wpseo_upload_url . '/settings.zip';

And the guy we should all thank for it is known as setuptips, a member of the WordPress.org forums.

 

 

7 Steps to Building the Right Social Media Connections

2011 January 17

By Angelique L. Rewers (c) 2011

Once upon a time, it seemed as though the number of “followers” or “connections” a person had on social networking sites like Twitter, Facebook, and LinkedIn was akin to the number of votes they had for homecoming king or queen. It was the adult version of a high school popularity contest.

However, as these websites have now matured – and as entrepreneurs and business owners have figured out how and how not to use them – most everyone has come to realize that it’s not about the number of connections you have, but the number of right connections.

Just like in the real world, you want to make sure you’re making the most of your networking time by connecting with people who have similar or complementary interests or expertise and who can therefore create a mutually beneficial relationship – particularly when sites like Facebook limit your number of friends to 5,000. In fact, many people who have large followings of the wrong people are taking the drastic step of deleting their connections and starting over from scratch.

Whether you’re just getting started building your social media network or you’re a seasoned pro who’s thinking about doing a major overhaul, here are seven steps you can take to help you build the right connections.

1.) Start With the People You Know. If you’re still not using social media and are hesitant to get started, the best way to get your feet wet is by connecting with those you know personally: your friends, your family, your neighbors, and your co-workers. But don’t stop there; your network is likely a lot bigger than you might think. Don’t forget about your former co-workers, your connections through professional organizations to which you belong, your clients or customers, members of your mastermind communities and even vendors with whom you’ve done business.

2.) Add the New People You Meet. Not so long ago (unless you were in sales), the majority of the business cards you collected at networking events, conferences, trade shows and other professional development opportunities probably went in the garbage can the next day. Today, however, there’s no excuse for not taking a few minutes to extend the life of those connections by sending social media invites the very next day. Be sure to remind the person who you are by referencing something from your conversation or by providing a piece of follow-up information that you promised.

3.) Follow Your Followers’ Followers. Check out the connections and followers of your colleagues, peers, friends, etc. On Twitter you can easily see who your friends are following, as well as who is following them. Facebook automatically provides suggestions of people you might want to add because you have a lot of shared connections. In LinkedIn you may need to ask your connection to make an introduction. In any case, if you have things in common with your networkers, it stands to reason that a good portion of their followers is also worth following. Just be sure not to “spam” your connections’ lists.

4.) Broaden Your Reach. Extend invitations to people in groups to which you belong in both the real and online worlds, such as professional organizations. On LinkedIn you can connect with the folks you “meet” through LinkedIn Groups. And on Facebook you can make connections when you’re invited to attend an event or when you join someone’s fan page.

5.) Follow the Experts. We’re constantly learning from experts in our respective industries. Why not reach out to these folks in the social media world? Maybe you’ve just read a great book. See what the author has to say on Twitter. Or maybe you’ve received a brochure for an upcoming conference that you’d love to attend but can’t. Before you throw the brochure in the trash, search for the speakers’ names on social networking sites and send them an invitation to connect. Let them know that you saw their session description for the conference and you’re disappointed you’re going to miss it, but would love to keep track of where else they might be speaking.

6.) Do Some Digging. Don’t forget to take the time to search for people who share similar interests as you or who would be an ideal customer for your business. Granted, this is the most time consuming of all the methods. But it will give you the chance to unearth new sources, experts and connections that will add value to your business and who you might not otherwise have ever “met.” Be on the lookout for bloggers, reporters, and analysts who cover your company or industry.

7.) Invite Others to Follow You. To truly create an online “relationship” it needs to be a two-way street. One of the best ways to encourage others to follow you is by showing that you will provide value to them. In other words, be worthy of their time. Start by ensuring your online profiles are professional (i.e. no avatar photos) and accurately describe who you are, what you do, and what topics you’re interested in. Provide content and commentary that matches that profile, is timely and doesn’t spam. Promoting is fine so long as it’s balanced with valuable content. Link to blog posts, videos and articles your followers would find interesting. Ask questions
and provide insightful comments on other people’s posts. Make it easy for others to follow you by providing “follow me” widgets on your website, blog posts, article archives, and podcasts.

Remember: It’s not about the number of connections you have on these sites – it’s about the quality of those connections. Today, people are looking for authenticity. They want to meet real people with real things to say who will add value to their personal and professional lives.

Are you doing something interesting to find valuable contacts on social media sites? Leave us a comment and let us know what strategies are working for you!

By Angelique Rewers, ABC, APR – Richer. Smarter. Happier.
The Queen of Clarity – Angelique Rewers, ABC, APR, harnesses her extensive experience working with Fortune 500 companies to help solopreneurs clarify their marketing focus so they can build a business that makes them happy and makes them money. If you’re looking for simple, low-cost ways to boost your sales, get Free marketing *Brilliance!* now at: http://www.richersmarterhappier.com/brilliance_ezine.htm

Caching Tutorial

2010 May 28

I came across this tutorial which I found extremely useful
and my thanks and appreciation go to the author

Mark Nottingham <mnot@pobox.com>.
Copied from mnot.net

for Web Authors and Webmasters

This is an informational document. Although technical in nature, it attempts to make the concepts involved understandable and applicable in real-world situations. Because of this, some aspects of the material are simplified or omitted, for the sake of clarity. If you are interested in the minutiae of the subject, please explore the References and Further Information at the end.

  1. What’s a Web Cache? Why do people use them?
  2. Kinds of Web Caches
    1. Browser Caches
    2. Proxy Caches
  3. Aren’t Web Caches bad for me? Why should I help them?
  4. How Web Caches Work
  5. How (and how not) to Control Caches
    1. HTML Meta Tags vs. HTTP Headers
    2. Pragma HTTP Headers (and why they don’t work)
    3. Controlling Freshness with the Expires HTTP Header
    4. Cache-Control HTTP Headers
    5. Validators and Validation
  6. Tips for Building a Cache-Aware Site
  7. Writing Cache-Aware Scripts
  8. Frequently Asked Questions
  9. Implementation Notes — Web Servers
  10. Implementation Notes — Server-Side Scripting
  11. References and Further Information
  12. About This Document

What’s a Web Cache? Why do people use them?

A Web cache sits between one or more Web servers (also known as origin servers) and a client or many clients, and watches requests come by, saving copies of the responses — like HTML pages, images and files (collectively known as representations) — for itself. Then, if there is another request for the same URL, it can use the response that it has, instead of asking the origin server for it again.

There are two main reasons that Web caches are used:

  • To reduce latency — Because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for it to get the representation and display it. This makes the Web seem more responsive.
  • To reduce network traffic — Because representations are reused, it reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps their bandwidth requirements lower and more manageable.

Kinds of Web Caches

Browser Caches

If you examine the preferences dialog of any modern Web browser (like Internet Explorer, Safari or Mozilla), you’ll probably notice a “cache” setting. This lets you set aside a section of your computer’s hard disk to store representations that you’ve seen, just for you. The browser cache works according to fairly simple rules. It will check to make sure that the representations are fresh, usually once a session (that is, once in the current invocation of the browser).

This cache is especially useful when users hit the “back” button or click a link to see a page they’ve just looked at. Also, if you use the same navigation images throughout your site, they’ll be served from browsers’ caches almost instantaneously.

Proxy Caches

Web proxy caches work on the same principle, but on a much larger scale. Proxies serve hundreds or thousands of users in the same way; large corporations and ISPs often set them up on their firewalls, or as standalone devices (also known as intermediaries).

Because proxy caches aren’t part of the client or the origin server, but instead are out on the network, requests have to be routed to them somehow. One way to do this is to use your browser’s proxy setting to manually tell it
what proxy to use; another is using interception. Interception proxies have Web requests redirected to them by the underlying network itself, so that clients don’t need to be configured for them, or even know about them.

Proxy caches are a type of shared cache; rather than just having one person using them, they usually have a large number of users, and because of this they are very good at reducing latency and network traffic. That’s because popular representations are reused a number of times.

Gateway Caches

Also known as “reverse proxy caches” or “surrogate caches,” gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they’re typically deployed by Webmasters themselves, to make their sites more scalable, reliable and better performing.

Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients.

Content delivery networks (CDNs) distribute gateway caches throughout the Internet (or a part of it) and sell caching to interested Web sites. Speedera and Akamai are examples of CDNs.

This tutorial focuses mostly on browser and proxy caches, although some of the information is suitable for those interested in gateway caches as well.

Aren’t Web Caches bad for me? Why should I help them?

Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a proxy cache can “hide” their users from them, making it difficult to see who’s using the site.

Unfortunately for them, even if Web caches didn’t exist, there are too many variables on the Internet to assure that they’ll be able to get an accurate picture of how users see their site. If this is a big concern for you, this tutorial will teach you how to get the statistics you need without making your site cache-unfriendly.

Another concern is that caches can serve content that is out of date, or stale. However, this tutorial can show you how to configure your server to control how your content is cached.

CDNs
are an interesting development, because unlike many proxy caches, their gateway caches are aligned with the interests of the Web site being cached, so that these problems aren’t seen. However, even when you use a CDN, you still have to consider that there will be proxy and browser caches downstream.

On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic; a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. Users will appreciate a fast-loading site, and will visit more often.

Think of it this way; many large Internet companies are spending millions of dollars setting up farms of servers around the world to replicate their content, in order to make it as fast to access as possible for their users. Caches do the same for you, and they’re even closer to the end user. Best of all, you don’t have to pay for them.

The fact is that proxy and browser caches will be used whether you like it or not. If you don’t configure your site to be cached correctly, it will be cached using whatever defaults the cache’s administrator decides upon.

How Web Caches Work

All caches have a set of rules that they use to determine when to serve a representation from the cache, if it’s available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).

Generally speaking, these are the most common rules that are followed (don’t worry if you don’t understand the details, it will be explained below):

  1. If the response’s headers tell the cache not to keep it, it won’t.
  2. If the request is authenticated or secure (i.e., HTTPS), it won’t be cached.
  3. A cached representation is considered fresh (that is, able to be sent to a client without checking with the origin server) if:
    • It has an expiry time or other age-controlling header set, and is still within the fresh period, or
    • If the cache has seen the representation recently, and it was modified relatively long ago.

    Fresh representations are served directly from the cache, without checking with the origin server.

  4. If a representation is stale, the origin server will be asked to validate it, or tell the cache whether the copy that it has is still good.
  5. Under certain circumstances — for example, when it’s disconnected from a network — a cache can serve stale responses without checking with the origin server.

If no validator (an ETag or Last-Modified header) is present on a response, and it doesn’t have any explicit freshness information, it will usually — but not always — be considered uncacheable.

Together, freshness and validation are the most important ways that a cache works with content. A fresh representation will be available instantly from the cache, while a validated representation will avoid sending the entire representation over again if it hasn’t changed.

How (and how not) to Control Caches

There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with your server’s configuration, but the results are worth it.
For details on how to use these tools with your server, see the Implementation sections below.

HTML Meta Tags and HTTP Headers

HTML authors can put tags in a document’s <HEAD> section that describe its attributes. These meta tags are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.

Meta tags are easy to use, but aren’t very effective. That’s because they’re only honored by a few browser caches (which actually read the HTML), not proxy caches (which almost never read the HTML in the document). While it may be tempting to put a Pragma: no-cache meta tag into a Web page, it won’t necessarily cause it to be kept fresh.

If your site is hosted at an ISP or hosting farm and they don’t give you the ability to set arbitrary HTTP headers (like Expires and Cache-Control), complain loudly; these are tools necessary for doing your job.

On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your representations. They can’t be seen in the HTML, and are usually automatically generated by the Web
server. However, you can control them to some degree, depending on the server you use. In the following sections, you’ll see what HTTP headers are interesting, and how to apply them to your site.

HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:

HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html

The HTML would follow these headers, separated by a blank line. See the Implementation sections for information about how to set HTTP headers.

Pragma HTTP Headers (and why they don’t work)

Many people believe that assigning a Pragma: no-cache HTTP header to a representation will make it uncacheable. This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won’t, and it won’t have any effect. Use the headers below instead.

Controlling Freshness with the Expires HTTP Header

The Expires HTTP header is a basic means of controlling caches; it tells all caches how long the associated representation is fresh for. After that time, caches will always check back with the origin server to see if a document is changed. Expires headers are supported by practically every cache.

Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client retrieved the representation (last access time), or a time based on the last time the document changed on your server (last modification time).

Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don’t change much, you can set extremely long expiry time on them, making your site appear much more responsive to your users. They’re also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the representation to expire at that time, so caches will know when to get a fresh copy, without users having to hit ‘reload’.
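
If the page is generated by a script, that expiry time can be computed on the fly. Here is a rough PHP sketch of the 6am example above, assuming the server runs PHP and the page is produced by it:

<?php
// Expire this page at the next 06:00 GMT, when the news is next updated.
$today_6am = gmmktime(6, 0, 0);                                   // 06:00 GMT today
$expires   = (time() < $today_6am) ? $today_6am : $today_6am + 86400;
header('Expires: ' . gmdate('D, d M Y H:i:s', $expires) . ' GMT');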

The only value valid in an Expires header is a HTTP date; anything else will most likely be interpreted as ‘in the past’, so that the representation is uncacheable. Also, remember that the time in a HTTP date is Greenwich Mean Time (GMT), not local time.

For example:

Expires: Fri, 30 Oct 1998 14:19:41 GMT

It’s important to make sure that your Web server’s clock is accurate if you use the Expires header. One way to do this is using the Network Time Protocol (NTP); talk to your local system administrator to find out more.

Although the Expires header is useful, it has some limitations. First, because there’s a date involved, the clocks on the Web server and the cache must be synchronised; if they have a different idea of the time, the intended
results won’t be achieved, and caches might wrongly consider stale content as fresh.

Another problem with Expires is that it’s easy to forget that you’ve set some content to expire at a particular time. If you don’t update an Expires time before it passes, each and every request will go back to your Web server, increasing load and latency.

Cache-Control HTTP Headers

HTTP 1.1 introduced a new class of headers, Cache-Control response headers, to give Web publishers more control over their content, and to address the limitations of Expires.

Useful Cache-Control response headers include:

  • max-age=[seconds] — specifies the maximum amount of time that a representation will be considered fresh. Similar to Expires, this directive is relative to the time of the request, rather than absolute. [seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.
  • s-maxage=[seconds] — similar to max-age, except that it only applies to shared (e.g., proxy) caches.
  • public — marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
  • private — allows caches that are specific to one user (e.g., in a browser) to store the response; shared caches (e.g., in a proxy) may not.
  • no-cache — forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
  • no-store — instructs caches not to keep a copy of the representation under any conditions.
  • must-revalidate — tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you’re telling the cache that you want it to strictly follow your rules.
  • proxy-revalidate — similar to must-revalidate, except that it only applies to proxy caches.

For example:

Cache-Control: max-age=3600, must-revalidate

If you plan to use the Cache-Control headers, you should have a look at the excellent documentation in HTTP 1.1; see References and Further Information.

Validators and Validation

In How Web Caches Work, we said that validation is used by servers and caches to communicate when a representation has changed. By using it, caches avoid having to download the entire representation when they already have a copy locally, but they’re not sure if it’s still fresh.

Validators are very important; if one isn’t present, and there isn’t any freshness information (Expires or Cache-Control) available, caches will not store a representation at all.

The most common validator is the time that the document last changed, as communicated in the Last-Modified header. When a cache has a representation stored that includes a Last-Modified header, it can use it to ask the server if the representation has changed since the last time it was seen, with an If-Modified-Since request.

HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the representation does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make an If-None-Match request, the representation really is the same.

Almost all caches use Last-Modified times in determining if a representation is fresh; ETag validation is also becoming prevalent.

Most modern Web servers will generate both ETag and Last-Modified headers to use as validators for static content (i.e., files) automatically; you won’t have to do anything. However, they don’t know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.

Tips for Building a Cache-Aware Site

Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.

  • Use URLs consistently — this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL.
    This is the easiest and most effective way to make your site cache-friendly. For example, if you use “/index.html” in your HTML as a reference once, always use it that way.
  • Use a common library of images and other elements and refer back to them from different places.
  • Make caches store images and pages that don’t change often by using a Cache-Control: max-age header with a large value.
  • Make caches recognise regularly updated pages by specifying an appropriate max-age or expiration time.
  • If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future, and still guarantee that the correct version is served; the page that links to it is the only one that will need a short expiry time.
  • Don’t change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don’t copy over the entire site; just move the files that you’ve changed.
  • Use cookies only where necessary — cookies are difficult to cache, and aren’t needed in most situations. If you must use a cookie, limit its use to dynamic pages.
  • Minimize use of SSL — because encrypted pages are not stored by shared caches, use them only when you have to, and use images on SSL pages sparingly.
  • Check your pages with REDbot — it can help you apply many of the concepts in this tutorial.

Writing Cache-Aware Scripts

By default, most scripts won’t return a validator (a Last-Modified or ETag response header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.

Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what’s in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn’t.

  • The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
  • Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it’s probably easiest to do so with Cache-Control: max-age, which will make the request fresh for an amount of time after the request.
  • If you can’t do that, you’ll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task; a rough sketch follows this list.
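
Here is a rough PHP sketch of that last point. It assumes the script can work out when its underlying data last changed; the data.txt file below is just a stand-in for wherever that timestamp really comes from:

<?php
// Hypothetical conditional-GET handling for a dynamic page.
$last_modified = filemtime('data.txt');          // stand-in: however you know the data's age
$etag          = '"' . md5($last_modified) . '"';

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $last_modified) . ' GMT');
header('ETag: ' . $etag);

$ims = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;
$inm = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? trim($_SERVER['HTTP_IF_NONE_MATCH']) : false;

if (($ims !== false && $ims >= $last_modified) || ($inm !== false && $inm == $etag)) {
    header('HTTP/1.1 304 Not Modified');         // the cache's copy is still good
    exit;
}

// ...otherwise generate and send the full representation as usual.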

Some other tips;

  • Don’t use POST unless it’s appropriate. Responses to the POST method aren’t kept by most caches; if you send information in the path or query (via GET), caches can store that information for the future.
  • Don’t embed user-specific information in the URL unless the content generated is completely unique to that user.
  • Don’t count on all requests from a user coming from the same host, because caches often work together.
  • Generate Content-Length response headers. It’s easy to do, and it will allow the response of your script to be used in a persistent connection. This allows clients to request multiple representations on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.
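
As a quick illustration of that last tip, a PHP script can buffer its output and measure it before anything is sent (a minimal sketch, not tied to any particular framework):

<?php
// Buffer the page, then send an accurate Content-Length ahead of the body.
ob_start();
echo '<html><body>Hello, world.</body></html>';  // the normal page output goes here
$body = ob_get_clean();

header('Content-Type: text/html');
header('Content-Length: ' . strlen($body));
echo $body;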

See the Implementation Notes for more specific
information.

Frequently Asked Questions

What are the most important things to make cacheable?

A good strategy is to identify the most popular, largest representations
(especially images) and work with them first.

How can I make my pages as fast as possible with caches?

The most cacheable representation is one with a long freshness time set.
Validation does help reduce the time that it takes to see a representation,
but the cache still has to contact the origin server to see if it’s fresh. If
the cache already knows it’s fresh, it will be served directly.

I understand that caching is good, but I need to keep statistics on how
many people visit my page!

If you must know every time a page is accessed, select ONE small item on
a page (or the page itself), and make it uncacheable, by giving it suitable
headers. For example, you could refer to a 1×1 transparent uncacheable image
from each page. The Referer header will contain information about what page
called it.
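
A rough PHP sketch of such a counter image follows; pixel.gif is an assumed 1×1 transparent GIF sitting next to the script, and the log path is made up for the example:

<?php
// count.php - serves an uncacheable 1x1 image and records the page that called it.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '-';
error_log(date('c') . ' ' . $referer . "\n", 3, '/tmp/page-hits.log');  // assumed log file

header('Cache-Control: no-store, no-cache, must-revalidate');  // keep every cache from reusing it
header('Expires: Thu, 01 Jan 1970 00:00:00 GMT');              // a date in the past, for older caches
header('Content-Type: image/gif');
readfile('pixel.gif');                                         // the assumed 1x1 transparent image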

Be aware that even this will not give truly accurate statistics about your
users, and is unfriendly to the Internet and your users; it generates
unnecessary traffic, and forces people to wait for that uncached item to be
downloaded. For more information about this, see On Interpreting Access
Statistics in the references.

How can I see a representation’s HTTP headers?

Many Web browsers let you see the Expires and Last-Modified headers in
a “page info” or similar interface. If available, this will give you a menu of
the page and any representations (like images) associated with it, along with
their details.

To see the full headers of a representation, you can manually connect to
the Web server using a Telnet client.

To do so, you may need to type the port (by default, 80) into a separate
field, or you may need to connect to www.example.com:80 or www.example.com 80
(note the space). Consult your Telnet client’s documentation.

Once you’ve opened a connection to the site, type a request for the
representation. For instance, if you want to see the headers for
http://www.example.com/foo.html, connect to www.example.com, port 80, and
type:

GET /foo.html HTTP/1.1 [return]
Host: www.example.com [return][return]

Press the Return key every time you see [return]; make sure to press it
twice at the end. This will print the headers, and then the full
representation. To see the headers only, substitute HEAD for GET.

My pages are password-protected; how do proxy caches deal with them?

By default, pages protected with HTTP authentication are considered private;
they will not be kept by shared caches. However, you can make authenticated
pages public with a Cache-Control: public header; HTTP 1.1-compliant caches will then
allow them to be cached.

If you’d like such pages to be cacheable, but still authenticated for every
user, combine the Cache-Control: public and no-cache headers. This tells the
cache that it must submit the new client’s authentication information to the
origin server before releasing the representation from the cache. This would look like:

Cache-Control: public, no-cache

Whether or not this is done, it’s best to minimize use of authentication;
for example, if your images are not sensitive, put them in a separate
directory and configure your server not to force authentication for it. That
way, those images will be naturally cacheable.

Should I worry about security if people access my site through a
cache?

SSL pages are not cached (or decrypted) by proxy caches, so you don’t have
to worry about that. However, because caches store non-SSL requests and URLs
fetched through them, you should be conscious about unsecured sites; an
unscrupulous administrator could conceivably gather information about their
users, especially in the URL.

In fact, any administrator on the network between your server and your
clients could gather this type of information. One particular problem is when
CGI scripts put usernames and passwords in the URL itself; this makes it
trivial for others to find and use their login.

If you’re aware of the issues surrounding Web security in general, you
shouldn’t have any surprises from proxy caches.

I’m looking for an integrated Web publishing solution. Which ones are
cache-aware?

It varies. Generally speaking, the more complex a solution is, the more
difficult it is to cache. The worst are ones which dynamically generate all
content and don’t provide validators; they may not be cacheable at all. Speak
with your vendor’s technical staff for more information, and see the
Implementation notes below.

My images expire a month from now, but I need to change them in the
caches now!

The Expires header can’t be circumvented; unless the cache (either browser
or proxy) runs out of room and has to delete the representations, the cached
copy will be used until then.

The most effective solution is to change any links to them; that way,
completely new representations will be loaded fresh from the origin server.
Remember that the page that refers to a representation will be cached as
well. Because of this, it’s best to make static images and similar
representations very cacheable, while keeping the HTML pages that refer to
them on a tight leash.

If you want to reload a representation from a specific cache, you can
either force a reload while using the cache (in Firefox, holding down shift
while pressing ‘reload’ will do this by issuing a Pragma: no-cache request
header), or you can have the cache administrator delete the representation
through their interface.

I run a Web Hosting service. How can I let my users publish
cache-friendly pages?

If you’re using Apache, consider allowing them to use .htaccess files and
providing appropriate documentation.

Otherwise, you can establish predetermined areas for various caching
attributes in each virtual server. For instance, you could specify a
directory /cache-1m that will be cached for one month after access, and a
/no-cache area that will be served with headers instructing caches not to
store representations from it.

Whatever you are able to do, it is best to work with your largest
customers first on caching. Most of the savings (in bandwidth and in load on
your servers) will be realized from high-volume sites.

I’ve marked my pages as cacheable, but my browser keeps requesting them
on every request. How do I force the cache to keep representations of them?

Caches aren’t required to keep a representation and reuse it; they’re only
required to not keep or use them under some conditions. All
caches make decisions about which representations to keep based upon their
size, type (e.g., image vs. html), or by how much space they have left to keep
local copies. Yours may not be considered worth keeping around, compared to
more popular or larger representations.

Some caches do allow their administrators to prioritize what kinds of
representations are kept, and some allow representations to be “pinned” in
cache, so that they’re always available.

Implementation Notes — Web
Servers

Generally speaking, it’s best to use the latest version of whatever Web
server you’ve chosen to deploy. Not only will they likely contain more
cache-friendly features, new versions also usually have important security
and performance improvements.

Apache HTTP Server

Apache uses
optional modules to include headers, including both Expires and
Cache-Control. Both modules are available in the 1.2 or greater
distribution.

The modules need to be built into Apache; although they are included in
the distribution, they are not turned on by default. To find out if the
modules are enabled in your server, find the httpd binary and run httpd -l;
this should print a list of the available modules (note that this only
lists compiled-in modules; on later versions of Apache, use httpd -M
to include dynamically loaded modules as well). The modules we’re
looking for are mod_expires and mod_headers.

  • If they aren’t available, and you have administrative access, you can
    recompile Apache to include them. This can be done either by uncommenting
    the appropriate lines in the Configuration file, or using the
    --enable-module=expires and --enable-module=headers
    arguments to configure (1.3 or greater). Consult the INSTALL file found
    with the Apache distribution.

Once you have an Apache with the appropriate modules, you can use
mod_expires to specify when representations should expire, either in .htaccess
files or in the server’s access.conf file. You can specify expiry from either
access or modification time, and apply it to a file type or as a default. See
the module
documentation
for more information, and speak with your local Apache guru
if you have trouble.

To apply Cache-Control headers, you’ll need to use the mod_headers module,
which allows you to specify arbitrary HTTP headers for a resource. See the
mod_headers documentation.

Here’s an example .htaccess file that demonstrates the use of some
headers.

  • .htaccess files allow web publishers to use commands normally only
    found in configuration files. They affect the content of the directory
    they’re in and their subdirectories. Talk to your server administrator to
    find out if they’re enabled.
### activate mod_expires
ExpiresActive On
### Expire .gif's 1 month from when they're accessed
ExpiresByType image/gif A2592000
### Expire everything else 1 day from when it's last modified
### (this uses the Alternative syntax)
ExpiresDefault "modification plus 1 day"
### Apply a Cache-Control header to index.html
<Files index.html>
Header append Cache-Control "public, must-revalidate"
</Files>
  • Note that mod_expires automatically calculates and inserts a
    Cache-Control:max-age header as appropriate.

Apache 2’s configuration is very similar to that of 1.3; see the 2.2 mod_expires and
mod_headers
documentation for more information.

Microsoft IIS

Microsoft’s
Internet Information Server makes it very easy to set headers in a somewhat
flexible way. Note that this is only possible in version 4 of the server,
which will run only on NT Server.

To specify headers for an area of a site, select it in the
Administration Tools interface, and bring up its properties. After
selecting the HTTP Headers tab, you should see two interesting
areas; Enable Content Expiration and Custom HTTP headers.
The first should be self-explanatory, and the second can be used to apply
Cache-Control headers.

See the ASP section below for information about setting headers in Active
Server Pages. It is also possible to set headers from ISAPI modules; refer to
MSDN for details.

Netscape/iPlanet Enterprise Server

As of version 3.6, Enterprise Server does not provide any obvious way to
set Expires headers. However, it has supported HTTP 1.1 features since version
3.0. This means that HTTP 1.1 caches (proxy and browser) will be able to take
advantage of Cache-Control settings you make.

To use Cache-Control headers, choose Content Management | Cache Control
Directives
in the administration server. Then, using the Resource Picker,
choose the directory where you want to set the headers. After setting the
headers, click ‘OK’. For more information, see the NES manual.

Implementation Notes — Server-Side
Scripting

One thing to keep in mind is that it may be easier to set
HTTP headers with your Web server rather than in the scripting language. Try
both.

Because the emphasis in server-side scripting is on dynamic content, it
doesn’t make for very cacheable pages, even when the content could be cached.
If your content changes often, but not on every page hit, consider setting a
Cache-Control: max-age header; most users access pages again in a relatively
short period of time. For instance, when users hit the ‘back’ button, if there
isn’t any validator or freshness information available, they’ll have to wait
until the page is re-downloaded from the server to see it.

CGI

CGI scripts are one of the most popular ways to generate content. You can
easily append HTTP response headers by adding them before you send the body;
most CGI implementations already require you to do this for the
Content-Type header. For instance, in Perl;

#!/usr/bin/perl
print "Content-type: text/html\n";
print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";
print "\n";
### the content body follows...

Since it’s all text, you can easily generate Expires and other
date-related headers with in-built functions. It’s even easier if you use

Cache-Control: max-age;

print "Cache-Control: max-age=600\n";

This will make the script cacheable for 10 minutes after the request, so
that if the user hits the ‘back’ button, they won’t be resubmitting the
request.

The CGI specification also makes request headers that the client sends
available in the environment of the script; each header has ‘HTTP_’ prepended
to its name. So, if a client makes an If-Modified-Since request, it will show
up as HTTP_IF_MODIFIED_SINCE.

See also the cgi_buffer
library, which automatically handles ETag generation and validation,
Content-Length generation and gzip content-coding for Perl and Python CGI
scripts with a one-line include. The Python version can also be used to wrap
arbitrary CGI scripts.

Server Side Includes

SSI (often used with the extension .shtml) is one of the first ways that
Web publishers were able to get dynamic content into pages. By using special
tags in the pages, a limited form of in-HTML scripting was available.

Most implementations of SSI do not set validators, and as such are not
cacheable. However, Apache’s implementation does allow users to specify which
SSI files can be cached, by setting the group execute permissions on the
appropriate files, combined with the XbitHack full directive. For more
information, see the mod_include
documentation
.

PHP

PHP is a
server-side scripting language that, when built into the server, can be used
to embed scripts inside a page’s HTML, much like SSI, but with a far larger
number of options. PHP can be used as a CGI script on any Web server (Unix or
Windows), or as an Apache module.

By default, representations processed by PHP are not assigned validators,
and are therefore uncacheable. However, developers can set HTTP headers by
using the Header() function.

For example, this will create a Cache-Control header, as well as an
Expires header three days in the future:

<?php
 Header("Cache-Control: must-revalidate");

 $offset = 60 * 60 * 24 * 3;
 $ExpStr = "Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";
 Header($ExpStr);
?>

Remember that the Header() function MUST come before any other output.

As you can see, you’ll have to create the HTTP date for an Expires header
by hand; PHP doesn’t provide a function to do it for you (although recent
versions have made it easier; see PHP’s date documentation). Of course, it’s
easy to set a Cache-Control: max-age header, which is just as good for most
situations.
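
For example, this one line marks the response as fresh for an hour after the request:

<?php
 Header("Cache-Control: max-age=3600");  // fresh for one hour
?>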

For more information, see the manual entry for header().

See also the cgi_buffer library, which
automatically handles ETag generation and validation, Content-Length
generation and gzip content-coding for PHP scripts with a one-line
include.

Cold Fusion

Cold Fusion, by Macromedia, is a commercial server-side
scripting engine, with support for several Web servers on Windows, Linux and
several flavors of Unix.

Cold Fusion makes setting arbitrary HTTP headers relatively easy, with the
CFHEADER
tag. Unfortunately, their example for setting an Expires header, as below, is a bit misleading.

<CFHEADER NAME="Expires" VALUE="#Now()#">

It doesn’t work like you might think, because the time (in this case, when the request is made)
doesn’t get converted to a HTTP-valid date; instead, it just gets printed as
a representation of Cold Fusion’s Date/Time object. Most clients will either
ignore such a value, or convert it to a default, like January 1, 1970.

However, Cold Fusion does provide a date formatting function that will do the job;
GetHttpTimeString. In combination with DateAdd, it’s easy to set Expires dates;
here, we set a header to declare that representations of the page expire in one month;

<cfheader name="Expires"
  value="#GetHttpTimeString(DateAdd('m', 1, Now()))#">

You can also use the CFHEADER tag to set Cache-Control: max-age and other headers.

Remember that Web server headers are passed through in some deployments of Cold Fusion
(such as CGI); check yours to determine whether you can use
this to your advantage, by setting headers on the server instead of in Cold
Fusion.

ASP and ASP.NET

When setting HTTP headers from ASPs, make sure you either
place the Response method calls before any HTML generation, or use
Response.Buffer to buffer the output. Also, note that some versions of IIS set
a Cache-Control: private header on ASPs by default, and must be declared public
to be cacheable by shared caches.

Active Server Pages, built into IIS and also available for other Web
servers, also allows you to set HTTP headers. For instance, to set an expiry
time, you can use the properties of the Response object;

<% Response.Expires=1440 %>

specifying the number of minutes from the request to expire the
representation. Cache-Control headers can be added like this:

<% Response.CacheControl="public" %>

In ASP.NET, Response.Expires is deprecated; the proper way to set cache-related
headers is with Response.Cache;

Response.Cache.SetExpires ( DateTime.Now.AddMinutes ( 60 ) ) ;
Response.Cache.SetCacheability ( HttpCacheability.Public ) ;

See the MSDN documentation for more
information.

References and Further Information

HTTP 1.1 Specification

The HTTP 1.1 spec has many extensions for making pages cacheable,
and is the authoritative guide to implementing the protocol. See sections 13,
14.9, 14.21, and 14.25.

Web-Caching.com

An excellent introduction to caching concepts, with links to other online
resources.

On Interpreting
Access Statistics

Jeff Goldberg’s informative rant on why you shouldn’t rely on access
statistics and hit counters.

REDbot

Examines HTTP resources to determine how they will interact with Web caches, and generally how well they use the protocol.

cgi_buffer Library

One-line include in Perl CGI, Python CGI and PHP scripts automatically
handles ETag generation and validation, Content-Length generation and gzip
Content-Encoding — correctly. The Python version can also be used as a
wrapper around arbitrary CGI scripts.

About This Document

This document is Copyright © 1998-2010 Mark Nottingham <mnot@pobox.com>.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.

All trademarks within are property of their respective holders.

Although the author believes the contents to be accurate at the time of
publication, no liability is assumed for them, their application or any
consequences thereof. If any misrepresentations, errors or other need for
clarification is found, please contact the author immediately.

The latest revision of this document can always be obtained from http://www.mnot.net/cache_docs/

Translations are available in:
Chinese,
Czech,
German, and
French.

April 9, 2010


Social Bookmarking Strategies

2009 December 11

This is a good video about the SEO benefits of Social Bookmarking – a great way of gaining traffic.

Thanks to http://videos.sitepronews.com/

Google Page Speed and Yslow 2.0

2009 December 9

Posted By Mike Hopley On June 15, 2009 @ 12:52 pm

Hot on the heels of Yahoo’s Yslow [1], Google have published Page Speed [2], a tool they have been using to optimize their own web pages. Now you can use it too.

Page Speed is similar to Yslow in several respects: it’s an add-on to Firebug; it analyses your pages according to a set of performance rules; it draws attention to rules that you score badly on; and it also provides a page activity monitor.
(Screenshot: Google Page Speed)
For each rule, Page Speed gives you a general indication of how well you’re doing, in the form of a green tick (good), red circle (bad), or amber triangle (indifferent). You can also hover over a rule to see your percentage score. Page Speed does not provide an overall percentage score, but it does arrange the results in order of importance.

Since my previous article about Yslow 1.0 [3], Yslow 2.0 has been released; this includes a further 9 rules. Let’s now take a look at all these rules—both Page Speed and Yslow.

Page Speed rules shared with Yslow 1.0

These rules are the same as those I previously discussed, although their organisation is different. I’m not going to cover these rules again, but you may still find some of the details in Page Speed’s documentation [4] interesting.

  • Avoid CSS expressions
  • Combine external CSS
  • Combine external javascript
  • Enable gzip compression
  • Leverage browser caching
  • Minify javascript
  • Minimize DNS lookups
  • Minimize redirects
  • Parallelize downloads across multiple hostnames
  • Put CSS in the document head

Page Speed may sometimes advise you to do stupid things. For example, under the “Leverage browser caching” rule, Page Speed suggested that I make __utm.gif cacheable. This tiny gif is used by Google Analytics to help compile your statistics; if you make it cacheable, Analytics will fail to track visitors who retrieved it from a cache. Leave it alone!

Page Speed rules shared with Yslow 2.0

I haven’t yet discussed these rules, so let’s look at them now:

  • Minimize cookie size
  • Serve static content from a cookie-less domain
  • Specify image dimensions

Minimize cookie size

Google’s advice here is more specific than Yslow’s: keep the average cookie size below 400 bytes. I get perfect marks on this from both tools, so I haven’t investigated further.
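
If you do want a rough idea of your cookie weight, the snippet below is a quick console sketch: it counts the characters in document.cookie, which is only an approximation (it ignores HttpOnly cookies and cookie attributes), but it’s enough to see whether you’re anywhere near the 400-byte guideline.

// Rough console check of cookie weight for the current page (approximate:
// counts characters, not bytes, and cannot see HttpOnly cookies).
var cookies = document.cookie.split('; ').filter(function (c) { return c.length > 0; });
var total = cookies.reduce(function (sum, c) { return sum + c.length; }, 0);
console.log(cookies.length + ' cookies, about ' + total + ' characters, average ' +
  (cookies.length ? Math.round(total / cookies.length) : 0) + ' per cookie');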

Serve static content from a cookie-less domain

I score badly on this one, and I expect many other sites will too. Surprisingly, requests for components such as images also include cookies, and these are generally just useless network traffic. The best way to fix this is to use a CDN; obviously, this will also cause you to score well on Yslow’s “Use a CDN” rule.

And yes, this means I’ve changed my mind about CDNs. Some good, cheap CDNs are now available, such as SimpleCDN or Amazon CloudFront. I use SimpleCDN, and have found them to be okay; but I’m not happy with them changing their service offering at short notice, and their Lightning service is currently not working for me – hence my poor score on this rule!

Specify image dimensions

Lazy designers often take a large image and use HTML to squash it down; consequently, the file size can be much larger than is necessary.

Don’t use HTML to resize images; use a graphics program. The width and height attributes in your HTML <img> tag should exactly match the size of the image. Doing so will also give the best appearance, as the browser does not need to scale the image.
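
If you want to hunt for scaled-down images on a page, here’s a rough console sketch (an assumption on my part that your browser supports naturalWidth, which older IE does not); it lists any image whose rendered size differs from its natural size.

// List images that are being resized by HTML/CSS instead of in a graphics program.
var imgs = document.getElementsByTagName('img');
for (var i = 0; i < imgs.length; i++) {
  var img = imgs[i];
  if (img.naturalWidth &&
      (img.naturalWidth !== img.width || img.naturalHeight !== img.height)) {
    console.log(img.src + ': natural ' + img.naturalWidth + 'x' + img.naturalHeight +
      ', rendered ' + img.width + 'x' + img.height);
  }
}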

I get a perfect score on this, and there’s no excuse for anything less.

Rules unique to Page Speed

In some cases, Yslow’s guidelines may include these topics, but not in the form of an automated check on your web page.

  • Defer loading of javascript
  • Optimize images
  • Optimize the order of styles and scripts
  • Remove unused CSS
  • Serve resources from a consistent URL
  • Leverage proxy caching
  • Use efficient CSS selectors

Defer loading of javascript

Before you can test your website against this rule, you must enable “Profile Deferrable Javascript” in the options. You may want to disable it again when you’re done, as it can slow down Firefox. Moreover, this profiling is only accurate on your first visit: to get an accurate result, you must start a new browser session and run Page Speed immediately after you load your website (before loading a second page).

The idea behind this rule is that javascript slows down your pages even when it’s not actually being used. Even if the script is cached, the browser must load it from disk and execute it. Some javascript functions need to be available before the onLoad event; others don’t. This rule proposes that you split off these latter functions into a separate file. You can then use some trickery to “lazy-load” this javascript after the document has finished loading.

It’s unclear to me whether this lazy-loading is better than simply putting your script at the bottom of the page. If your script is at the bottom, then it will still need to be downloaded and evaluated before the onLoad event is fired (I think), and lazy-loading will bypass this limitation; but what if you’re using a framework such as jQuery, which has the more sophisticated onDomReady event? To be honest, I don’t yet know enough about this issue to make simple recommendations. I suspect, however, that lazy-loading is even faster than simply putting javascript at the bottom of the page.
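
For what it’s worth, the “trickery” usually amounts to injecting a script element once the page has finished loading. Here’s a minimal sketch, assuming modern event handling (older IE needs attachEvent) and a placeholder file name, deferred.js, for the split-off functions:

// Lazy-load the non-essential javascript after the onLoad event has fired,
// so it never delays the initial rendering of the page.
window.addEventListener('load', function () {
  var script = document.createElement('script');
  script.src = '/js/deferred.js'; // placeholder: the file holding the split-off functions
  document.body.appendChild(script);
}, false);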

The good news is that Google Page Speed will identify these uncalled functions for you. Ironically, although I had plenty of uncalled functions, they all came from Google’s own ga.js; I’m not sure I want to mess with that, as it may screw up my Google Analytics stats.

Optimise images

Page Speed automatically creates optimised versions of your images, and offers a link to them. This is similar to running your images through Smush.it.

Optimise the order of styles and scripts

Ideally, you wouldn’t include any javascript in the <head>, as this violates Yslow’s rule, “Put javascript at the bottom.” If you need to include scripts in the <head>, however, try to get the order right. External scripts should come after all the external stylesheets, and inline scripts should come last of all.

Why does the order matter? Because javascript blocks subsequent downloads. While your javascript is downloading and being evaluated, the stylesheet that comes after it can’t be downloaded. Check out the documentation for more details.

Remove unused CSS

I score 100% on this one, which is surprising given that my CSS is an overgrown, tangled thicket of complexity that desperately needs pruning.

Every CSS rule adds bytes to be downloaded, and also requires parsing by the browser. Obviously it’s good to remove dusty old rules that you never use, but even after doing so you may still have a single monolithic CSS file that styles a diverse range of pages. As a result, every page gets a large amount of CSS that’s not needed; if your site is like this, it may be more efficient to split your CSS across multiple modules (although this increases HTTP requests).

Clearly a trade-off is necessary here. I recommend keeping a consistent style as much as possible; apart from the speed benefits, consistency helps visitors and generally looks more professional than constantly changing styles. For large sites with many different types of pages (such as Yahoo), however, it’s often better to split CSS into modules.

Even if your site only needs one stylesheet, it’s a good idea to start thinking in terms of object-oriented CSS, because this makes your CSS simpler, shorter, and more flexible; for an expert explanation of the topic, see Nicole Sullivan’s presentation [5].

Serve resources from a consistent URL

This one is fairly obvious. To benefit from caching, we need to keep the URL consistent. For example, if on different pages you serve the same image, but from two different domains, then it will get downloaded twice instead of being read from cache.

For most people this shouldn’t be an issue; it’s most likely to apply if you’re doing something fancy and automated to split your content across multiple hostnames.

Leverage proxy caching

Now this one is clever. When people visit your website, its resources can be cached not only by them but also by their ISP. Then when another visitor comes via the same ISP, he can download a copy from the ISP’s cache—which will be faster, because it’s closer to him than your server. Page Speed’s documentation recommends that, with a few exceptions, you set a Cache-control: public header for static resources (such as images).

Be careful not to do this for resources that have cookies attached, as you may end up allowing proxies to cache information that should be kept private to a visitor; the best solution is to serve these resources from a cookie-less domain. Also be careful with gzipping: some proxies will send gzipped content to browsers that can’t read it.
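
Purely to illustrate the header itself, here’s a minimal sketch using Node.js (my choice, not something the Page Speed documentation prescribes – Apache or any other server can set the same header). The file path and port are made up.

// Serve a static image with a Cache-Control: public header, so browsers
// and intermediate proxies may cache it. Illustrative only.
var http = require('http');
var fs = require('fs');

http.createServer(function (req, res) {
  if (req.url === '/images/logo.png') { // hypothetical static resource
    res.writeHead(200, {
      'Content-Type': 'image/png',
      'Cache-Control': 'public, max-age=2592000' // cacheable for 30 days
    });
    fs.createReadStream(__dirname + '/images/logo.png').pipe(res);
  } else {
    res.writeHead(404);
    res.end('Not found');
  }
}).listen(8080);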

I’m not sure how this rule interacts with the use of CDN. Again, this is one I don’t understand well, and I’d welcome discussion on it.

Use efficient CSS selectors

This rule is controversial. The idea is that some CSS selectors are much harder for the browser to parse than others; the most efficient are ID and Class selectors, because these do not require the browser to look higher up the document tree to determine a match.

With this rule, Google is recommending a radical change in the way we write CSS. Specifically, they are suggesting that we add otherwise unnecessary IDs and classes to the markup, in return for a speed advantage. As an example, they consider the situation where you want different colours on ordered and unordered list items:

ul li {color: blue;}
ol li {color: red;}

That would be the usual way to do it; instead, Google recommends adding a class to each <li>, so that you can use class selectors:

.unordered-list-item {color: blue;}
.ordered-list-item {color: red;}

No doubt this is faster, but it also takes longer to write and imposes a maintenance burden on your markup. If there were tools that would automatically generate such optimised CSS and the accompanying markup, then it might be worth doing. I suppose you could use server-side coding to generate the markup (for example, using the HTML helper from CakePHP), but this seems a heavy-handed approach.

My scepticism over this rule was initially quashed by the towering authority of Google, but then I looked around to see whether there was any research on the subject. The most respectable tests I could find came from Steve Souders himself, in his post about the performance impact of CSS selectors [6]. Steve found that, in real-world conditions, the maximum possible benefit of optimising CSS is 50 ms; and for 70% of users (i.e. those running IE7 or FF3), it’s only 20 ms. These numbers were obtained with 6000 DOM elements and 2000 extremely inefficient CSS rules. This is pretty much a worst-case scenario; most sites, even complex ones, will have far fewer DOM elements and CSS rules, and their CSS will also be much simpler.

Steve concludes that the potential performance benefits are small, and not worth the cost. I’m inclined to agree, but I’d welcome more information.

Nevertheless, there’s no harm in getting into good habits: some of Google’s recommendations for CSS selectors are quite reasonable, such as not over-qualifying ID and class selectors with an antecedent tag selector (so .errorMessage is better than p.errorMessage). Such coding habits also sit harmoniously with object-oriented CSS.

If you read Steve’s post, be sure to check out Nicole Sullivan’s comment: “Micro-optimization of selectors is going a bit off track in a performance analysis of CSS. The focus should be on scalable CSS.” To me, this seems a much more sensible and maintainable approach than the monomaniacal one recommended by Google.

I do extremely badly on this rule (0%). Although I consider the recommendations to be unrealistic, my terrible score does reflect the excessive complexity and lack of modularity within my CSS.

Rules unique to Yslow 2.0

In some cases, Page Speed’s guidelines may include these topics, but not in the form of an automated check on your web page.

Note that Yahoo’s documentation includes other recommendations that are not checked by Yslow (because they haven’t discovered a sensible way to automate the test). Yslow has 22 rules, but Yahoo lists 34 best practices [7] in total.

  • Reduce the number of DOM elements
  • Make favicon small and cacheable
  • Avoid HTTP 404 (Not Found) errors
  • Avoid AlphaImageLoader filter
  • Make AJAX cacheable
  • Use GET for AJAX requests

Reduce the number of DOM elements

The more DOM elements you have, the longer it takes to download your page, to render it, and to play with the DOM via javascript.

Essentially, this rule asks you to avoid large amounts of unnecessary markup, including markup added by javascript. As an example, yahoo.com has about 700 DOM nodes, despite being a busy page. My home page has 267 DOM nodes, and that could be reduced a lot. You can check how many nodes your page has by entering the following into Firebug’s console:

document.getElementsByTagName('*').length

Blindly applying this rule can be dangerous (and that’s true of many performance rules). Don’t cut off your nose to spite your face! Markup purists will take this rule as a vindication for using the absolute minimum of markup, and in particular for avoiding the use of container <div>s whenever possible. This will leave them with hideously convoluted CSS and problems maintaining their code.

By all means remove extraneous markup, and also try to limit the complexity of DOM access in your javascript (for example, avoid using javascript to fix layout issues [8]—here I have sinned). But never be afraid to throw in an extra container <div> when you can see it will make life easier.

Make favicon.ico small and cacheable

You might think that a favicon is not even worth the HTTP request, but you don’t get a say in the matter: the browser is going to request it anyway. Make one, make it small, and put it in the root directory of your website (where the browser will look for it).

Because you can’t change the name of this file—it must be called favicon.ico, or it won’t work—you should be moderate in setting its expiry date. It’s hardly essential that your visitors immediately get your latest favicon, but equally you wouldn’t want it to be cached for 10 years! I give mine a two-month shelf-life.

Avoid HTTP 404 (Not Found) errors

This one is obvious. If your document has broken links, fix them.

Avoid AlphaImageLoader filter

Ah, good old alpha-transparent PNGs; how we love them! What web designer hasn’t flirted with multi-layer scrolling transparencies at some point? And who has not felt a sense of satisfied mastery upon forcing IE6 to eat them via a clever hack?

The sobering reality is that, although you can make alpha-transparency work in IE6, you pay a heavy price for doing so. All the hacks rely on Microsoft’s AlphaImageLoader filter. Not only does this filter block rendering and freeze the browser while it’s being calculated, but it also increases memory consumption. To make matters worse, the performance penalty is incurred for each element, not for each image. For example, let’s say you have a fancy alpha-transparent bullet point image for your unordered list items; on a page with 20 bullets, you get the penalty 20 times over.

Use PNG-8 transparency instead, if you can. Incidentally, creating a web page from multiple layers of transparency is probably a bad idea anyway: even in good browsers, these kinds of pages are sluggish to scroll; find a better medium for expressing latent op art.

Make AJAX cacheable, and use GET for AJAX requests

I can’t pretend to understand these rules properly, having never used AJAX. Nevertheless, the ideas are straightforward.

If a resource has not changed since it was fetched, we want to read it from cache rather than getting a new copy; this applies just as much to something requested via AJAX. Steve summarises it thus:

Even though your Ajax responses are created dynamically, and might only be applicable to a single user, they can still be cached. Doing so will make your Web 2.0 apps faster.

Apparently, GET is more efficient than POST, because it can send data in a single packet, whereas POST takes two steps: first send the headers, then send the data. Providing your data is less than 2 kB (a limit on the URL length in IE), you should be able to use GET for AJAX requests.
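
As a small sketch of the GET approach (the /api/scores URL and the callback are made up for illustration), the data travels in the query string, so the response can be cached like any other GET, provided the server sends suitable caching headers:

// Cache-friendly AJAX request: use GET with the parameters in the query string.
function fetchScores(userId, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/api/scores?user=' + encodeURIComponent(userId), true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      callback(xhr.responseText);
    }
  };
  xhr.send(null);
}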

Conclusions

Google Page Speed is a useful new tool for optimising your websites’ performance. However, some of its advice can be misleading—in particular, CSS selector efficiency is a red herring that distracts you from the more useful goal of building object-oriented CSS.

Yslow is the more mature tool, and I recommend you give it priority. After you’ve finished with Yslow, you may be interested in what Page Speed has to say.


Article printed from The Web Squeeze: http://www.thewebsqueeze.com

URL to article: http://www.thewebsqueeze.com/web-design-articles/google-page-speed-and-yslow-2-0.html

URLs in this post:

[1] Yslow: http://developer.yahoo.com/yslow/

[2] Page Speed: http://code.google.com/speed/page-speed/

[3] article about Yslow 1.0: http://www.thewebsqueeze.com/web-design-articles/yslow-going-from-f-to-a.html

[4] Page Speed’s documentation: http://code.google.com/speed/page-speed/docs/using.html

[5] see Nicole Sullivan’s presentation: http://developer.yahoo.net/blogs/theater/archives/2009/03/website_and_webapp_performance.html

[6] performance impact of CSS selectors: http://www.stevesouders.com/blog/2009/03/10/performance-impact-of-css-selectors/

[7] 34 best practices: http://developer.yahoo.com/performance/rules.html

[8] avoid using javascript to fix layout issues: http://developer.yahoo.com/performance/rules.html#dom_access

301 redirect to the “www” site

2009 September 6

Fred Morgan – seoexploration.com

With the modern style of dropping the “www” at the beginning of the website URL, you may encounter problems – especially with statistical analysis programs.

I’m not really concerned with the debate over whether one should or should not use the leading “www”; being of the old school, I still tend towards using it. These days most hosting facilities provide the dual base directories “public_html” and “www”, where “www” is an automatic clone of the “public_html” directory, so under normal circumstances either URL will work fine.
However, I have seen browser caching cause display problems that differ between the two – missing favicon images, intermittent Flash errors and so on. It took me a long time to identify where the problem actually lay. Everyone points to the standard cache-reload fix, but not to the fact that you may be varying the actual URL (with or without “www”), which can happen when using previously saved browser shortcuts or bookmarks.

.htaccess

I now always use a 301 redirect, with the following lines in my .htaccess file, to send incoming non-www URLs to the www version. This step should only be taken by those who understand that problems can follow from incorrect editing, so first create a backup copy of your original .htaccess file in case things get messed up. Then add the following three lines of 301 redirection code (note: aaaaaaaa.bbb is to be replaced by your own domain, e.g. okikoki.com):


RewriteEngine On

RewriteCond %{HTTP_HOST} ^aaaaaaaa\.bbb$ [NC]
RewriteRule (.*) http://www.aaaaaaaa.bbb/$1 [R=301,L]

Track SEO rankings and Sitelinks with Google Analytics II

2009 September 5
by anon

2 September, 2009 André Scholten

Earlier this year I did a guest post on this site to show you how to track your SEO rankings with Google Analytics. It was big news for a lot of people – just take a look at the 300+ comments. And now it’s time for the follow-up.

Google’s new technology

For a while now, Google has been testing a new AJAX version of its search engine. I’m not sure who sees the AJAX version and who doesn’t, but in Holland most Firefox users do. You can tell whether you’re on the new version by looking at the URL of a results page:

[screenshot: example of the new AJAX-style result-page URL in Firefox]

The great thing about this new version is that it makes Google Analytics capable of tracking the clicked position. Yes, you read that right: the position. Where the ‘old’ Google only let us track which results page a keyword appeared on, the new Google lets us track the exact position.

The new filters

You can use the first two filters mentioned in the old article, but before you do that, create a new profile to apply these filters to (tip: watch the video where Joost explains all this):

Filter name: "Ranking 1"
Filter type: "Custom filter - Include"
Filter field: "Campaign Medium"
Filter pattern: "organic"

Filter name: "Ranking 2"
Filter type: "Custom filter - Include"
Filter field: "Campaign Source"
Filter pattern: "google"

And this is the new filter that is capable of tracking positions:

[screenshot: the “Ranking 3” filter settings]

And the copy/paste version:

Filter name: "Ranking 3"
Filter type: "Custom filter - Advanced"
Field A -> Extract A: "Campaign term", "(.*)"
Field B -> Extract B: "Referral", "(\?|&)cd=([^&]*)"
Output To -> User Defined: "$A2 (position: $B2)"
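
To see what the Extract B pattern actually pulls out of the referral, here’s a quick sketch applying the same regular expression to a made-up Google referral URL in the browser console:

// Apply the filter's Extract B pattern to a sample referral: the cd parameter
// holds the position of the result that was clicked.
var referral = 'http://www.google.com/search?q=example+keyword&cd=4&ved=xyz';
var match = referral.match(/(\?|&)cd=([^&]*)/);
console.log(match ? 'position: ' + match[2] : 'position unknown'); // logs "position: 4"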

And a bonus filter to add an “unknown position” message when the position of the searched keyword is not passed through:

Filter name: "Ranking 4"
Filter type: "Custom filter - Search and Replace"
Filter field: "User Defined"
Search String: "\(position: \)"
Replace String: "(position unknown)"

The new reports

If you have implemented everything correct you should see this in the “Visitors -> User Defined” report:

[screenshot: the “User Defined” report listing keywords with their positions]

A list of keywords with the position each keyword was on when a visitor clicked it. Now you’re able to see the exact positions, more precisely than any ranking tool out there. There’s one minor drawback: business listings next to the little maps are also counted as a position:

[screenshot: a Google Maps business listing counted as a result position]

The blue result is counted as the 11th result, and not as the first organic result. But when you’re analyzing your positions you can easily separate the geo-related keywords from the rest.

Sitelinks

Very interesting: the sitelinks positions are also tracked, and in a more intelligent way than the maps results. If you click on a sitelink, the actual position of that sitelink is passed on. For example, this sitelink has position 4:

[screenshot: a sitelink tracked at position 4]

If you want better insight into your sitelinks, create an extra profile with the first three filters mentioned above. Then add this extra filter to track only those keywords where people clicked on the (full or oneline) sitelinks:

Filter name: "Ranking 5"
Filter type: "Custom filter - Include"
Filter field: "Referral"
Filter pattern: "oi=(oneline_sitelinks|smap)"

The positions you will see are pure sitelinks positions, and you will get an idea about which sitelink is popular and which isn’t.

Extra tip

While we’re dissecting the referring URL from the Google search engine, we could also take a look at the “meta” parameter (see my Dutch blog post about this). It’s used when people select one of these options:

[screenshot: Google’s country and language search options]

The selected country or language is in the “meta” parameter (not applicable for Google.com) and can be made visible with the following filter:

Filter name: "Language / Country"
Filter type: "Custom filter - Advanced"
Field A -> Extract A: "Referral", "(\?|&)meta=([^&]*)"
Output To -> User Defined: "$A2"

And remember: do this on a new profile so you don’t mess up existing profiles. The selected language(s) or country is visible in the “Visitors -> User Defined” report.

I had this filter on a lot of Dutch sites for quite a while and saw that the three options were used like this:

  1. The internet: 96.69%
  2. Pages in Dutch: 3.28%
  3. Pages from Holland: 0.03%

Well, that was the update, hope you liked it.

This post was written by: André Scholten

Web Analytics Consultant and SEO specialist at Traffic4u. For more info check my Web Analytics and SEO blog.

See all posts by: André Scholten.