Submitting sitemaps to Google and Yahoo

So… after submitting my site, xml.jar-czar.com, over a week ago, I had a brief “yippee” as 1 or 2 pages popped in, followed by “hmm… that’s really sparse.”

Since the core of my application is xml data instead of html data, I attributed the blame to that.

I know, I know… Google and Yahoo both tell you it can take 3-4 weeks, and even then it may not happen.

So I was bummed… and almost ready to pony up $49 to Yahoo with its guaranteed refresh… I still may, but let’s face it…

Google is really it for search.

what do you mean by a sitemap?

Before, I thought a sitemap was that lame part of a lot of web pages where they list “all” the pages of interest or all the starting points or some mess.

Well… it’s that too, but it’s also a goofy xml standard:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://xml.jar-czar.com/pub/703/424/64a2e92d7df6fa28bf2cb375371628d472/70342464a2e92d7df6fa28bf2cb375371628d472-jar-czar.xml</loc>
        <lastmod>2008-09-08</lastmod>
        <changefreq>weekly</changefreq>
    </url>
</urlset>
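If you'd rather generate that than type it, here is a minimal sketch in Python. The function name `build_urlset` and the `example.xml` path are made up for illustration; the only thing taken from the sitemap protocol is the element names and the namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> document from (loc, lastmod, changefreq) tuples.

    Returns the serialized XML as UTF-8 bytes.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
    # xml_declaration= requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical URL, not one of the real jar-czar files.
    demo = [("http://xml.jar-czar.com/pub/example.xml", "2008-09-08", "weekly")]
    print(build_urlset(demo).decode("utf-8"))
```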

That’s just a little taste, but you get the general idea (this is about 1mb of xml).

It turns out the max number of URLs in a single file is 50k. And I have 62k xml files…

Oh… and my s3 publish freaks out if a file is bigger than 1mb… 😛
Oh… but it can be gzipped…
Oh… well… maybe next time…
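For the record, the splitting and the gzipping are both only a few lines. This is just a sketch of the idea, not what my publish step actually does; `chunk` and `gzip_bytes` are names I made up, and the URL list in the demo is fake.

```python
import gzip

# Per-file URL limit mentioned above.
MAX_URLS_PER_SITEMAP = 50_000


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def gzip_bytes(data):
    """Compress serialized sitemap bytes so they can be served as .xml.gz."""
    return gzip.compress(data)


if __name__ == "__main__":
    # 62k made-up URLs standing in for the real file list.
    urls = [f"http://xml.jar-czar.com/pub/{i}.xml" for i in range(62_000)]
    sizes = [len(part) for part in chunk(urls)]
    print(sizes)
    raw = "\n".join(urls).encode("utf-8")
    print(len(raw), len(gzip_bytes(raw)))
```

Each chunk would become its own sitemap file, which is where the sitemapindex comes in.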

Ahem…

So I had to have a sitemapindex:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://xml.jar-czar.com/.sitemap.site00000000.xml</loc>
        <lastmod>2008-09-08</lastmod>
    </sitemap>
</sitemapindex>
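Generating the index is the same trick one level up. A rough sketch in Python: `build_sitemap_index` is a made-up name, and the 8-digit zero-padded counter in the file name is my guess at the pattern based on the single `.sitemap.site00000000.xml` example above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, count, lastmod):
    """Build a <sitemapindex> pointing at `count` numbered sitemap files.

    The file naming (.sitemap.siteNNNNNNNN.xml) is an assumption for
    illustration; use whatever names your sitemap files really have.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(count):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}.sitemap.site{n:08d}.xml"
        ET.SubElement(sitemap, "lastmod").text = lastmod
    # xml_declaration= requires Python 3.8 or newer.
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap_index("http://xml.jar-czar.com/", 2, "2008-09-08").decode("utf-8"))
```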

Each sitemapindex can point to a max of 1k sitemap files, which means it can indirectly address up to 50 million files (1k sitemaps × 50k URLs each).
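If you want to see how that math plays out for a given number of URLs, here's a tiny sketch. Both limits are just the figures quoted in this post, so check the current protocol before relying on them; `files_needed` is a made-up helper.

```python
import math

MAX_URLS_PER_SITEMAP = 50_000    # per-file limit quoted in this post
MAX_SITEMAPS_PER_INDEX = 1_000   # per-index limit quoted in this post


def files_needed(url_count):
    """Return (sitemap_files, index_files) needed for `url_count` URLs."""
    sitemaps = math.ceil(url_count / MAX_URLS_PER_SITEMAP)
    indexes = math.ceil(sitemaps / MAX_SITEMAPS_PER_INDEX)
    return sitemaps, indexes


if __name__ == "__main__":
    print(files_needed(62_000))      # my 62k files
    print(files_needed(50_000_000))  # the most a single index can address
```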

And you can have more than 1 if you do have more than 50 million files…

Yeah… That would be a lot…

one last word on sitemaps

If you publish your sitemap under, say, http://xml.jar-czar.com/site/, it can only refer to files under http://xml.jar-czar.com/site/ (including anything nested in subdirectories) and never to files in, say, http://xml.jar-czar.com/ or http://xml.jar-czar.com/images/
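That rule is easy to check before you publish. Here's a minimal sketch of how I read it: same scheme and host, and the page path has to start with the sitemap's directory. `in_sitemap_scope` and the `sitemap.xml` file name are made up for the example.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if `page_url` lives at or below the sitemap's directory."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    # A sitemap can only describe URLs on its own scheme and host.
    if (s.scheme, s.netloc) != (p.scheme, p.netloc):
        return False
    # Directory part of the sitemap path, including the trailing slash.
    base_dir = s.path[: s.path.rfind("/") + 1]
    return p.path.startswith(base_dir)


if __name__ == "__main__":
    sitemap = "http://xml.jar-czar.com/site/sitemap.xml"  # hypothetical file name
    for page in (
        "http://xml.jar-czar.com/site/a/b.xml",
        "http://xml.jar-czar.com/images/logo.png",
        "http://xml.jar-czar.com/index.html",
    ):
        print(page, in_sitemap_scope(sitemap, page))
```

Which is why the simplest thing is to put the sitemap at the root of the site.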

Registering the sitemap with Google


Google’s interface (as usual) is spartan and practical. You can add sitemap indexes, and then you can view details such as how many URLs are referenced, when the index was downloaded, and how many URLs were processed, and click through to see stats for individual sitemaps.

If someone is really interested, I’ll take some pictures of it…

Registering the sitemap with Yahoo


As usual, Yahoo’s SiteExplorer looks really sharp. Kinda makes Google look like it was written in python or something! j/k

But seriously…

So what?

Right? Now I’m sure Google and Yahoo know about my files. That’s still no guarantee they will index them.

At least with Yahoo, I can still do something to make it happen, but then again… it’s Yahoo’s search…

Maybe that’s great if you have a site that sells commemorative plates of Chuck Norris fighting Donkey Kong, but the whole point of jar-czar is to provide a technical resource to Java developers.

Every developer I know uses Google…

Now what?
