submitting sitemaps to google and yahoo

So… after submitting my site over a week ago, I had a brief “yippee” as 1 or 2 pages popped in, followed by “hmm… that’s really sparse”.

Since the core of my application is xml data instead of html data, I blamed that.

I know, I know… Google and Yahoo both tell you it can take 3-4 weeks, and even then it may not happen at all.

So I was bummed… and almost ready to pony up $49 to Yahoo with its guaranteed refresh… I still may, but let’s face it…

Google is really it for search.

what do you mean by a sitemap?

Before, I thought a sitemap was that lame part of a lot of web pages where they list “all” the pages of interest or all the starting points or some mess.

Well… it’s that too, but it’s also a goofy xml standard:


That’s just a little taste, but you get the general idea (this is about 1mb of xml).

It turns out the max number of URLs in a single file is 50k. And I have 62k xml files…

Oh… and my s3 publish freaks out if a file is bigger than 1mb… 😛
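Here's a rough sketch of how the splitting could go, assuming nothing more than a flat list of URLs. The `example.com` URLs and the function names `chunk_urls` and `build_sitemap` are made up for illustration; this is not the code I actually publish with.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit quoted above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return one sitemap document (as a string) listing the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() takes care of &, < and > inside the URL
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical URLs standing in for the 62k xml files.
urls = [f"http://www.example.com/data/{i}.xml" for i in range(62_000)]
sitemaps = [build_sitemap(chunk) for chunk in chunk_urls(urls)]
```

With 62k URLs that comes out as two files: one full one and one with the leftovers.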
Oh… but it can be gzipped…
Oh… well… maybe next time…
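For that next time, a minimal sketch of the gzip step using Python's standard `gzip` module. The `gzip_sitemap` helper and the sample content are hypothetical, and how much a real sitemap shrinks will depend on the URLs in it.

```python
import gzip


def gzip_sitemap(xml_text):
    """Compress a sitemap document; returns bytes for a .xml.gz file."""
    return gzip.compress(xml_text.encode("utf-8"))


# Repetitive stand-in content; sitemap entries tend to look a lot alike.
sample = "<url><loc>http://www.example.com/data/1.xml</loc></url>\n" * 10_000
compressed = gzip_sitemap(sample)
```

The idea is that the compressed bytes are what gets uploaded, which might be enough to stay under a size limit like the 1mb one above.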


So I had to have a sitemapindex:


Each sitemapindex can point to a max of 1k sitemap files, which means it can indirectly address up to 50 million files (1,000 × 50,000).

And you can have more than 1 if you do have more than 50 million files…

Yeah… That would be a lot…
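A sketch of building the index and checking that arithmetic. The limits are the ones quoted in this post (they may not match what the protocol allows today), and the sitemap URLs and `build_sitemap_index` name are hypothetical.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000    # per-file limit quoted in this post
MAX_SITEMAPS_PER_INDEX = 1_000   # per-index limit quoted in this post


def build_sitemap_index(sitemap_urls):
    """Return a sitemapindex document pointing at the given sitemap files."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many sitemaps for one index; split into several")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Two hypothetical sitemap files, enough for 62k URLs.
index = build_sitemap_index([
    "http://www.example.com/sitemap1.xml",
    "http://www.example.com/sitemap2.xml",
])

# How many URLs one index can reach indirectly under these limits.
capacity = MAX_URLS_PER_SITEMAP * MAX_SITEMAPS_PER_INDEX
```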

one last word on sitemaps

If you publish your sitemap in a particular directory, it can only refer to files under that directory (including any nested subdirectories), and never to files in a parent or sibling directory.
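That directory rule can be sketched as a small check. This is my own approximation of the rule, not anything the search engines publish, and the `in_sitemap_scope` name and `example.com` URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """True if page_url sits in the sitemap's directory or below it."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    # Different scheme or host is out of scope straight away.
    if (s.scheme, s.netloc) != (p.scheme, p.netloc):
        return False
    # Directory holding the sitemap, always ending in "/" so that
    # "/catalog/" does not accidentally match "/catalog2/...".
    base_dir = posixpath.dirname(s.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return p.path.startswith(base_dir)


sitemap = "http://www.example.com/catalog/sitemap.xml"
nested_ok = in_sitemap_scope(sitemap, "http://www.example.com/catalog/items/1.xml")
sibling_ok = in_sitemap_scope(sitemap, "http://www.example.com/images/logo.png")
```

Here `nested_ok` comes out true and `sibling_ok` false, which is why publishing the sitemap at the root of the site is the easy way out.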

Registering the sitemap with Google


Google’s interface (as usual) is spartan and practical. You can add sitemap indexes, then view details such as how many URLs are referenced, when the file was downloaded, and how many URLs were processed, and click through to see stats for individual sitemaps.

If someone is really interested, I’ll take some pictures of it…

Registering the sitemap with Yahoo


As usual, Yahoo’s SiteExplorer looks really sharp. Kinda makes Google look like it was written in python or something! j/k

But seriously…

So what?

Right? Now I’m sure Google and Yahoo know about my files. That’s still no guarantee they will index them.

At least with Yahoo, I can still do something to make it happen, but then again… it’s Yahoo’s search…

Maybe that’s great if you have a site that sells commemorative plates of Chuck Norris fighting Donkey Kong, but the whole point of jar-czar is to provide a technical resource to Java developers.

Every developer I know uses Google…

Now what?

