
pull analytics “site search terms” with google data api

I’d heard about the google data api, but up until recently hadn’t found much practical use for it.

Suddenly I needed to pull back the top search terms for a site monitored by google analytics. The default view only shows 500 at a time. Granted, those 500 accounted for 30% of the “unique searches”, but I wanted to see a bit more.

Though it was sort of hard to find, I used the example script dataFeed.sh to retrieve the top 10k search terms.

The script wants you to set the variables for username/password and “PROFILE_ID”. The value for the latter is the id=xxxx in the url when you cruise around the analytics site.

Rather than read thru the quirky documentation as to what dimensions and metrics were available, I used the Data Feed Query Explorer to find the values instead:

feedUri="https://www.google.com/analytics/feeds/data\
?start-date=2009-08-01\
&end-date=2009-12-04\
&dimensions=ga:searchKeyword\
&metrics=ga:searchDepth,ga:searchDuration,ga:searchExits,ga:searchRefinements,ga:searchUniques,ga:searchVisits\
&sort=-ga:searchUniques\
&max-results=40000\
&ids=ga:$PROFILE_ID\
&prettyprint=true"

curl -s "$feedUri" --header "Authorization: GoogleLogin $googleAuth"

Despite the max-results of 40k, it looks like there is a max of 10k, which is fine.
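If you ever need more than that 10k, one option is to page through the feed. Here is a rough sketch, assuming the feed honors a 1-based start-index parameter alongside max-results (I have not verified this against every feed). The page_uris function and its arguments are names I made up for illustration; they are not part of dataFeed.sh. It expects a base URI that does not already contain max-results.

```shell
# Sketch: print one feed URI per page of results.
# Assumption: the feed accepts start-index (1-based) together with max-results.
# page_uris is a hypothetical helper, not part of the dataFeed.sh example.
# The body runs in a subshell "( ... )" so its variables stay private.
page_uris() (
  base_uri=$1           # feed URI without start-index / max-results
  total=$2              # how many rows you want overall
  page_size=${3:-10000} # rows per request (10k is the cap observed above)
  start=1
  while [ "$start" -le "$total" ]; do
    printf '%s&start-index=%d&max-results=%d\n' "$base_uri" "$start" "$page_size"
    start=$((start + page_size))
  done
)
```

Each printed URI could then be handed to the same curl command, one request per line.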

All in all, pretty ez. Digging it all up.. sort of sucked.

Hence the write up… hope it helps some buddy!


submitting sitemaps to google and yahoo

So… after submitting my site, xml.jar-czar.com, over a week ago, I had a brief “yippee” as 1 or 2 pages popped in, followed by “hmm… that’s really sparse”.

Since the core of my application is xml data instead of html data, I attributed the blame to that.

I know, I know… Google and Yahoo both tell you it can take 3-4 weeks and even then it may not happen.

So I was bummed… and almost ready to pony up $49 to yahoo with its guaranteed refresh… I still may, but let’s face it…

Google is really it for search.

what do you mean by a sitemap?

Before, I thought a sitemap was that lame part of a lot of web pages where they list “all” the pages of interest or all the starting points or some mess.

Well… it’s that too, but it’s also a goofy xml standard:

<urlset>
    <url>
        <loc>http://xml.jar-czar.com/pub/703/424/64a2e92d7df6fa28bf2cb375371628d472/70342464a2e92d7df6fa28bf2cb375371628d472-jar-czar.xml</loc>
        <lastmod>2008-09-08</lastmod>
        <changefreq>weekly</changefreq>
    </url>
</urlset>

That’s just a little taste, but you get the general idea (this is about 1mb of xml).

It turns out the max number of urls in a single file is 50k. And I have 62k xml files…

Oh… and my s3 publish freaks out if a file is bigger than 1mb… 😛
Oh… but it can be gzipped…
Oh… well… maybe next time…

Ahem…

So I had to have a sitemapindex:

<sitemapindex>
    <sitemap>
        <loc>http://xml.jar-czar.com/.sitemap.site00000000.xml</loc>
        <lastmod>2008-09-08</lastmod>
    </sitemap>
</sitemapindex>

Each sitemapindex can point to a max of 1k sitemap files which means it can indirectly address up to 50 million files.

And you can have more than 1 if you do have more than 50 million files…

Yeah… That would be a lot…
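For what it's worth, the chunking described above can be sketched in a few lines of shell. This is not how I actually generated mine, just a minimal illustration. The write_sitemaps function, the sitemapNNNNN.xml file names and the arguments are all made up for the example, and it assumes the urls contain no characters that need XML escaping (such as &).

```shell
# Sketch: read urls (one per line) on stdin, write sitemap files holding at
# most $max urls each into $out_dir, then print a sitemapindex on stdout.
# write_sitemaps and the sitemapNNNNN.xml naming are hypothetical.
# Assumption: urls need no XML escaping.
write_sitemaps() (
  out_dir=$1 base_url=$2 max=${3:-50000}
  awk -v dir="$out_dir" -v max="$max" '
    (NR - 1) % max == 0 {
      # Start a new sitemap file, closing the previous one if there is one.
      if (file) { print "</urlset>" > file; close(file) }
      file = sprintf("%s/sitemap%05d.xml", dir, n++)
      print "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" > file
    }
    { printf "  <url><loc>%s</loc></url>\n", $0 > file }
    END { if (file) { print "</urlset>" > file; close(file) } }
  '
  # The index lists every sitemap file that was just written.
  printf '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
  for f in "$out_dir"/sitemap*.xml; do
    printf '  <sitemap><loc>%s/%s</loc></sitemap>\n' "$base_url" "${f##*/}"
  done
  printf '</sitemapindex>\n'
)
```

With 62k urls and the default limit of 50k, this would produce two sitemap files (50k and 12k urls) plus an index with two entries.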

one last word on sitemaps

If you publish your sitemap under, say, http://xml.jar-czar.com/site/, it can only refer to files under http://xml.jar-czar.com/site/ (including anything nested in subdirectories), and never to files in http://xml.jar-czar.com/ or http://xml.jar-czar.com/images/
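A quick way to sanity-check that rule before publishing is a simple prefix test. This is only a sketch: in_sitemap_scope is a name I invented, and it does plain string matching on the url, so it assumes both urls are written the same way (same scheme, same host spelling) and that the sitemap url has a path after the host.

```shell
# Sketch: succeed (exit 0) if page_url lives under the directory that
# contains the sitemap, fail (exit 1) otherwise.
# in_sitemap_scope is a hypothetical helper; it compares plain strings only.
in_sitemap_scope() (
  sitemap_url=$1 page_url=$2
  # Directory part of the sitemap url, always ending in a slash.
  scope=${sitemap_url%/*}/
  case $page_url in
    "$scope"*) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

For example, with a sitemap at http://xml.jar-czar.com/site/ the check passes for anything under /site/ and fails for http://xml.jar-czar.com/images/.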

Registering the sitemap with Google


Google’s interface (as usual) is spartan and practical. You can add sitemap indexes, then view details such as how many urls are referenced, when the index was downloaded and how many urls have been processed, and click through to see stats for individual sitemaps.

If someone is really interested, I’ll take some pictures of it…

Registering the sitemap with Yahoo


As usual, Yahoo’s SiteExplorer looks really sharp. Kinda makes Google look like it was written in python or something! j/k

But seriously…

So what?

Right? Now I’m sure Google and Yahoo know about my files. That’s still no guarantee they will index them.

At least with Yahoo, I can still do something to make it happen, but then again… it’s Yahoo’s search…

Maybe that’s great if you have a site that sells commemorative plates of Chuck Norris fighting Donkey Kong, but the whole point of jar-czar is to provide a technical resource to Java developers.

Every developer I know uses google….

Now what?
