Posts Tagged S3


Well, if you read some of my other yammering, you know I’ve been doing a lot of publishing to S3 lately. I tried a number of tools, but none of them did exactly what I wanted:

  • recursively upload files
  • push the correct mime type
  • make everything world readable
  • ignore CVS directories

I do a lot of Java mess, so Java seemed like a fitting choice… I had the s3-example-libraries-1.0.0.jar and the whole gestalt is kinda Java-ey and why am I apologizing?


Java is a perfectly respectable language. It’s not like I wrote it in perl!

jar-czar-publisher expects to find a config file that looks something like:

key    = <put your key here>
secret = <put your secret here>
bucket = <name of your bucket>
email  = <your email address>
source = publish
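
If you want to see how little it takes to read a file like that, here is a minimal sketch. It assumes the file is in plain java.util.Properties format (the key = value layout suggests it, but that is my guess). The PublisherConfig class and its required-key check are made up for illustration and are not the actual jar-czar-publisher code.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical helper: not the actual jar-czar-publisher code.
final class PublisherConfig {
    // the five keys shown in the sample file above
    static final String[] REQUIRED = { "key", "secret", "bucket", "email", "source" };

    // Parse "key = value" pairs and fail loudly if a required key is missing or blank.
    static Properties load( Reader in ) throws IOException {
        Properties props = new Properties();
        props.load( in );
        for ( String name : REQUIRED ) {
            String value = props.getProperty( name );
            if ( value == null || value.trim().isEmpty() ) {
                throw new IllegalArgumentException( "missing config value: " + name );
            }
        }
        return props;
    }
}
```

Hand it a FileReader on your config file and ask for getProperty( "bucket" ). Failing early on a missing key beats finding out halfway through an upload.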

To make things easy, you can use bin/ to kick everything off after running “mvn package”.

The application will recurse all the files under publish and push them up to your bucket. For example:

                         xsl/headers.xsl ->                  publish/xsl/headers.xsl
                         xsl/footers.xsl ->                  publish/xsl/footers.xsl
                            xsl/user.xsl ->                     publish/xsl/user.xsl
                          xsl/search.xsl ->                   publish/xsl/search.xsl
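
For the curious, the recursing part can be sketched roughly like this. It assumes Java 7's java.nio.file API. PublishWalker and keysFor are names I made up for illustration, so don't take it as the publisher's real code. It follows symlinks on purpose, since linking directories into publish is the whole trick, and it skips anything named CVS.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: maps S3 keys to local files, skipping CVS directories.
final class PublishWalker {
    static Map<String, Path> keysFor( final Path source ) throws IOException {
        final Map<String, Path> keys = new TreeMap<String, Path>();
        Files.walkFileTree( source, EnumSet.of( FileVisitOption.FOLLOW_LINKS ),
                Integer.MAX_VALUE, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory( Path dir, BasicFileAttributes attrs ) {
                // don't descend into CVS bookkeeping directories
                Path name = dir.getFileName();
                boolean cvs = name != null && name.toString().equals( "CVS" );
                return cvs ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile( Path file, BasicFileAttributes attrs ) {
                // key = path relative to the source directory, with forward slashes
                String key = source.relativize( file ).toString().replace( '\\', '/' );
                keys.put( key, file );
                return FileVisitResult.CONTINUE;
            }
        } );
        return keys;
    }
}
```

Each entry in the returned map is one upload: the key on the left, the file to read on the right, same as the listing above.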

Now the URL for “xsl/headers.xsl” in your bucket will pull back the file from “publish/xsl/headers.xsl”.

To keep things easy-peasy, I leave the stuff wherever it is on disk and link its directory into publish, à la:

% mkdir publish
% cd publish
% ln -s ${HOME}/tmp/somejunk 
% cd -
% mvn package
% vim
% ./bin/

using that zip

That contains the source code… Speaking more accurately, it contains the CVS directory. In order to use it you’ll need to set up a local CVS repository.

Just in case you haven’t done this, shame on you! Everything you write should go into CVS!


These instructions are for bash… Running ’em in other shells is left as an exercise for the reader… 😛

% mkdir -p /tmp/cvs/a-go-go
% cd ${_}
% export CVSROOT=${PWD}
% cvs init 
% wget
% unzip
% mkdir -p /tmp/some/more/dumb/junk/from/brian
% cd ${_}
% cvs co jar-czar-publisher
% cd ${_}
% echo come on.. I think you know what to do from here...

If you don’t…

% mkdir -p ${HOME}/.m2/repository/com/amazon/s3/s3-example-libraries/1.0.0/
% cd ${_}
% wget
% cd -
% mvn package
% echo see above

a few things about a few things

OK, I hate to explain a joke, but let me say something about some of that… “${_}” is a bash-ism that means “whatever the last argument of the previous command was.” Got it off a good friend of mine name of John Taylor.

The other “cd -” just means change directory back to wherever I just came from.

a word about mime types

I really don’t like mime types. I think they are dumb. Ever use the file command? That’s smart! A list of magic keys to identify file type. Heck, even guessing by file extensions is smarter!

I always wondered why there was no Java equivalent of the file command, but I have things to do, so I went with the still pretty dumb solution of file extension.

It’s just a property file lookup. I made it from /etc/mime.types like so:

% grep -v '#' /etc/mime.types | awk '{ \
    for ( i = 2 ; i <= NF ; i++ ) printf( "%-40s = %s\n", $i, $1 ); \
}' >
% wc -l src/main/resources/
453 seems pretty handy….

Yup, just like that. I used to think I was pretty good at shell scripting, but it always ends up with being pretty ok with awk. I’ve written some fancier awk, believe you me.
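
If awk isn't your thing, the same flip can be sketched in Java. This is only an illustration of what the one-liner does: MimeTypesConverter is a made-up name, and it assumes the usual mime.types layout of a type followed by zero or more extensions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Java take on the grep + awk one-liner above.
final class MimeTypesConverter {
    // Turns "type ext1 ext2 ..." lines into an extension -> type map.
    static Map<String, String> convert( Iterable<String> lines ) {
        Map<String, String> byExtension = new LinkedHashMap<String, String>();
        for ( String line : lines ) {
            // like grep -v '#': drop any line that contains a '#'
            if ( line.contains( "#" ) ) {
                continue;
            }
            String[] fields = line.trim().split( "\\s+" );
            // fields[0] is the mime type, the rest are extensions (awk's $2..$NF)
            for ( int i = 1; i < fields.length; i++ ) {
                byExtension.put( fields[ i ], fields[ 0 ] );
            }
        }
        return byExtension;
    }
}
```

Types with no extensions listed simply drop out, the same as in the awk loop, and a later line wins if two types claim the same extension.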

If you really care:

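import java.io.File;
import java.io.FileInputStream;
import java.util.Properties;

public abstract class Util {
    // assumed fallback; the original value of this constant is not shown
    public static final String DEFAULT_MIME_TYPE = "application/octet-stream";
    public static final Properties MIME = new Properties();
    static {
        try {
            MIME.load( new FileInputStream( "src/main/resources/" ) );
        } catch( Exception e ) {
            // original handler not shown; lookups then fall back to the default
        }
    }

    // look up the mime type by file extension (everything after the last dot)
    public static String mime( String file ) {
        String mime = MIME.getProperty( file.replaceAll( ".*\\.", ""  ) );
        return null == mime ? DEFAULT_MIME_TYPE : mime;
    }

    public static String mime( File file ) {
        return Util.mime( file.toString() );
    }
}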

bit of s3 naming advice

I really wish someone had pointed this out earlier. If you want to use S3 to host stuff for your website:

name your bucket something from your domain

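For example: name the bucket after a hostname in your domain and then point a CNAME for that hostname at S3. That way, when you want to give people a URL, you can give them one on your own domain instead of one on Amazon’s.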
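
To spell the trick out: S3 will serve a bucket under the bucket's own name when that name is a hostname with a CNAME pointing at the bucket's s3.amazonaws.com address. A tiny sketch of the URL shapes involved. S3Urls is a hypothetical helper, and any bucket name you feed it (say static.example.com) is a stand-in, not my real one.

```java
// Hypothetical helper showing the URL shapes; not part of any S3 library.
final class S3Urls {
    // plain S3 URL: the bucket shows up as part of the path
    static String pathStyle( String bucket, String key ) {
        return "http://s3.amazonaws.com/" + bucket + "/" + key;
    }

    // what the DNS CNAME record for the bucket's hostname should point at
    static String cnameTarget( String bucket ) {
        return bucket + ".s3.amazonaws.com";
    }

    // URL on your own domain: only works when the bucket is named after the
    // hostname and the CNAME above is in place
    static String cnameStyle( String bucket, String key ) {
        return "http://" + bucket + "/" + key;
    }
}
```

With a bucket called static.example.com, cnameStyle gives a URL on example.com, while pathStyle gives the longer amazonaws.com one.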



software as search

Brave New World

The rules of development are changing. In the not so olde days, ambitious programmers or even groups of programmers came to the inevitable stumbling block: where to host the application?

Making the leap from “hosting it at a friend’s house” or the “server in the basement” to 24/7 real internet hoo-ha used to involve plunking down $100+ a month to rack a machine in a colo or some rot.

Well now there’s a real answer: Amazon S3 + Google’s App Engine

OK, well at least the S3 part…

My Toy App

My original toy idea was to create a catalog of all the classes in all the jar files I care about. I have a local application that I use that I find pretty nifty:

  • run over my local ${HOME}/.m2/repository, glassfish, etc and find every jar
  • jar tf ${path_to_jar} | sed "s,^,${path_to_jar}:," >> ~/.jar_minder_file.txt
  • grep someclass ~/.jar_minder_file.txt
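
The same three steps can be sketched in Java if you'd rather not shell out. This is a rough equivalent, not the app itself: JarMinder, catalog and grep are names I made up, and the sketch keeps everything in memory instead of appending to a text file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch of the jar tf + sed + grep pipeline.
final class JarMinder {
    // One "path/to/some.jar:entry/Name.class" line per entry in the jar.
    static List<String> catalog( File jar ) throws IOException {
        List<String> lines = new ArrayList<String>();
        try ( JarFile jarFile = new JarFile( jar ) ) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while ( entries.hasMoreElements() ) {
                lines.add( jar.getPath() + ":" + entries.nextElement().getName() );
            }
        }
        return lines;
    }

    // The grep step: keep only the lines that mention the needle.
    static List<String> grep( String needle, List<String> lines ) {
        List<String> hits = new ArrayList<String>();
        for ( String line : lines ) {
            if ( line.contains( needle ) ) {
                hits.add( line );
            }
        }
        return hits;
    }
}
```

Run catalog over every jar you find, pile the lines together, and grep tells you which jar a class lives in.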

Originally, that was going to be my GAE app: upload my 62mb txt file, make it searchable, the end… but…

I ran into a lot of issues… bulkloading issues and search issues… most of which I did eventually overcome, but it made me change my application model considerably.

I ❤ grep.

I do. I’m not ashamed of it and yes, I have told my parents about it… They used to work at AT&T pre-split, so they grokked it to a certain extent.

Grep like Google

But I thought, I also ❤ Google. I thought the bulkload+keyword thing would get it, but once that 62mb was indexed… well… let’s say it grew an ittle bit… an ittle bit beyond the 500mb of free GAE space.

How fortunate you are not to have been in the room as I stomped my little feet and shook my mad fists in impotent rage!

Google wouldn’t let me search my 62mb text file… It was too big… then I thought… but Google lets me search that internet thing! What if I put my file in the so-called “internet” and then Google crawled it and when I wanted to do a search I could Google up: someclass site:somesite

Instead of one big txt file, I thought I could generate a buncha little xml files. That way every search wouldn’t return the same 62mb txt file and I could just handle the presentation layer with XSL + JS and bada-boom: search application.

First I tried GAE.. but even my toy set of jars was too much! Did I mention GAE has a 1,000 file limit? Yeah… no one mentioned it to me either!

OK.. no need to flail my manhooks at the ether! I knew a guy who had been telling me about how I was being all dopey not to take advantage of all the S3 cheapness that was out there.

Change of plans

Instead of trying to python everything myself (oh, didn’t mention GAE only runs python? yeah… it’s not that bad, but then I say that about XSLT)… I decided I would host it on cheapo S3 and put the “smart” bits on GAE.

The search part would then just be a matter of getting the site indexed (still pending as of today).

Ironically, Yahoo’s BOSS works really well on GAE… but once again… still waiting for it to be indexed.

The smart bits

Of course, that’s all well and good, but I wanted to put a little something+something on top.

GAE has nice user hooks, so I borrowed the idea of “starring” from Netflix to let logged-in users mark jars they like. If it gets off the ground, I think ultimately that is going to be the really interesting part.

I’d also like to expand to not just cover jar contents but actually javap up the classes… Move beyond resolving classpath issues into a full-blown method signature to implementation resolution… Track API changes across releases from every OSS Java project…

Lots of possibilities… for now… gotta wait for it to get indexed…

Of course, if Google / Yahoo don’t hack it… I’m not just going to give up… after all… there’s always EC2! and I have a couple of pals named JackRabbit and Lucene who enjoy kicking some ass!



Amazon S3 Library for SOAP in Java


After trying to run the code based on this blog, it suddenly just doesn’t work!

Recommend you look at JetS3t which provides a nice wrapper around the S3 SOAP stuff.

The main reason not to use the S3 SOAP API directly is that signing requests is a complete pain in the ass.
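
For a taste of the pain: as I read the S3 SOAP docs, every request carries a Signature that is the Base64 of an HMAC-SHA1 over “AmazonS3” + the operation name + the timestamp, keyed with your secret. A minimal sketch of just that step. SoapSigner is a made-up name, it needs Java 8 for java.util.Base64, and it does nothing about building the SOAP envelope around the signature.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the signing step only; not JetS3t or Amazon code.
final class SoapSigner {
    // operation is the SOAP operation name, timestamp the request's Timestamp value
    static String sign( String secretKey, String operation, String timestamp )
            throws GeneralSecurityException {
        // the string to sign, per my reading of the docs: "AmazonS3" + operation + timestamp
        String toSign = "AmazonS3" + operation + timestamp;
        Mac mac = Mac.getInstance( "HmacSHA1" );
        mac.init( new SecretKeySpec( secretKey.getBytes( StandardCharsets.UTF_8 ), "HmacSHA1" ) );
        byte[] digest = mac.doFinal( toSign.getBytes( StandardCharsets.UTF_8 ) );
        return Base64.getEncoder().encodeToString( digest );
    }
}
```

The signing itself is short. The pain is getting the timestamp and signature into every request in exactly the form the service expects, which is what a wrapper library does for you.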

Woe is me!

After looking at a lot of tools, I decided I needed to write a one-off for my application to update ACLs. All the tools I found wanted to pull back every ACL and update it! I want all my keys in this part of my bucket to have the same ACL.

I tried to use the wsimport’d library, but….

The SOAP 1.1 request is missing a security element
    at $Proxy33.getObjectAccessControlPolicy(Unknown Source)
    at us.versus.them.jarczar.publisher.AppTest.testApp(

OSS to the Rescue

I cursed and cried and then found the Amazon S3 Library for SOAP in Java. The only bit I really wanted was the part that takes care of the blasted authentication nonsense.

Compiling s3-example-libraries

Compiling it was fun:

% javac -d out $( find com -name "*.java" ) -classpath ${HOME}/.m2/repository/org/codehaus/castor/castor/1.2/castor-1.2.jar:${HOME}/.m2/repository/org/apache/axis/axis/1.4/axis-1.4.jar:${HOME}/.m2/repository/com/sun/javaee/javaee/5.0/javaee-5.0.jar
% jar cf  s3-example-libraries.jar -C out .
% 1.0.0 s3-example-libraries.jar

Using it

Using it was trivial:

import com.amazonaws.s3.doc._2006_03_01.*;


AWSAuthConnection aws = new AWSAuthConnection( awsAccessKeyId, awsSecretAccessKey );
AccessControlPolicy acl = aws.getACL( bucket, key );

Maven Dependencies

It did pull in a few deps....



so much free stuff!!!

Yeah… I know… I was doing stuff, but then I went totally nuts on Google’s App Engine!


Just wanted to say: go and get at it! Sure it is suck ass pythong, but at least it’s not perl.

What else? S3 is what else?

Waiting on YAP to drop. I’m out…

Parting shot:

% mkdir amazonS3
% wsimport -d amazonS3 -s amazonS3 
% javadoc -d javadoc $( find amazonS3 -name "*.java" )
% jar cf amazonS3.jar -C amazonS3 .


Yeah… let me try to write something useful…

Here is a good tip on managing your own key on bulkload with GAE.

Here is what I got (names changed to keep my sh!t on the d/l):

from google.appengine.ext import db
from google.appengine.ext import bulkload
from google.appengine.api import datastore
from google.appengine.api import datastore_types
from google.appengine.ext import search

class LoadMyJunk( bulkload.Loader ):
    def __init__(self):
        bulkload.Loader.__init__( self
            , 'SomeJunk'
            , [
                  ( 'sha1',    str )
                , ( 'pwd',     str )
                , ( 'filesize', int )
                , ( 'datal',    db.Text )
              ]
        )

    def HandleEntity( self, entity ):
        # use the sha1 (prefixed with 's') as the key name instead of a generated id
        name = 's' + entity[ 'sha1' ]
        newent = datastore.Entity( 'SomeJunk', name=name )
        newent.update( entity )
        ent = search.SearchableEntity( newent )
        return ent

if __name__ == '__main__':
    bulkload.main( LoadMyJunk() )

I would say how neat it is, but I am too busy kicking ass. STOP Suggest you do same STOP


Client/Server is dead! Long live, commodity computing!

Power to the people!

