If you have been following along on my blog, you have seen the various technical (and other) ramblings from my recent R&D efforts around “Big Data”. A little while ago I wrote about giving Amazon a try, a position I still stand by… and along those lines, I am now giving Amazon Redshift (their new entry into the big data market) a test-drive.
Air-finger quotes around “Big Data”… that in itself is another blog post entirely. For my purposes here, let’s just call it data that is larger in volume and number of rows than can easily be stored in a single traditional database instance. Basically, I have found that Hadoop does a fantastic job with large volumes of data that are essentially “at rest”. That is not a bad thing, particularly if you are looking to keep logs of particular events, or the output of other high-volume activities, that will not change or update over time.
That all being said, the next stone I am picking up is Redshift. I am not sure how many nodes the cluster will need to make things go, but that really is what this exploration is about: what will it take to make it go? I will have some blog posts about our “adventures with Redshift”, but for now… let’s get set up.
So my initial setup is with the on-demand pricing structure. Honestly, I think there should be some better incentive from Amazon to “try before you commit”, with some spot pricing… I cannot imagine you would run on-demand for a long-term solution (especially given the effort of implementing a solution). All that said, I am starting with a single on-demand XL node (2 TB disk, 2 cores, 15 GB memory).
The setup, as with most of the Amazon products, is relatively straightforward. A few clicks, a few questions, an encryption key, and you are pretty much ready. The default install of the product does not allow access to anything from anywhere (which is not a bad thing), so letting anything in requires a conscious decision on your part. Do some setup for the Redshift-specific security groups, and you should be good to go (after you install some client tools…). I can say, however, that since you are using a Postgres back-end, there are better tools available than what Amazon suggests. Personally, I have chosen to use the EMS freeware version of their SQL Manager product. You can make your own choice on that… but that is what I ended up with.
Take a look at the step-by-step instructions if you have questions. There is really no need for me to paste all that here; if you have made it this far, you probably have a pretty good understanding of what you are trying to get to.
As a teaser for my next blog posts… I will get into the loading of data (what we learned), how things are performing so far, and so on.