Connect to Redshift from R with RPosgreSQL

[ databases  redshift  r  aws  ]

At work, a lot of people use DBVisualizer on Windows computers… This was ok to get up and running, e.g., learning about the various schemas and tables we have in Redshift. But at one point, its utility is lacking: you can’t really do anything with the data without writing it to a CSV file and picking it up in R or Python. So why not cut out the middle man and just query data from R or Python?

Here, I provide some notes I took while figuring it out myself.

You will need Java, e.g., on Linux Ubuntu you should do something like this:

sudo apt-get install openjdk-9-jre
sudo apt-get install libpq-dev

Then, install the rJava and RPostgreSQL packages in R:

install.packages(c('rJava', 'RPostgreSQL'))

Finally, to access:

library(RPostgreSQL)
drv=dbDriver("PostgreSQL")
host='yourRedShiftInstance.redshift.amazonaws.com'
con = dbConnect(drv, host=host_production, 
    port='5439',
    dbname='dbName', 
    user='yourUserName', 
    password='yourPassword')
dbExecute(con, query_to_affect_something) # e.g., create a temp table
dbGetQuery(con, query_something_to_return_results)

Some References

Written on December 22, 2016