The magic of paste0 & sqldf

When I was first transitioning from SAS to R, I really wanted to figure out what the R equivalent to SAS macros would be. While there aren’t explicit “macros” in R, values and functions are very similar and there are many ways to finesse values into getting the job done. One of my favorite ways to do this is utilizing the paste0 function to paste values into a string and then pass that string into another function. This example uses sqldf to paste the value into a SQL query, but this can easily be done with any function that uses a character string input.

This is a game changer if you have multiple SQL queries that use the same start/end dates in one long script, and say part of your workflow is to to change the dates every month to get an output with updated information. Instead of manually sifting through the script and changing out all of the dates, you can change them once at the beginning and have a flawless run through.

Steps

First, we’ll load the 2 packages needed for the tutorial: sqldf and openxlsx. Sqldf allows us to use SQL syntax on R datasets, and openxlsx allows us to import an XLSX file from a URL so we can pull in some sample data. This example uses a dataset on US greenhouse gas emissions by state from 1990 – 2018 from the World Resources Institute. Let’s say we want to pull all of the emissions data from the year 1995 to 2002 for the state of Alabama.

After loading the packages, we’ll set up our values so we can pass them into the sqldf statement:

#load packages 
library(sqldf)
library(openxlsx)

#setup start/end dates & the state we want to look at

start <- '1995'
end <- '2002'
state <- 'Alabama'

Next, we can import the data directly from the World Resources Institute website using the read.xlsx function from the openxlsx package. I chose to separate out the URL from the read.xlsx function, but you can keep them altogether if you prefer.

#import example data
url <- "http://datasets.wri.org/dataset/005e156f-cf77-42e2-a151-8f1f53e4f009/resource/2b403bc7-0874-482b-a1a4-2ead1d5452a9/download/climatewatch-usemissions"
emissions <- read.xlsx(url, startRow=4)

And finally, we can combine the paste0 and sqldf functions to query our emissions dataset with the values we set at the beginning of the script. 

emissions_query <- sqldf(paste0("
SELECT * FROM emissions 
WHERE year BETWEEN ",start," AND ",end,"
AND state = '",state,"'
"))

It is important to note the syntax used in the query above. Paste0 basically smushes everything you list between the parentheses (separated with commas) together into one string, so if you forget a quotation mark or comma there will be errors. For reference, the output of the paste0/SQL statement looks like this:

"SELECT * FROM emissions 
WHERE year BETWEEN 1995 AND 2002 
AND state = 'Alabama'"

So that is what is passed into sqldf() to give us the emissions_query output we want.

Full Example

The full code can be found below, try it for yourself and report back with your own fun use cases for this powerful duo!

#the magic of paste0 & sqldf 

#load packages 
library(sqldf) 
library(openxlsx) 

#setup start/end dates & the state we want to 
#look at 
start <- '1995' 
end <- '2002' 
state <- 'Alabama' 

#import example data from the World Resources 
#Institute on U.S. States Greenhouse Gas Emissions 
url <- "http://datasets.wri.org/dataset/005e156f-cf77-42e2-a151-8f1f53e4f009/resource/2b403bc7-0874-482b-a1a4-2ead1d5452a9/download/climatewatch-usemissions" 
emissions <- read.xlsx(url, startRow=4) 

#use paste0 & sqldf to utilize values that were 
#set at the beginning of the script 
emissions_query <- sqldf(paste0(" 
SELECT * FROM emissions 
WHERE year BETWEEN ",start," AND ",end," 
AND state = '",state,"' "))

Check out the related posts below for more tips!

Leave a Comment

Your email address will not be published. Required fields are marked *