library(DBI)
library(RPostgres)
library(dbplyr)
library(dplyr)
Job Scheduling on Posit Connect
Note
Posit Connect allows users to execute R and Python code automiatcally at a cadence of their choosing. In the video below, we’ll describe how to set up job scheduling using R code in a Quarto document that runs an SQL query on a publicly available postgres database!
The Data
RNA Central is a public database of non-coding RNA sequence database from a broad range of organisms. The RNA sequences can be explored and accessed directly on the website, or via a public PostgreSQL database.
The Question
RNA Central contains thousands of sequences and is updated frequently. Our research question is how many sequences does RNA Central contain?
The How
To answer our research question, we will:
- Connect to the PostgreSQL database using the
DBI
andRPostgres
packages. - Extract the
rna
table from the database. - Use the
dbplyr
anddplyr
package to create an SQL query. - Execute the SQL query to count the total number of rows in the
rna
table.
The Workflow
Install packages
Connect to Public RNA Central Postgres DB
<- DBI::dbConnect(
con ::Postgres(),
RPostgresdbname = 'pfmegrnargs',
host = 'hh-pgsql-public.ebi.ac.uk',
port = 5432,
user = 'reader',
password = "NWDMCE5xdipIjRrp" # insert your password here
)
Extract RNA table
<- tbl(con, 'rna') rna
Generate SQL Query to count rows in RNA table
# Generate command to count the total number of rows in table
<- rna |>
count_rows count()
# Show the SQL query
|> show_query() count_rows
Execute the SQL query
# This takes a few minutes to run!
|> collect() count_rows