Job Scheduling on Posit Connect

Note

Posit Connect allows users to execute R and Python code automiatcally at a cadence of their choosing. In the video below, we’ll describe how to set up job scheduling using R code in a Quarto document that runs an SQL query on a publicly available postgres database!

The Data

RNA Central is a public database of non-coding RNA sequence database from a broad range of organisms. The RNA sequences can be explored and accessed directly on the website, or via a public PostgreSQL database.

The Question

RNA Central contains thousands of sequences and is updated frequently. Our research question is how many sequences does RNA Central contain?

The How

To answer our research question, we will:

  1. Connect to the PostgreSQL database using the DBI and RPostgres packages.
  2. Extract the rna table from the database.
  3. Use the dbplyr and dplyr package to create an SQL query.
  4. Execute the SQL query to count the total number of rows in the rna table.

The Workflow

Install packages

library(DBI)
library(RPostgres)
library(dbplyr)
library(dplyr)

Connect to Public RNA Central Postgres DB

con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = 'pfmegrnargs',
  host = 'hh-pgsql-public.ebi.ac.uk',
  port = 5432,
  user = 'reader',
  password = "NWDMCE5xdipIjRrp" # insert your password here
)

Extract RNA table

rna <- tbl(con, 'rna')

Generate SQL Query to count rows in RNA table

# Generate command to count the total number of rows in table
count_rows <- rna |>
  count()

# Show the SQL query
count_rows |> show_query()

Execute the SQL query

# This takes a few minutes to run!
count_rows |> collect()