Tutorial connection to FinnGen BQ tables using FGconnectionHandler
tutorial_connection_handler.Rmd
Configuration
Similarly to fg_get_cdm_config
(see vignette), we can
use fg_get_bq_config
to produce the connection
configuration based on the sandbox number, and the data Freeze you want
to use.
- environment: You can find your sandbox number by looking at the URL in your browser when connected to sandbox.
- dataFreezeNumber: At the time of writing data freeze can be 6, 7, 8, 9, 10, 11, and 12, data freeze.
config <- fg_get_bq_config(
environment = "sandbox-6",
dataFreezeNumber = 12
)
#> Set option sqlRenderTempEmulationSchema = 'fg-production-sandbox-6.sandbox'
Create FGconnectionHandler object
For convenience one can use function
create_fg_connection_handler_FromList
to generate a
FGconnectionHandler
object with the connection
configuration from fg_get_bq_config
.
FGconnectionHandler <- create_fg_connection_handler_FromList(config)
#> Connecting using BigQuery driver
#> Created a temporary table named #cars
#> Inserting data took 0.00915 secs
#> Dropping tables 'r08qgidpcars' from schema 'atlas-development-270609.sandbox'.
#> | | | 0% | |=================================== | 50% | |======================================================================| 100%
#> Executing SQL took 1.99 secs
FGconnectionHandler
is an object with that manage the
connection to the FinnGen tables in BQ.
First thing to do is to check the status of the connection.
FGdb$connectionStatusLog
not only shows if the connection
is correct, but also if the tempEmulationSchema
works, and
all tables in the sandboxToolsSchema and medicalCodesSchema scheme,
FGconnectionHandler$connectionStatusLog
#> # A tibble: 4 × 4
#> databaseName type step message
#> <chr> <fct> <chr> <chr>
#> 1 atlasDevelopment SUCCESS Check database connection Valid connection
#> 2 atlasDevelopment SUCCESS Check temp table creation can create temp tables
#> 3 atlasDevelopment SUCCESS sandboxToolsSchema connection Connected to tables: b…
#> 4 atlasDevelopment SUCCESS medicalCodesSchema connection Connected to tables: f…
It checks if the connection is established, if it is possible to create temporary tables, and gets a list of all the tables in the sandboxToolsSchema and medicalCodesSchema scheme.
Using FGconnectionHandler
FGdb$connectionStatusLog
contains a list of the existing
table in the sandboxToolsSchema and medicalCodesSchema scheme. It is
recommended to use the latest version of these tables.
FGconnectionHandler$getTblsandboxToolsSchema |> names()
#> [1] "birth_mother_r12_v1"
#> [2] "code_counts_r12_v1"
#> [3] "code_prevalence_stratified_r12_v1"
#> [4] "drug_events_r12_v1"
#> [5] "endpoint_cohorts_r12_v1"
#> [6] "finngen_r12_service_sector_detailed_longitudinal_v1"
#> [7] "kanta_medication_delivery_r12_v1"
#> [8] "kanta_prescription_r12_v1"
#> [9] "kanta_r12_v1"
#> [10] "kidney_r12_v1"
#> [11] "minimum_extended_r12_v1"
#> [12] "vision_r12_v1"
Tables in the sandboxToolsSchema andmedicalCodesSchema can be used as if they were tibbles thanks to dbplyr.
# get events with code "J45" in INPAT
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |>
dplyr::filter(code1 == "J45" & source == "INPAT")
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#> target signature 'DatabaseConnectorJdbcConnection#SQL'.
#> "DatabaseConnectorConnection#character" would also be valid
#> # Source: SQL [?? x 16]
#> # Database: DatabaseConnectorJdbcConnection
#> # ℹ 16 variables: finngenid <chr>, source <chr>, event_age <dbl>,
#> # approx_event_day <date>, code1 <chr>, code2 <chr>, code3 <chr>,
#> # code4 <chr>, code5 <chr>, code6 <chr>, code7 <chr>, code8 <chr>,
#> # code9 <chr>, icdver <chr>, category <chr>, index <chr>
At the moment of writing does not allow for the use of
str_detect
# get events with code "J45" and descendants
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |>
dplyr::filter(stringr::str_detect(code1, "^J45"))
#> Error in `stringr::str_detect()`:
#> ! Only fixed patterns are supported on database backends.
A work around is to inject a sql statement instead (notice that this SQL is not based in BQ, but in SqlRender dialect based mostly in MSsqlServer)
# get events with code "J45" and descendants
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |>
dplyr::filter(dplyr::sql("code1 LIKE 'J45%'"))
#> # Source: SQL [?? x 16]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source event_age approx_event_day code1 code2 code3 code4 code5
#> <chr> <chr> <dbl> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 FG00000014 OUTPAT 35.8 2019-01-04 J450 NA NA NA NA
#> 2 FG00000025 OUTPAT 43.9 2014-08-15 J450 NA NA NA 93
#> 3 FG00000094 PRIM_OUT 78.5 2020-04-24 J45 NA NA NA R10
#> 4 FG00000163 OUTPAT 38.3 2021-04-04 J459 NA NA NA NA
#> 5 FG00001268 OUTPAT 68.5 2019-01-22 J459 NA NA NA NA
#> 6 FG00001965 OUTPAT 78.5 2021-01-27 J458 NA NA NA NA
#> 7 FG00002117 OUTPAT 68 2020-09-08 J459 NA NA NA NA
#> 8 FG00002304 OUTPAT 53.1 2016-08-09 J459 NA NA NA 93
#> 9 FG00001123 DEATH 72.4 2000-12-22 J459 NA NA NA NA
#> 10 FG00001762 DEATH 76.6 1998-12-30 J459 NA NA NA NA
#> # ℹ more rows
#> # ℹ 7 more variables: code6 <chr>, code7 <chr>, code8 <chr>, code9 <chr>,
#> # icdver <chr>, category <chr>, index <chr>
As an example we can plot the age of smoking status of all the subjects with an asthma diagnose:
We get the asthma subjects by filtering the service sector data, and
join the finngen_info
data to get the smoke2, smoke3
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |>
dplyr::filter(dplyr::sql("code1 LIKE 'J45%'"))|>
dplyr::distinct(finngenid) |>
dplyr::left_join(
FGconnectionHandler$getTblsandboxToolsSchema$minimum_extended_r12_v1(),
by = "finngenid"
) |>
dplyr::count(smoke2, smoke3, sort = T)
#> # Source: SQL [?? x 3]
#> # Database: DatabaseConnectorJdbcConnection
#> # Ordered by: desc(n)
#> smoke2 smoke3 n
#> <chr> <chr> <dbl>
#> 1 no NA 43697
#> 2 NA NA 35806
#> 3 no never 32092
#> 4 NA never 26376
#> 5 no current 16633
#> 6 yes NA 15056
#> 7 no former 14323
#> 8 NA current 13433
#> 9 NA former 11891
#> 10 yes never 11303
#> 11 yes current 5729
#> 12 yes former 5122