Skip to contents

Configuration

Similarly to fg_get_cdm_config (see vignette), we can use fg_get_bq_config to produce the connection configuration based on the sandbox number, and the data Freeze you want to use.

  • environment: You can find your sandbox number by looking at the URL in your browser when connected to sandbox.

  • dataFreezeNumber: At the time of writing data freeze can be 6, 7, 8, 9, 10, 11, and 12, data freeze.
 config <- fg_get_bq_config(
   environment = "sandbox-6",
   dataFreezeNumber = 12
   )
#> Set option sqlRenderTempEmulationSchema = 'fg-production-sandbox-6.sandbox'

Create FGconnectionHandler object

For convenience one can use function create_fg_connection_handler_FromList to generate a FGconnectionHandler object with the connection configuration from fg_get_bq_config.

FGconnectionHandler <- create_fg_connection_handler_FromList(config)
#> Connecting using BigQuery driver
#> Created a temporary table named #cars
#> Inserting data took 0.00915 secs
#> Dropping tables 'r08qgidpcars' from schema 'atlas-development-270609.sandbox'.
#>   |                                                                              |                                                                      |   0%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================================================| 100%
#> Executing SQL took 1.99 secs

FGconnectionHandler is an object with that manage the connection to the FinnGen tables in BQ.

First thing to do is to check the status of the connection. FGdb$connectionStatusLog not only shows if the connection is correct, but also if the tempEmulationSchema works, and all tables in the sandboxToolsSchema and medicalCodesSchema scheme,

FGconnectionHandler$connectionStatusLog
#> # A tibble: 4 × 4
#>   databaseName     type    step                          message                
#>   <chr>            <fct>   <chr>                         <chr>                  
#> 1 atlasDevelopment SUCCESS Check database connection     Valid connection       
#> 2 atlasDevelopment SUCCESS Check temp table creation     can create temp tables 
#> 3 atlasDevelopment SUCCESS sandboxToolsSchema connection Connected to tables: b…
#> 4 atlasDevelopment SUCCESS medicalCodesSchema connection Connected to tables: f…

It checks if the connection is established, if it is possible to create temporary tables, and gets a list of all the tables in the sandboxToolsSchema and medicalCodesSchema scheme.

Using FGconnectionHandler

FGdb$connectionStatusLog contains a list of the existing table in the sandboxToolsSchema and medicalCodesSchema scheme. It is recommended to use the latest version of these tables.

FGconnectionHandler$getTblsandboxToolsSchema |> names()
#>  [1] "birth_mother_r12_v1"                                
#>  [2] "code_counts_r12_v1"                                 
#>  [3] "code_prevalence_stratified_r12_v1"                  
#>  [4] "drug_events_r12_v1"                                 
#>  [5] "endpoint_cohorts_r12_v1"                            
#>  [6] "finngen_r12_service_sector_detailed_longitudinal_v1"
#>  [7] "kanta_medication_delivery_r12_v1"                   
#>  [8] "kanta_prescription_r12_v1"                          
#>  [9] "kanta_r12_v1"                                       
#> [10] "kidney_r12_v1"                                      
#> [11] "minimum_extended_r12_v1"                            
#> [12] "vision_r12_v1"

Tables in the sandboxToolsSchema andmedicalCodesSchema can be used as if they were tibbles thanks to dbplyr.

# get events with code "J45" in INPAT
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |> 
  dplyr::filter(code1 == "J45" & source == "INPAT")
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#>  target signature 'DatabaseConnectorJdbcConnection#SQL'.
#>  "DatabaseConnectorConnection#character" would also be valid
#> # Source:   SQL [?? x 16]
#> # Database: DatabaseConnectorJdbcConnection
#> # ℹ 16 variables: finngenid <chr>, source <chr>, event_age <dbl>,
#> #   approx_event_day <date>, code1 <chr>, code2 <chr>, code3 <chr>,
#> #   code4 <chr>, code5 <chr>, code6 <chr>, code7 <chr>, code8 <chr>,
#> #   code9 <chr>, icdver <chr>, category <chr>, index <chr>
(same table in html format for exploration)

At the moment of writing does not allow for the use of str_detect

# get events with code "J45" and descendants
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |> 
  dplyr::filter(stringr::str_detect(code1, "^J45"))
#> Error in `stringr::str_detect()`:
#> ! Only fixed patterns are supported on database backends.

A work around is to inject a sql statement instead (notice that this SQL is not based in BQ, but in SqlRender dialect based mostly in MSsqlServer)

# get events with code "J45" and descendants
FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |> 
  dplyr::filter(dplyr::sql("code1 LIKE 'J45%'"))
#> # Source:   SQL [?? x 16]
#> # Database: DatabaseConnectorJdbcConnection
#>    finngenid  source   event_age approx_event_day code1 code2 code3 code4 code5
#>    <chr>      <chr>        <dbl> <date>           <chr> <chr> <chr> <chr> <chr>
#>  1 FG00000014 OUTPAT        35.8 2019-01-04       J450  NA    NA    NA    NA   
#>  2 FG00000025 OUTPAT        43.9 2014-08-15       J450  NA    NA    NA    93   
#>  3 FG00000094 PRIM_OUT      78.5 2020-04-24       J45   NA    NA    NA    R10  
#>  4 FG00000163 OUTPAT        38.3 2021-04-04       J459  NA    NA    NA    NA   
#>  5 FG00001268 OUTPAT        68.5 2019-01-22       J459  NA    NA    NA    NA   
#>  6 FG00001965 OUTPAT        78.5 2021-01-27       J458  NA    NA    NA    NA   
#>  7 FG00002117 OUTPAT        68   2020-09-08       J459  NA    NA    NA    NA   
#>  8 FG00002304 OUTPAT        53.1 2016-08-09       J459  NA    NA    NA    93   
#>  9 FG00001123 DEATH         72.4 2000-12-22       J459  NA    NA    NA    NA   
#> 10 FG00001762 DEATH         76.6 1998-12-30       J459  NA    NA    NA    NA   
#> # ℹ more rows
#> # ℹ 7 more variables: code6 <chr>, code7 <chr>, code8 <chr>, code9 <chr>,
#> #   icdver <chr>, category <chr>, index <chr>
(same table in html format for exploration)

As an example we can plot the age of smoking status of all the subjects with an asthma diagnose:

We get the asthma subjects by filtering the service sector data, and join the finngen_info data to get the smoke2, smoke3

FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |> 
  dplyr::filter(dplyr::sql("code1 LIKE 'J45%'"))|> 
  dplyr::distinct(finngenid) |> 
  dplyr::left_join(
    FGconnectionHandler$getTblsandboxToolsSchema$minimum_extended_r12_v1(), 
    by = "finngenid"
  ) |> 
  dplyr::count(smoke2, smoke3, sort = T)
#> # Source:     SQL [?? x 3]
#> # Database:   DatabaseConnectorJdbcConnection
#> # Ordered by: desc(n)
#>    smoke2 smoke3      n
#>    <chr>  <chr>   <dbl>
#>  1 no     NA      43697
#>  2 NA     NA      35806
#>  3 no     never   32092
#>  4 NA     never   26376
#>  5 no     current 16633
#>  6 yes    NA      15056
#>  7 no     former  14323
#>  8 NA     current 13433
#>  9 NA     former  11891
#> 10 yes    never   11303
#> 11 yes    current  5729
#> 12 yes    former   5122
(same table in html format for exploration)