USING FGconnectionHandler: Adding names to longitudinal and service sector codes
tutorial_add_info_FGconnectionHandler.Rmd
Intro
This tutorial show how to append additional information to a
longitudinal or service-sector table using dplyr
package
and the FGconnectionHandler
object in the
FinnGenUtilsR
package.
Information to add includes, name of the medical codes in English, name for the type of visit, name for the type of provider.
To understand what is the FGconnectionHandler
is
recommended to see the dedicate vignette
tutorial_connection_handler
.
This tutorial also included how to append additional information to the Kanta lab data. In particular, the name of the OMOP concept for the lab test.
Set up
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(FinnGenUtilsR)
This tutorial has been developed using dummy data under the Atlas development project.
config <- fg_get_bq_config(
environment = "atlasDevelopment",
dataFreezeNumber = 12
)
#> Set option sqlRenderTempEmulationSchema = 'atlas-development-270609.sandbox'
To execute it in Sandbox, be sure to change the configuration to fit your needs.
config <- fg_get_bq_config(
environment = "sandbox-6", # set your sandbox number
dataFreezeNumber = 12 # set the data freeze you want to use
)
Create a FGconnectionHandler
object with the connection
configuration from fg_get_bq_config
.
FGconnectionHandler <- create_fg_connection_handler_FromList(config)
#> Connecting using BigQuery driver
#> Created a temporary table named #cars
#> Inserting data took 0.0101 secs
#> Dropping tables 'sscs0frucars' from schema 'atlas-development-270609.sandbox'.
#> | | | 0% | |=================================== | 50% | |======================================================================| 100%
#> Executing SQL took 2.06 secs
We can check if the connection has been successful.
FGconnectionHandler$connectionStatusLog
#> # A tibble: 4 × 4
#> databaseName type step message
#> <chr> <fct> <chr> <chr>
#> 1 atlasDevelopment SUCCESS Check database connection Valid connection
#> 2 atlasDevelopment SUCCESS Check temp table creation can create temp tables
#> 3 atlasDevelopment SUCCESS sandboxToolsSchema connection Connected to tables: b…
#> 4 atlasDevelopment SUCCESS medicalCodesSchema connection Connected to tables: f…
We can access the service sector data using the connection object as follows:
service_sector_tbl <- FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1()
Where service_sector_tbl
is a tbl
object
that can be used with dplyr
functions.
Generate a table in service sector format
For example we create a subset of the main service sector table by filtering the patients with an asthma diagnose (J45) in the INPAT source.
IMPORTANT: this is important to notice that the
tables coming from the FGconnectionHandler
have all the
columns in lower case.
# get events with code "J45" in INPAT
ss_subject_1_tbl <- FGconnectionHandler$getTblsandboxToolsSchema$finngen_r12_service_sector_detailed_longitudinal_v1() |>
dplyr::filter(finngenid == "FG00000001")
We can see that this table does not currently have information about the medical codes:
ss_subject_1_tbl
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#> target signature 'DatabaseConnectorJdbcConnection#SQL'.
#> "DatabaseConnectorConnection#character" would also be valid
#> # Source: SQL [?? x 16]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source event_age approx_event_day code1 code2 code3 code4 code5
#> <chr> <chr> <dbl> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 FG00000001 OUTPAT 34.5 2010-11-08 X59 NA NA NA 93
#> 2 FG00000001 OUTPAT 41.7 2018-01-16 W19 NA NA NA NA
#> 3 FG00000001 OUTPAT 42.5 2018-11-17 X58 NA NA NA NA
#> 4 FG00000001 OUTPAT 47.0 2023-05-22 M069 NA NA NA NA
#> 5 FG00000001 OUTPAT 47.0 2023-05-22 Z992 NA NA NA NA
#> 6 FG00000001 OUTPAT 47.0 2023-05-22 F4322 NA NA NA NA
#> 7 FG00000001 OUTPAT 47.0 2023-05-22 C5691 NA NA NA NA
#> 8 FG00000001 OUTPAT 47.0 2023-05-22 G564 G564 NA NA NA
#> 9 FG00000001 OUTPAT 47.0 2023-05-22 N3282 NA NA NA NA
#> 10 FG00000001 OPER_IN 16.8 1993-02-18 8511 NA NA 28 NA
#> # ℹ more rows
#> # ℹ 7 more variables: code6 <chr>, code7 <chr>, code8 <chr>, code9 <chr>,
#> # icdver <chr>, category <chr>, index <chr>
SERVICE SECTOR
Adding info to medical codes
Function fg_dbplyr_append_code_info_to_longitudinal_data
adds new columns with information about the medical code used in each
event/row.
We just need to indicate the table with the translations. This is
also available in the FGconnectionHandler
object.
We recommend to take the one with the highest version number.
ss_subject_1_with_translations_tbl <- fg_dbplyr_append_code_info_to_longitudinal_data(
ss_subject_1_tbl,
FGconnectionHandler$getTblmedicalCodesSchema$fg_codes_info_v7()
)
ss_subject_1_with_translations_tbl |>
dplyr::select(finngenid, source, approx_event_day, code1, code2, code3, code, name_en, name_fi, omop_concept_id)
#> # Source: SQL [?? x 10]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source approx_event_day code1 code2 code3 code name_en name_fi
#> <chr> <chr> <date> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 FG00000001 OUTPAT 2010-11-08 X59 NA NA X59 Exposur… NA
#> 2 FG00000001 OUTPAT 2018-01-16 W19 NA NA W19 Unspeci… NA
#> 3 FG00000001 OUTPAT 2018-11-17 X58 NA NA X58 Exposur… NA
#> 4 FG00000001 OUTPAT 2023-05-22 M069 NA NA M06.9 Rheumat… NA
#> 5 FG00000001 OUTPAT 2023-05-22 Z992 NA NA Z99.2 Depende… NA
#> 6 FG00000001 OUTPAT 2023-05-22 F4322 NA NA F43.22 Adaptat… Sopeut…
#> 7 FG00000001 OUTPAT 2023-05-22 C5691 NA NA C56.91 Maligna… Munasa…
#> 8 FG00000001 OUTPAT 2023-05-22 G564 G564 NA G56.4 Causalg… NA
#> 9 FG00000001 OUTPAT 2023-05-22 N3282 NA NA N32.82 Instabi… Virtsa…
#> 10 FG00000001 OPER_IN 1993-02-18 8511 NA NA 8511 Cervica… Kaulak…
#> # ℹ more rows
#> # ℹ 1 more variable: omop_concept_id <chr>
If an event/row uses more than one medical code
Some events include more than one medical code, or more than one way
to understand a medical code. Parameters in
fg_dbplyr_append_code_info_to_longitudinal_data
allows to
chose how to name the medical code in this situations.
For example:
- PURCH_map_to = “VNR”, tells to use the VNR code in PURCH register, instead of the default ATC
- ICD10fi_map_to = “CODE1”, tells to use only the first code in ICD10fi, instead of the default CODE1*CODE2 combination code
fg_dbplyr_append_code_info_to_longitudinal_data(
ss_subject_1_tbl,
FGconnectionHandler$getTblmedicalCodesSchema$fg_codes_info_v7(),
PURCH_map_to = "VNR",
ICD10fi_map_to = "CODE1"
) |>
filter(vocabulary_id == "ICD10fi" | vocabulary_id == "VNRfi") |>
dplyr::select(finngenid, source, approx_event_day, code1, code2, vocabulary_id, code, name_en, name_fi, omop_concept_id)
#> # Source: SQL [?? x 10]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source approx_event_day code1 code2 vocabulary_id code name_en
#> <chr> <chr> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 FG00000001 OUTPAT 2010-11-08 X59 NA ICD10fi X59 Exposure…
#> 2 FG00000001 OUTPAT 2018-01-16 W19 NA ICD10fi W19 Unspecif…
#> 3 FG00000001 OUTPAT 2018-11-17 X58 NA ICD10fi X58 Exposure…
#> 4 FG00000001 OUTPAT 2023-05-22 M069 NA ICD10fi M06.9 Rheumato…
#> 5 FG00000001 OUTPAT 2023-05-22 Z992 NA ICD10fi Z99.2 Dependen…
#> 6 FG00000001 OUTPAT 2023-05-22 F4322 NA ICD10fi F43.22 Adaptati…
#> 7 FG00000001 OUTPAT 2023-05-22 C5691 NA ICD10fi C56.91 Malignan…
#> 8 FG00000001 OUTPAT 2023-05-22 G564 G564 ICD10fi G56.4 Causalgia
#> 9 FG00000001 OUTPAT 2023-05-22 N3282 NA ICD10fi N32.82 Instabil…
#> 10 FG00000001 OUTPAT 2015-05-25 J450 NA ICD10fi J45.0 Predomin…
#> # ℹ more rows
#> # ℹ 2 more variables: name_fi <chr>, omop_concept_id <chr>
Se the help, for more options
?fg_append_code_info_to_longitudinal_data_sql
Reduce code precision
ICD10fi, ICD9fi, ICD8fi, ATC, and NCSPfi, code systems reflect the hierarchy of the code by the first letters on them.
It is possible to truncate these medical codes before the information is added. This in practice adds the code and names of the parent codes.
fg_dbplyr_append_code_info_to_longitudinal_data(
ss_subject_1_tbl,
FGconnectionHandler$getTblmedicalCodesSchema$fg_codes_info_v7(),
ICD10fi_precision = 3,
ICD9fi_precision = 3,
ICD8fi_precision = 3,
ATC_precision = 3,
NCSPfi_precision = 2
)|>
dplyr::select(finngenid, source, approx_event_day, code1, code2, vocabulary_id, code, name_en, name_fi, omop_concept_id)
#> # Source: SQL [?? x 10]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source approx_event_day code1 code2 vocabulary_id code name_en
#> <chr> <chr> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 FG00000001 OUTPAT 2010-11-08 X59 NA ICD10fi X59 Exposure…
#> 2 FG00000001 OUTPAT 2018-01-16 W19 NA ICD10fi W19 Unspecif…
#> 3 FG00000001 OUTPAT 2018-11-17 X58 NA ICD10fi X58 Exposure…
#> 4 FG00000001 OUTPAT 2023-05-22 M069 NA ICD10fi M06 Other rh…
#> 5 FG00000001 OUTPAT 2023-05-22 Z992 NA ICD10fi Z99 Dependen…
#> 6 FG00000001 OUTPAT 2023-05-22 F4322 NA ICD10fi F43 Reaction…
#> 7 FG00000001 OUTPAT 2023-05-22 C5691 NA ICD10fi C56 Malignan…
#> 8 FG00000001 OUTPAT 2023-05-22 G564 G564 ICD10fi G56 Mononeur…
#> 9 FG00000001 OUTPAT 2023-05-22 N3282 NA ICD10fi N32 Other di…
#> 10 FG00000001 OPER_IN 1993-02-18 8511 NA FHL 8511 Cervical…
#> # ℹ more rows
#> # ℹ 2 more variables: name_fi <chr>, omop_concept_id <chr>
We can see in column code
the truncated code and in
name_en
the name of the truncated code.
Adding info to visit type
Function
fg_dbplyr_append_visit_type_info_to_service_sector_data
adds new columns with information about the visit type.
In the service-sector data, information about the visit type is
defined in CODE5 to CODE9 depending on the SOURCE and time period.
fg_dbplyr_append_visit_type_info_to_service_sector_data
abstracts this nuances, and assign one visit type code per row. Notice
that a visit may contain more than one event/row. Events/rows belonging
to the same visit share the same combination of SOURCE+INDEX.
By default,
fg_bq_append_visit_type_info_to_service_sector_data
will
also include two columns is_clinic_visit
and
is_follow_up_visit
that will be TRUE if the visit is a
clinic visit or a follow-up visit, respectively. However, this works
only if fg_codes_info_table
version is v7 or higher. If you
need to use lower versions set the parameters
add_is_clinic_visist
and
add_is_follow_up_visit
to FALSE. See the help for more
details.
fg_dbplyr_append_visit_type_info_to_service_sector_data(
ss_subject_1_tbl,
FGconnectionHandler$getTblmedicalCodesSchema$fg_codes_info_v7()
)|>
dplyr::select(finngenid, source, index, approx_event_day, code5, code6, code8, code9, visit_type_code, visit_type_name_en, is_clinic_visit, is_follow_up_visit)
#> # Source: SQL [?? x 12]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source index approx_event_day code5 code6 code8 code9
#> <chr> <chr> <chr> <date> <chr> <chr> <chr> <chr>
#> 1 FG00000001 OUTPAT 305 2010-11-08 93 10E NA NA
#> 2 FG00000001 OUTPAT 258 2018-01-16 NA 30Q R52 E
#> 3 FG00000001 OUTPAT 110 2018-11-17 NA 15Y R10 6
#> 4 FG00000001 OUTPAT 97 2023-05-22 NA 80A R52 E
#> 5 FG00000001 OUTPAT 97 2023-05-22 NA 80A R52 E
#> 6 FG00000001 OUTPAT 97 2023-05-22 NA 80A R52 E
#> 7 FG00000001 OUTPAT 97 2023-05-22 NA 80A R52 E
#> 8 FG00000001 OUTPAT 97 2023-05-22 NA 80A R52 E
#> 9 FG00000001 OUTPAT 97 2023-05-22 NA 80A R52 E
#> 10 FG00000001 OPER_IN 105 1993-02-18 NA 90 NA NA
#> # ℹ more rows
#> # ℹ 4 more variables: visit_type_code <chr>, visit_type_name_en <chr>,
#> # is_clinic_visit <chr>, is_follow_up_visit <chr>
Adding info to provider type
Function
fg_dbplyr_append_provider_info_to_service_sector_data
adds
new columns with information about the personal or unit that provided
the diagnose during the visit.
In the service-sector data, information about the provider is defined
in CODE6 to CODE7 depending on the SOURCE.
fg_dbplyr_append_provider_info_to_service_sector_data
abstracts this nuances, and assign one provider type code per row.
fg_dbplyr_append_provider_info_to_service_sector_data(
ss_subject_1_tbl,
FGconnectionHandler$getTblmedicalCodesSchema$fg_codes_info_v7()
)|>
dplyr::select(finngenid, source, index, approx_event_day, code5, code6, code8, code9, provider_code, provider_name_en, provider_concept_class_id)
#> # Source: SQL [?? x 11]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid source index approx_event_day code5 code6 code8 code9 provider_code
#> <chr> <chr> <chr> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 FG000000… PURCH 308 1997-10-16 NA NA NA NA NA
#> 2 FG000000… PURCH 269 2016-12-24 NA NA NA NA NA
#> 3 FG000000… PURCH 304 1997-12-12 NA NA NA NA NA
#> 4 FG000000… PURCH 65 2021-12-18 NA NA NA NA NA
#> 5 FG000000… PRIM_… 265 2013-04-10 R20 T41 NA NA NA
#> 6 FG000000… PURCH 140 2007-07-20 NA NA NA NA NA
#> 7 FG000000… PRIM_… 250 2022-08-28 R20 T40 NA NA NA
#> 8 FG000000… PURCH 166 2010-01-20 NA NA NA NA NA
#> 9 FG000000… OUTPAT 177 2017-11-03 2 55 NA NA 55
#> 10 FG000000… PRIM_… 130 2022-09-29 R20 T40 NA NA 51321
#> # ℹ more rows
#> # ℹ 2 more variables: provider_name_en <chr>, provider_concept_class_id <chr>
KANTA LAB DATA
Generate a table in service sector format
For example we create a subset of the main service sector table by querying latest 20 events from subject “FG00000001”:
kanta_subject_1_tbl <- FGconnectionHandler$getTblsandboxToolsSchema$kanta_r12_v1()|>
dplyr::filter(finngenid == "FG00000001") |>
dplyr::arrange(desc(approx_event_datetime))
We can see that this table does not currently have information about the medical codes:
kanta_subject_1_tbl
#> # Source: SQL [?? x 19]
#> # Database: DatabaseConnectorJdbcConnection
#> # Ordered by: desc(approx_event_datetime)
#> finngenid event_age approx_event_datetime test_name test_id omop_concept_id
#> <chr> <dbl> <dttm> <chr> <chr> <dbl>
#> 1 FG00000001 71.7 2022-12-07 12:07:00 pt-gfre 656400 -1
#> 2 FG00000001 71.6 2022-11-04 16:18:00 p-crp 1605013 3020460
#> 3 FG00000001 71.6 2022-10-17 12:00:11 b-trom 3002791 3007461
#> 4 FG00000001 71.6 2022-10-15 07:01:00 e-mchc 1557 3003338
#> 5 FG00000001 71.5 2022-09-22 11:53:00 e-mchc 1557 3003338
#> 6 FG00000001 71.5 2022-09-12 12:03:00 s-ca-ion 9010 3016431
#> 7 FG00000001 71.5 2022-09-07 06:00:00 b-trom 2791 3007461
#> 8 FG00000001 71.4 2022-08-31 17:39:14 u-eryt 11344 3002582
#> 9 FG00000001 71.4 2022-08-26 08:57:00 e-mchc 1557 3003338
#> 10 FG00000001 71.4 2022-08-22 08:32:00 b-eryt 1101341 3026361
#> # ℹ more rows
#> # ℹ 13 more variables: measurement_value <dbl>, measurement_unit <chr>,
#> # measurement_value_harmonized <dbl>, measurement_unit_harmonized <chr>,
#> # test_outcome <chr>, measurement_status <chr>,
#> # reference_range_low_value <dbl>, reference_range_high_value <dbl>,
#> # coding_system_oid <chr>, test_id_source <chr>, test_name_source <chr>,
#> # measurement_value_source <dbl>, measurement_unit_source <chr>
Adding info to OMOP_CONCEPT_ID codes
Function fg_dbplyr_append_concept_info_data
adds a new
column with information about the OMOP_CONCEPT_ID name.
omop_schema <- "atlas-development-270609.etl_sam_r12_5k_omop"
kanta_subject_1_tbl_with_omop_name <- kanta_subject_1_tbl |> fg_dbplyr_append_concept_info_data(omop_schema)
kanta_subject_1_tbl_with_omop_name |>
dplyr::select(finngenid, event_age, approx_event_datetime, test_name, test_id, omop_concept_id, omop_concept_name)
#> # Source: SQL [?? x 7]
#> # Database: DatabaseConnectorJdbcConnection
#> finngenid event_age approx_event_datetime test_name test_id omop_concept_id
#> <chr> <dbl> <dttm> <chr> <chr> <dbl>
#> 1 FG00000001 67.7 2018-12-17 09:35:00 b-baso 3157 3006315
#> 2 FG00000001 68.9 2020-02-24 11:54:00 b-baso 3157 3006315
#> 3 FG00000001 69.2 2020-06-05 09:37:00 b-baso 9227 3006315
#> 4 FG00000001 66.2 2017-05-22 11:16:00 b-baso 3157 3006315
#> 5 FG00000001 70.9 2022-02-14 11:46:00 u-alb-o 1030 3029937
#> 6 FG00000001 67.0 2018-03-22 08:36:00 e-rdw 1349 3002385
#> 7 FG00000001 68.1 2019-04-16 07:40:00 e-rdw 20408 3002385
#> 8 FG00000001 65.9 2017-02-01 08:05:00 e-rdw 20408 3002385
#> 9 FG00000001 70.6 2021-10-11 08:08:00 b-hb-co 1566 3023081
#> 10 FG00000001 68.9 2020-02-29 07:01:00 li-baktvi 1154 -1
#> # ℹ more rows
#> # ℹ 1 more variable: omop_concept_name <chr>