Introduction
The gepabds package provides tools for analyzing gene expression data stored in SummarizedExperiment objects.
This vignette shows how to compute expression statistics for selected genes.
Loading the Data:
# load data
data(example_se)
example_se
#> class: SingleCellExperiment
#> dim: 200 30
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(200): KBTBD4 ZNF423 ... MMAA SLC31A2
#> rowData names(2): symbol chr
#> colnames(30): D31-5_83 D31-4_92 ... D31-1_87 D31-3_64
#> colData names(4): label donor plate sizeFactor
#> reducedDimNames(0):
#> mainExpName: endogenous
#> altExpNames(1): ERCCExploring the data:
# Explore data
# View gene names
rownames(example_se)
#> [1] "KBTBD4" "ZNF423" "CCDC74A" "TBX19"
#> [5] "ORAI3" "NOVA1" "TPSAB1" "ARHGEF35"
#> [9] "OGFR" "TNPO1" "DNM1P35" "PLCB4"
#> [13] "USP25" "POPDC2" "SRGAP2" "CDIPT"
#> [17] "LOC389332" "RSPH4A" "PRMT2" "SCARNA9L"
#> [21] "ALG10" "ZXDA" "ABO" "FAM46A"
#> [25] "MOSPD2" "LRRC37A11P" "WDR83" "DHX34"
#> [29] "SMO" "FRMD6-AS1" "FGF10" "NDUFV1"
#> [33] "AAK1" "RESP18" "SNAP25" "ARL6IP1"
#> [37] "STX1B" "KCNC3" "LAIR1" "RWDD2B"
#> [41] "RTBDN" "SPRYD7" "SMG6" "CAPSL"
#> [45] "ACOT2" "HPCAL1" "TGIF2-C20orf24" "CAPZA1"
#> [49] "CCDC112" "ABHD5" "IFNGR1" "EMC1"
#> [53] "FPGT-TNNI3K" "SCLY" "SMIM6" "SH3BGRL3"
#> [57] "HAUS4" "LINC00693" "FAM171B" "IL1R2"
#> [61] "TSPAN11" "KLF13" "MAGEB5" "FAM189A1"
#> [65] "ZC3H10" "ZNRF2" "NLRX1" "MAN2A2"
#> [69] "SCCPDH" "ZBTB21" "P2RY6" "ZSCAN30"
#> [73] "PIM1" "C1S" "LOC100505679" "PARP10"
#> [77] "CNGB3" "GTF2H5" "PRKCDBP" "CAMKK1"
#> [81] "PANK1" "IRF2BPL" "PHF12" "SLC35C2"
#> [85] "LOC100506421" "GALNT9" "UBXN10" "TNFSF9"
#> [89] "STAG2" "TAOK3" "LOC100130348" "ZNF687"
#> [93] "PPP1R12A" "ZNF350" "LIME1" "LIN7B"
#> [97] "EIF3B" "COX16" "C5orf34" "IFT46"
#> [101] "STK32C" "SLC6A17" "ASZ1" "MORN5"
#> [105] "TIAL1" "TNS4" "CDC16" "MAP6"
#> [109] "DCTN5" "EP300" "CHST3" "NPHP3"
#> [113] "CDK11B" "PGAP1" "CLCA4" "SYNDIG1"
#> [117] "MRPL21" "ATG3" "FKBP1AP1" "CREG1"
#> [121] "DPT" "DDX10" "EEF1E1-MUTED" "RNF111"
#> [125] "CLOCK" "MRPL4" "MAP3K14-AS1" "PIWIL2"
#> [129] "UGT2B15" "C2orf15" "BCAS2" "HSPA7"
#> [133] "CCNA1" "EFNA4" "KCNA2" "RPS10"
#> [137] "KIFC1" "A1BG" "LOC100289473" "RALGPS1"
#> [141] "LOC100506195" "ZNF490" "LOC100505695" "CLTCL1"
#> [145] "PIGO" "NOP16" "ATP6AP2" "C1orf227"
#> [149] "SLC8A2" "FRAS1" "TMEM39B" "HAAO"
#> [153] "LRRC2-AS1" "C6orf62" "DCAF17" "MRPL11"
#> [157] "HNF4A" "GBAP1" "RNF148" "MEIS3P1"
#> [161] "LOC641746" "ZCCHC3" "DLK1" "SERINC1"
#> [165] "TEX21P" "ACOX3" "MSH5-SAPCD1" "OCIAD1"
#> [169] "CA13" "FXR1" "CCT2" "CD276"
#> [173] "ESCO1" "EMID1" "NACC1" "SPSB1"
#> [177] "PGBD3" "DBNL" "MED13L" "ZC3H7B"
#> [181] "LOC79015" "FAM86C2P" "IQCH" "ZNF702P"
#> [185] "FUT10" "TMPRSS11D" "HDLBP" "RHPN1-AS1"
#> [189] "PTPN6" "HOXA10" "CRYM" "FAM228A"
#> [193] "ZDHHC22" "GPRASP2" "GORASP1" "NFX1"
#> [197] "FGR" "BTBD7" "MMAA" "SLC31A2"
# View sample metadata
colData(example_se)
#> DataFrame with 30 rows and 4 columns
#> label donor plate sizeFactor
#> <character> <character> <factor> <numeric>
#> D31-5_83 mesenchymal D31 5 1.484996
#> D31-4_92 beta D31 4 0.760032
#> D31-6_69 acinar D31 6 1.077424
#> D29-7_30 pp D29 7 0.590132
#> D30-8_64 alpha D30 8 1.676675
#> ... ... ... ... ...
#> D30-8_87 endothelial D30 8 1.809580
#> D29-2_47 pp D29 2 0.512505
#> D31-7_47 beta D31 7 0.688036
#> D31-1_87 alpha D31 1 1.196185
#> D31-3_64 acinar D31 3 2.222911Genes of interest
This step selects a small subset of genes from the dataset to demonstrate downstream analysis.
In real analyses, genes of interest are typically selected based on biological relevance or statistical criteria, such as:
high mean expression across samples high variability between conditions known involvement in a pathway or disease process
# genes of interest
genes_to_use <- rownames(example_se)[1:5]
genes_to_use
#> [1] "KBTBD4" "ZNF423" "CCDC74A" "TBX19" "ORAI3"Computing Expression Stats
This function calculates summary statistics for each gene, including: - mean expression across samples - variance (how variable expression is) - optionally filtered results for selected genes
These metrics help prioritize genes that are biologically informative.
# compute expression statistics
result <- compute_expr_stats(example_se, genes = genes_to_use)
result
#> gene cell_type mean_expr median_expr detection_rate n_cells
#> 1 KBTBD4 mesenchymal 0.0000000 0.0000000 0.0000000 2
#> 2 KBTBD4 beta 0.0000000 0.0000000 0.0000000 3
#> 3 KBTBD4 acinar 0.0000000 0.0000000 0.0000000 6
#> 4 KBTBD4 pp 0.0000000 0.0000000 0.0000000 4
#> 5 KBTBD4 alpha 0.0000000 0.0000000 0.0000000 10
#> 6 KBTBD4 endothelial 0.0000000 0.0000000 0.0000000 2
#> 7 KBTBD4 delta 0.0000000 0.0000000 0.0000000 1
#> 8 KBTBD4 duct 0.0000000 0.0000000 0.0000000 2
#> 9 ZNF423 mesenchymal 0.3719604 0.3719604 0.5000000 2
#> 10 ZNF423 beta 0.0000000 0.0000000 0.0000000 3
#> 11 ZNF423 acinar 0.0000000 0.0000000 0.0000000 6
#> 12 ZNF423 pp 0.0000000 0.0000000 0.0000000 4
#> 13 ZNF423 alpha 0.0000000 0.0000000 0.0000000 10
#> 14 ZNF423 endothelial 1.1493168 1.1493168 1.0000000 2
#> 15 ZNF423 delta 0.0000000 0.0000000 0.0000000 1
#> 16 ZNF423 duct 0.0000000 0.0000000 0.0000000 2
#> 17 CCDC74A mesenchymal 0.6169727 0.6169727 0.5000000 2
#> 18 CCDC74A beta 0.0000000 0.0000000 0.0000000 3
#> 19 CCDC74A acinar 0.0000000 0.0000000 0.0000000 6
#> 20 CCDC74A pp 0.3075665 0.0000000 0.2500000 4
#> 21 CCDC74A alpha 0.1232688 0.0000000 0.1000000 10
#> 22 CCDC74A endothelial 0.0000000 0.0000000 0.0000000 2
#> 23 CCDC74A delta 0.0000000 0.0000000 0.0000000 1
#> 24 CCDC74A duct 0.0000000 0.0000000 0.0000000 2
#> 25 TBX19 mesenchymal 0.3133253 0.3133253 0.5000000 2
#> 26 TBX19 beta 0.0000000 0.0000000 0.0000000 3
#> 27 TBX19 acinar 0.0000000 0.0000000 0.0000000 6
#> 28 TBX19 pp 0.0000000 0.0000000 0.0000000 4
#> 29 TBX19 alpha 0.0000000 0.0000000 0.0000000 10
#> 30 TBX19 endothelial 0.0000000 0.0000000 0.0000000 2
#> 31 TBX19 delta 0.0000000 0.0000000 0.0000000 1
#> 32 TBX19 duct 0.5766949 0.5766949 0.5000000 2
#> 33 ORAI3 mesenchymal 0.0000000 0.0000000 0.0000000 2
#> 34 ORAI3 beta 1.0610933 1.2130741 0.6666667 3
#> 35 ORAI3 acinar 0.2475599 0.0000000 0.3333333 6
#> 36 ORAI3 pp 0.3075665 0.0000000 0.2500000 4
#> 37 ORAI3 alpha 0.5745060 0.0000000 0.4000000 10
#> 38 ORAI3 endothelial 0.3178523 0.3178523 0.5000000 2
#> 39 ORAI3 delta 0.0000000 0.0000000 0.0000000 1
#> 40 ORAI3 duct 0.0000000 0.0000000 0.0000000 2Interpreting the results:
Each row corresponds to a gene. Genes with higher mean expression are more abundant across samples, while higher variance suggests condition-specific regulation.
In downstream analyses, we often focus on genes that are both highly expressed and variable, as these are more likely to be biologically meaningful.