All posts by medical

DORGE: Discovery of Oncogenes and tumoR suppressor genes using Genetic and Epigenetic features – Science Advances

INTRODUCTION

Cancer results from an accumulation of key genetic alterations that disrupt the balance between cell division and apoptosis (1). Genes with driver mutations that affect cancer progression are known as cancer driver genes (2), which can be classified as tumor suppressor genes (TSGs) and oncogenes (OGs) based on their roles in cancer progression (3). OGs are usually activated by gain-of-function mutations that stimulate cell growth and division, whereas TSGs are inactivated by loss-of-function (LoF) mutations (frameshift insertions/deletions and nonsense mutations) that block TSG functions in inhibiting cell proliferation, promoting DNA repair, and activating cell cycle checkpoints.

CRISPR-Cas9 screens with libraries of single-guide RNAs are powerful tools for identifying genes essential for cancer cell fitness, such as cancer cell growth and viability. For example, recent CRISPR screens by the Wellcome Sanger Institute detected 628 priority targets in 324 human cell lines from 30 cancer types (4). However, the genes identified by CRISPR screens in cell lines, which differ vastly from primary cells, may not be physiologically relevant to human biology and disease. Many well-known cancer driver genes in the Cancer Gene Census (CGC) database (5) were missing in CRISPR-screening results. They might have phenotypic effects in animal models that are not included in the current CRISPR screens.

Hence, it is necessary to predict cancer driver genes based on patient genomics data. Cancer genome sequencing efforts, such as The Cancer Genome Atlas (TCGA) (6), have generated an unprecedentedly large data resource and enabled the development of bioinformatics algorithms to discover cancer driver genes. Tokheim et al. (7) reviewed eight major algorithms, and Bailey et al. (8) integrated 26 computational tools in a pan-cancer mutation study. These algorithms mainly look for cancer driver genes with greater than expected background mutational rates, and they output a ranked list of candidate genes based on a small collection of genetic features such as somatic mutations and copy number alterations (CNAs). Tumor Suppressor and OG Explorer (TUSON) (9) and the 20/20+ machine-learning method (7) are the two major algorithms that can distinguish between protein-coding TSGs and OGs based on distinct patterns of mutational signatures.

However, a recent meta-analysis indicated that, over the next 10 years, even if all available tumor genomes were analyzed, many cancer driver genes would remain undetected because of the lack of distinction between driver mutations and background mutational load (10). In addition, emerging evidence suggests that genetic alterations alone are insufficient to explain all cancer driver genes, including some well-known ones. For example, sustained expression of estrogen receptor- (ESR1) drives two-thirds of breast cancers, but ESR1 mutations that alter transcription levels occur in only 7% of ESR1-positive tumors (11). Furthermore, many pediatric tumors have extremely low mutation rates; some even appear to have no substantial recurrent somatic mutations (12). Thus, it is likely that other mechanisms, such as epigenetic alterations, are responsible for the dysregulation of many cancer driver genes.

For example, trimethylation on histone H3 lysine 4 (H3K4me3) and DNA methylation are the most extensively studied epigenetic modifications that influence gene expression and cell fate. H3K4me3 is a widely recognized marker of active promoters and regulates the preinitiation complex formation and gene activation (13). More than 80% of promoters containing H3K4me3 are transcribed (14), and H3K4me3 is also involved in pre-mRNA splicing, recombination, DNA repair, and enhancer function. DNA methylation occurs in 70 to 80% of 5CphosphateG3 (CpG) sites in a normal genome (15). H3K4me3 and CpG methylation alteration are associated with disease initiation, including many types of cancer (16). In particular, promoter hypermethylation that silences TSGs is a key epigenetic event in tumorigenesis (17), whereas gene-body methylation is positively correlated with gene expression (18). Recently, the broad epigenetic domain has emerged as a new concept in the control of cancer development. In an integrative analysis of 1134 genome-wide ChIP-seq datasets (19) from the Encyclopedia of DNA elements (ENCODE) project (20), we found that broad H3K4me3 is a unique epigenetic signature of TSGs. In contrast to the common sharp (e.g., <1-kb width) H3K4me3 peaks associated with increased transcriptional initiation, broad H3K4me3 peaks are associated with increased transcriptional elongation. In addition, we also found many wide gene-body regions that are lowly methylated in normal tissues (the regions called gene-body methylation canyons) as hypermethylated in cancer (21). Gene-body methylation canyons are unexpectedly enriched in OGs, and their hypermethylation directly induces OG activation (21).

Nevertheless, to the best of our knowledge, none of the existing bioinformatics algorithms sufficiently leveraged epigenetic features to predict cancer driver genes, despite the fact that epigenetic alterations are known to be associated with cancer driver genes. Therefore, these algorithms were not fully empowered, and there is a pressing need for a computational algorithm that integrates epigenetic data with genetic alterations to improve the prediction of cancer driver genes.

To address this need, we developed DORGE (Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features). DORGE includes two prediction algorithms: DORGE-TSG for predicting TSGs and DORGE-OG for predicting OGs; both algorithms are elastic netbased logistic regression (LR) classifiers trained on CGC genes and neutral genes (NGs). By evaluating DORGE-TSG and DORGE-OG, we found an unusually large contribution of histone modification to TSG prediction, as well as crucial roles of the features such as missense mutations, genomics, super enhancer percentages, and hypermethylation in predicting OGs. Cancer driver genes predicted by DORGE include known cancer driver genes and novel ones that have not been reported in the literature. We evaluated these novel cancer driver genes using multiple genomics and functional genomics datasets. In addition, we found that the novel dual-functional genes, which DORGE predicted as both TSGs and OGs, are highly enriched at hubs in protein-protein interaction (PPI) and drug/compound-gene networks.

We developed a computational tool DORGE, by integrating extensive genomic and epigenomic datasets, for predicting cancer driver genes, i.e., TSGs and OGs. Briefly, we used CGC genes and 75 curated candidate features to train two binary classification algorithms: DORGE-TSG and DORGE-OG, which we subsequently applied to every gene to predict its probability of being a TSG and OG, respectively. Last, we used the predicted probabilities to rank genes genome-wide and identified the top-ranked genes as candidate TSGs and OGs.

Prediction of cancer driver genes is a classification problem. It requires a high-quality training dataset that contains reliable TSGs, OGs, and the genes unlikely to be TSGs or OGs. Our two positive-training gene sets include 242 TSGs and 240 OGs (with dual-functional genes removed) from the CGC database v.87, which we refer to as CGC-TSGs and CGC-OGs hereafter. The negative-training gene set includes 4058 NGs reported to have no cancer relevance (9). To allow for the prediction of dual-functional genes that are both a TSG and an OG, we trained two classifiers for predicting TSGs and OGs, respectively.

To develop DORGE, we constructed 75 features that are likely predictive of cancer driver genes based on the literature. These features have either known roles in TSG/OG disruption (e.g., DNA methylation and somatic mutations) or potential links to TSG/OG functions (e.g., CRISPR-screening data; data file S1). We categorized these features into four major types: (i) 33 mutational features from two well-known cancer driver gene prediction algorithmsTUSON (9) and 20/20+ (7)and Genome Aggregation Database (gnomAD); 28 of these 33 features were compiled by TCGA (6) and Catalogue of Somatic Mutations in Cancer (COSMIC) (5) from the mutation data of patient samples; (ii) 12 genomic features including 3 from 20/20+ (7) and 9 features (e.g., gene lengths and genome evolutionrelated features) that have not been previously used to predict cancer driver genes (22); (iii) 27 epigenetic features, including histone modifications from the ENCODE project (20), promoter and gene-body methylation features from the COSMIC database, and super enhancer percentages from the dbSUPER database (23); and (iv) 3 phenotypic features, including CRISPR-screening data from the DepMap project (24), Variant Effect Scoring Tool (VEST) scores from 20/20+ (7), and gene expression Z scores from TCGA.

To train classifiers for TSG and OG prediction, we compared eight classification algorithms: LR, LR with the lasso penalty, LR with the ridge penalty, LR with the elastic net penalty, random forests, support vector machines (SVM) with the linear kernel, SVM with the Gaussian kernel, and XGBoost (https://github.com/dmlc/xgboost). For each algorithm, we considered three class ratios (where a class ratio was defined as the number of NGs to the number of CGC-TSGs or CGC-OGs): the original ratio, 5:1, and 1:1; for the latter two ratios, we randomly divided NGs into partitions so that the number of NGs in each partition approximately met the ratio given the number of CGC-TSGs or CGC-OGs. Considering the imbalance between NGs and CGC-TSGs/CGC-OGs in sizes, we used the fivefold cross-validated (CV) area under the precision-recall curve (AUPRC), instead of the receiver operating characteristic curve, as the accuracy measure to compare these eight classification algorithms under the three class ratios. Our comparison result showed that downsampling the NGs to have more balanced class ratios as 5:1 and 1:1 did not improve the accuracy achieved by the original class ratio. Hence, we decided to keep the original class ratio and found that LR with the lasso, LR with the ridge, LR with the elastic net, and random forests performed the best with similar AUPRC values (data file S2). We chose LR with the elastic net as the classification algorithm for its good interpretability and its capacity for selecting correlated, informative features. Then, we trained LR with the elastic net separately for TSG and OG prediction and subsequently used the two trained algorithms to assign every gene a TSG score and an OG score, both ranging from 0 to 1, with a larger value indicating a higher chance of the corresponding gene being a TSG or an OG. To decide appropriate thresholds on the TSG scores and OG scores for final predictions, we weighted the severity of mispredicting NGs as TSGs/OGs (i.e., making false-positive predictions) versus the other way around and set a target false-positive rate (FPR) of 1%. Last, we used the Neyman-Pearson classification algorithm (25) to set thresholds on the TSG scores and OG scores by respecting our target FPR and obtained two classifiers: DORGE-TSG and DORGE-OG for predicting TSGs and OGs, respectively.

Next, we identified the important features for TSG and OG prediction. Because many features are correlated (data file S1), the feature coefficients estimated by LR with the elastic net are not biologically interpretable measures of feature importance. The reason is that if one adds to the training data a feature that is highly correlated with an existing feature, the estimated coefficient of the existing feature would become less significant. This phenomenon contradicts our biological interpretation of feature importance: If a feature is important, its importance should not be diluted by the addition of another feature. Yet, we are still interested in the importance of features in our final multifeature linear classifier, so marginal feature importance based on each feature alone does not suffice. To address this issue, we proposed a simple two-step procedure: (i) we clustered features into feature groups that were approximately uncorrelated with one another, and (ii) we evaluated the importance of each feature group by the reduction in the fivefold CV AUPRC when that feature group was left out, i.e., the contribution of that feature group to the fivefold CV AUPRC given all the other feature groups. Our simple but innovative approach is advantageous in three aspects. First, by grouping correlated features, we can interpret a small number of feature groups, each of which has a distinct biological interpretation, instead of a large number of features. Second, making feature groups approximately uncorrelated has a desirable consequence: if a new feature were added, it would either be added to an existing feature group or create a new feature group by itself (if it is approximately uncorrelated with any existing features); then, its addition would barely affect the importance of the feature groups it is not in, as uncorrelated features would not affect each others importance in a multifeature linear classifier. Third, the same criterion, fivefold CV AUPRC, was used to select a classification algorithm and define the importance of a feature group, making the analysis self-consistent. Using this approach, we first divided all 75 features into 20 feature groups by hierarchical clustering with complete linkage so that features within each group have pairwise absolute Pearson correlations of at least 0.1 (data file S2). Then, we ranked the 20 feature groups by their contributions to fivefold CV AUPRC and selected the top-ranked groups as those whose contributions exceeded 0.005. This gave us three and five feature groups for predicting TSGs and OGs, respectively.

Analyzing these top predictive feature groups, we found that multiple histone modification features stood out as the most predictive group (whose contribution to 5-fold CV AUPRC was almost 10-fold of that of the second most predictive group containing phenotype features) for TSGs and that missense mutations constituted the top feature group for predicting OGs (Fig. 1, A and B). Besides, epigenetic features including super enhancer and cancernormal methylation differences in promoter and gene-body regions were among the top feature groups for predicting OGs (Fig. 1B). We also found histone modifications and missense mutations among the top predictive features for both TSGs and OGs (Fig. 1, A and B), suggesting that TSGs and OGs share certain features, whose predictive power for TSGs and OGs may be different though. For each feature within a top-ranked TSG (or OG) feature group, we compared its values in the CGC-TSGs (or CGC-OGs) and the NGs by the two-sided Wilcoxon rank-sum test, and the resulting log10P value was shown in Fig. 1 (A and B).

(A) Feature groups selected for TSGs. (B) Feature groups selected for OGs. Feature groups are sorted according to the AUPRC reduction in elastic net fivefold cross-validation. Feature groups are named according to the representative features. Box plots showing the distribution of (C) H3K4me3 mean peak length, (D) VEST score, (E) missense damaging/benign ratio, (F) missense entropy, (G) pLI score, and (H) super enhancer percentage for the CGC-OG, CGC-TSG, and NG sets. Genes as both TSGs and OGs are excluded. P values for the differences between the TSGs/OGs and NGs were calculated by the one-sided greater than Wilcoxon rank-sum test.

We further examined several features in terms of their individual, marginal power of distinguishing CGC-TSGs and CGC-OGs from NGs. Multiple features are marginally strong predictors of TSGs, as they have significantly higher values in CGC-TSGs than in NGs. They include epigenetic features such as H3K4me3 peak length and height (Fig. 1C and fig. S1A) and H3K79me2 peak length and height (fig. S1, B and C), missense mutational features such as nonsilent/silent ratio (fig. S1D), and phenotype features such as VEST score (Fig. 1D). Many features also have significantly higher values in CGC-OGs than in NGs. They include missense mutational features such as missense damaging/benign ratio (Fig. 1E), missense entropy (Fig. 1F), probability of being LoF intolerant (pLI) score (Fig. 1G), and LoF o/e (observed/expected) constraint (fig. S1E); genomics features such as evolutionary conservation phastCons score and noncoding Genomic Evolutionary Rate Profiling (ncGERP) score (fig. S1, F and G); and epigenetic features such as super enhancer percentage in cell lines (Fig. 1H). In particular, our finding agrees with previous studies in that missense damaging/benign ratio (reflecting the functional impact of missense mutations) and missense entropy (representing the enrichment of mutations in few residues) (9) have significantly higher values in CGC-OGs than in CGC-TSGs and NGs (Fig. 1, E and F). VEST and PolyPhen-2 scores, both of which reflect functional effects of mutations, have significantly higher values in CGC-TSGs and CGC-OGs than in NGs, and they do not exhibit statistically significant differences between CGC-TSGs and CGC-OGs (Fig. 1D and fig. S1H). We found super enhancer, a commonly regarded OG-specific feature (26), also characteristic of TSGs, as it has significantly higher values in CGC-TSGs than in NGs (Fig. 1H).

We note that, besides H3K4me3 peak length, a readily known TSG predictor, peak lengths of four more histone marks (H3K79me2, H3K36me3, H4K20me1, and H3K9ac) are also significantly larger in CGC-TSGs than in CGC-OGs and NGs (fig. S1, B, I, J, and K), consistent with the fact that the activation of TSGs is associated with transcriptional elongation (19, 2729). To further verify the enrichment of broad H3K4me3 peaks in CGC-TSGs, we performed the Fishers exact test on a two-by-two contingency table, whose two rows correspond to CGC-TSGs and all the other genes in the training data (CGC-OGs and NGs) and whose two columns correspond to the genes with broad H3K4me3 peaks (whose mean lengths across ENCODE samples are >4 kb) and the rest of genes. We similarly performed two more Fishers exact tests to check the enrichment of broad H3K4me3 peaks in CGC-OGs and NGs but found much lower enrichment in these two gene groups than in CGC-TSGs, confirming that H3K4me3 is a distinctive feature of TSGs (fig. S1L). Together, we identified histone modifications as the top predictors for TSGs. We found missense mutations, super enhancer percentages, and methylation differences between cancer and normal samples as major predictors for OGs. It is worth noting that histone modifications and missense mutations are also important features for predicting OGs and TSGs, respectively, although to a lesser extent. In summary, DORGE can successfully leverage public data to discover the genetic and epigenetic alterations that play significant roles in cancer driver gene dysregulation. Figure S2 provides an overview of the DORGE method and the evaluations in the following sections.

As we described earlier, DORGE-TSG and DORGE-OG output TSG scores and OG scores for predicting TSGs and OGs, respectively. Every gene received a TSG score and an OG score, both ranging from 0 to 1, and a higher TSG score (or OG score) indicates a higher probability of a gene being a TSG (or an OG; Materials and Methods). DORGE thresholded the TSG scores and OG scores by the Neyman-Pearson classification algorithm (25) with a target FPR of 1%, leading to 925 predicted TSGs, whose TSG scores exceeded 0.6233374, and 683 predicted OGs, whose OG scores exceeded 0.6761319. In total, DORGE predicted 1172 cancer driver genes, including 436 dual-functional genes (Fig. 2A; the predicted genes are listed in data file S2). We note that these predicted TSGs and OGs are conservative predictions guided by the small FPR threshold 1%, as reflected by the fact that their numbers are smaller than the numbers of previously predicted cancer driver genes1217 TSGs and 803 OGs in databases TSGene (30) and ONGene (31) (by 18 June 2020). If DORGE users would like to be less conservative and predict more TSGs and OGs, they can opt for a higher FPR threshold such as 5%. Next, we filtered out CGC genes from the DORGE-predicted cancer driver genes and defined the remaining 725 predicted TSGs and 515 predicted OGs as DORGE-predicted novel genes (data file S1), among which 537 novel TSGs were not included in the CancerMine (32) or TSGene database (Fig. 2B), and 306 novel OGs were not found in the CancerMine or ONGene database (Fig. 2C).

Venn diagrams showing the overlap (A) between DORGE-predicted novel TSGs/OGs and CGC-TSGs/OGs; (B) between DORGE-predicted novel TSGs, CGC-TSGs, CancerMine-TSGs, and TSGene database-TSGs; and (C) between DORGE-predicted novel OGs, CGC-OGs, CancerMine-OGs, and ONGene database-OGs. Precision-recall curves (PRCs) for (D) TSG and (E) OG prediction. Different lines represent different PRCs from DORGE or DORGE variants. (F) Stacked bar plots showing the number of rediscovered CGC-TSGs and CGC-OGs using all features compared to CRISPR-screening data only. Cumulative distribution function (CDF) plots of DORGE-predicted TSG scores (G) and OG scores (H) of 19,636 human genes. The x axis and the y axis are swapped for illustration purposes, and the y axis is stretched to emphasize large TSG and OG scores. CGC genes are plotted as jitter points to avoid overplotting. The dashed lines indicate DORGE-TSG and DORGE-OG thresholds at a target FPR of 1%, and the CGC genes whose TSG scores and OG scores exceed the thresholds (above the dashed lines) are predicted as TSGs and OGs. (I) Top 15 DORGE-predicted non-CGC novel TSGs (left) and OGs (right), respectively, along with representative feature heatmaps and PubMed IDs. To make features comparable, feature values are transformed into quantiles. (J) Top 15 DORGE-predicted non-CGC novel TSGs (left) and OGs (right) that have no documented role in cancer based on the TSGene, ONGene, and CancerMine databases, along with representative feature heatmaps.

We evaluated DORGE-TSG and DORGE-OG by their overall prediction accuracy and found that they achieved high fivefold CV AUPRC of 0.821 and 0.766, respectively, when trained with all the 75 features (Fig. 2, D and E). Considering that previous algorithms primarily relied on genetic features to predict cancer driver genes, we evaluated the accuracy gain of DORGE from including epigenetic and phenotypic features. To this end, we constructed variants of DORGE-TSG and DORGE-OG based on each of the following feature subsets: Mutation, Genomics, Phenotype, Epigenetics, and their complements (i.e., the subsets resulting from subtracting each of the four feature subsets from the 75 features), as well as TUSON and CRISPR screeningonly features (Fig. 2, D and E, and data file S1). For each of these DORGE-TSG and DORGE-OG variants, we calculated its fivefold CV AUPRC.

On the basis of feature subsets Mutation, Genomics, Phenotype, and Epigenetics, the corresponding DORGE-TSG variants achieved fivefold CV AUPRC of 0.638, 0.314, 0.358, and 0.600, respectively. In parallel, on the basis of the complements of Mutation, Genomics, Phenotype, and Epigenetics (i.e., when features in each subset were excluded), the corresponding DORGE-TSG variants achieved fivefold CV AUPRC of 0.692, 0.819, 0.820, or 0.715. These results consistently show the large contributions of Mutation and Epigenetics features to TSG prediction (Fig. 2D). Furthermore, using the TUSON methods features and the CRISPR screeningonly feature, the corresponding DORGE-TSG variants only achieved fivefold CV AUPRC of 0.500 and 0.156, much lower than 0.821 achieved by DORGE-TSG with all the 75 features. Similarly, we compared DORGE-OG with its variants trained on feature subsets. Specifically, DORGE-OG variants that only used Mutation, Genomics, Phenotype, or Epigenetics features achieved fivefold CV AUPRC of 0.660, 0.299, 0.241, or 0.295, respectively; when each of these feature subsets was excluded, the AUPRC correspondingly became 0.453, 0.752, 0.763, or 0.705. These results suggest that Mutation features have a large contribution to OG prediction (Fig. 2E). Similar to DORGE-TSG, the DORGE-OG variants trained with TUSON features or the CRISPR screeningonly feature had much lower prediction accuracy (fivefold CV AUPRC of 0.534 or 0.089) than that of DORGE-OG trained with all the 75 features (fivefold CV AUPRC of 0.766). The fact that DORGE-TSG and DORGE-OG outperformed all their variants confirms that DORGE effectively leveraged the 75 features and did not suffer from overfitting in its TSG and OG prediction.

The above results also reveal that the CRISPR screeningonly feature did not have a high predictive power on its own, as shown by its low fivefold CV AUPRC (0.156 and 0.089) in TSG and OG prediction. Moreover, under the target FPR of 1%, the DORGE-TSG and DORGE-OG variants with the CRISPR screeningonly feature identified only 16 (5.1%) CGC-TSGs and 3 (1.0%) CGC-OGs, whereas DORGE-TSG and DORGE-OG with all the 75 features recovered an additional 184 (58.8%) CGC-TSGs and 165 (53.1%) CGC-OGs (Fig. 2F). These results challenge a common belief that CRISPR screening using cell lines is powerful for discovering cancer driver genes. A possible reason for our results is that cell lines do not mimic in vivo cancer cells well. These additional cancer driver genes with all the 75 features might have phenotypic effects in animal models that have not been included in the current CRISPR screens.

We next evaluated the distinct predictive power provided by epigenetic features to cancer driver gene prediction. Inspecting the distributions of TSG scores and OG scores, we found that many top-ranked CGC genes were not predictable by DORGE without epigenetic features (Fig. 2, G and H). In detail, 52 (16.61%) CGC-TSGs and 26 (8.36%) CGC-OGs would have been missed by DORGE-TSG and DORGE-OG, respectively, at the target FPR of 1% if epigenetic features were not included. These results suggest that (i) epigenetic features empowered the discovery of cancer driver genes and (ii) epigenetic features empowered DORGE-TSG more than DORGE-OG because the number of rescued CGC-TSGs (52) is twice the number of rescued CGC-OGs (26).

We then searched biomedical literature for the top 15 novel TSGs and OGs ranked by DORGE. Out of these top novel genes, 10 TSGs and 12 OGs have reported tumor-suppressive and oncogenic functions, respectively (Fig. 2I). We also inspected these top novel genes for selected representative features and confirmed that they have high values in the top predictive TSG features (H3K4me3 peak length, nonsilent/silent ratio, VEST score, and conservation phastCons score) and OG features (missense entropy, super enhancer percentage, pLI score, ncGERP score, and gene-body cancernormal methylation difference) selected from the top feature groups (Fig. 2I). We further confirmed this result in the subset of top novel genes that are not in the CancerMine, TSGene, and ONGene databases (Fig. 2J). In particular, nearly all of the top novel TSGs have broad H3K4me3 peaks, and most of the top novel OGs are hypermethylated in gene body (with positive cancernormal methylation differences).

We further compared DORGE with 10 existing algorithms for cancer driver gene prediction using four accuracy measuressensitivity (Sn), specificity (Sp), precision, and overall accuracyall based on CGC genes (Table 1). We did not include the five-test model (RF5) because although it outputs TSG and OG probabilities, it does not have explicit cutoffs for defining TSGs and OGs (33). We found that DORGE performed the best in all these measures except Sp, for which DORGE was 0.997 and the best algorithm 20/20+ was 1.000. The superiority of DORGE was most obvious in Sn, where its top performance (0.611) was followed with a large gap by OncodriveFM (0.338) (34), MuSIC (0.331) (35), and MutPanning (0.318) (36) (Table 1). To further confirm that DORGE outperformed these 10 algorithms, we performed a similar comparison based on 1056 OncoKB cancer genes (37), which had been widely used to benchmark cancer gene prediction. Consistent with the CGC gene evaluation results, DORGE achieved the best performance in Sn (almost 50% higher than that of the second best algorithm OncodriveFM) and overall accuracy, the third best performance in Sp (0.997 versus 0.999 of the best method TUSON), and the second best performance in precision (0.973 versus 0.993 of the best method 20/20+, whose Sn was only 32% of that of DORGE) (data file S2). Together, our benchmark results show that DORGE made a significant advance in improving cancer driver gene prediction from existing algorithms.

On the basis of CGC-TSGs and CGC-OGs, we further benchmarked DORGE against 20/20+, TUSON, and Genes Under Selection in Tumors (GUST) for separate prediction of TSGs and OGs (data file S2). We did not include the other seven algorithms because they could not predict TSGs and OGs separately. Consistent with our previous results, DORGE exhibited much higher Sn than the other three algorithms (DORGE had Sn of 0.639 and 0.54 for predicting TSGs and OGs, while the best Sn of the other three algorithms was only 0.252 and 0.116), and it also achieved the best precision and overall accuracy; all the four algorithms had close to perfect Sp. Although the high Sn of DORGE seemed to be due to the fact that 20/20+, TUSON, and GUST by default predicted fewer TSGs and OGs than DORGE did, it was not the case. After we adjusted the thresholds of 20/20+ and TUSON so that they predicted the same numbers of TSGs and OGs as DORGE did (the GUST software does not allow such threshold adjustment), the Sn of 20/20+ and TUSON, though increased, remained almost onefold lower than that of DORGE. Collectively, our results suggest that DORGE outperformed 20/20+, TUSON, and GUST in both TSG and OG prediction.

We also compared DORGE with TUSON and 20/20+ in terms of their predicted ranking of CGC-TSGs and CGC-OGs. For example, if an algorithm predicted gene A more likely than gene B to be a TSG, we say that gene A received a higher TSG rank (smaller in rank number) than gene B did. Accordingly, we calculated a TSG rank and an OG rank for every CGC gene by each algorithm. Among the CGC genes, we define the core CGC-TSGs and core CGC-OGs as those that were annotated solely as TSGs and OGs, not both (dual-functional), in CGC v.77. Compared to the genes that were added later to CGC v.87, these core CGCs have been more extensively studied. Then, we examined the ranking consistency between DORGE and the other two algorithms for CGC genes and the core CGC genes. For CGC-TSGs, we found that their TSG ranks by DORGE had strong positive correlations with their TSG ranks by TUSON and 20/20+ (fig. S3, A and B), and overall, they were ranked higher by DORGE than by the other two algorithms (fig. S3E). We observed similar results for CGC-OGs (fig. S3, C, D, and G). The conclusions also held for core CGC genes (fig. S3, F and H). These results confirm that DORGE predictions are more biologically relevant than those of TUSON and 20/20+. For example, ELL (elongation factor for RNA polymerase II), a CGC-TSG, was ranked 190th by DORGE-TSG, 8144th by TUSON, and 3958th by 20/20+; PDGFB (platelet-derived growth factor subunit B), a CGC-OG, was ranked 207th by DORGE, 2753rd by TUSON, and 4982nd by 20/20+. Also, DORGE ranked CGC dual-functional genes better than TUSON and 20/20+ did, as exemplified by the dual-functional gene IDH1 {isocitrate dehydrogenase [nicotinamide adenine dinucleotide phosphate (NADP+)] 1}, which was ranked first for TSG and 28th for OG by DORGE, 18,734th for TSG and 2092nd for OG by TUSON, and 14,936th for TSG and 13th for OG by 20/20+.

Even though DORGE predicted many more cancer driver genes than TUSON, 20/20+, and GUST didDORGE, TUSON, 20/20+, and GUST predicted 1172, 243, 193, and 276 cancer driver genes, respectivelyDORGE achieved the highest overall prediction accuracy based on CGC genes. After confirming this, we further characterized the novel cancer driver genes, defined as those predicted by DORGE but not included in the CGC database.

We performed the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the novel TSGs and OGs, and we found, as expected, that the novel TSGs are enriched with TSG-related pathways such as apoptosis and focal adhesion and that the novel OGs are enriched with OG-related pathways such as cell cycle and transforming growth factor (TGF-) signaling pathway (Fig. 3A). However, without epigenetic features, the novel TSGs and OGs predicted by the DORGE-TSG and DORGE-OG variants are no longer enriched with certain TSG-related and OG-related pathways such as TGF- signaling pathway (fig. S4A). These results again suggest that epigenetic features made unique contributions to discovering novel cancer driver genes. In addition, the degrees of enrichment (log10 P values) of those shared enriched KEGG pathways, which were enriched in novel TSGs or OGs regardless of the inclusion of epigenetic features, are positively correlated, implying that the addition of epigenetic features did not prohibit the discovery of meaningful cancer driver genes (fig. S4, B and C).

(A) KEGG pathway enrichment analysis performed by Enrichr (75) for DORGE-predicted novel TSGs and OGs. Because of space limitations, terms with adjusted P values <104 are shown. Besides, terms with adjusted P values 108-fold lower for TSGs than OGs or 104-fold lower for OGs than TSGs are also shown. (B) ATAC-seq peak score measuring open chromatin for CGC-TSGs/OGs, DORGE-predicted novel TSGs/OGs, and NGs. Enrichment heatmaps of various gene types in (C) ER gene list and (D) inactivating pattern gene list for SB insertional mutagenesis, a screening tool for cancer driver genes. (E) Boxplot showing the Cox hazard ratio (HR) score for various gene types. Data are from rectum adenocarcinoma (READ). (F) Boxplot showing the phyloP score for various gene types. The phyloP score measures phylogenetic conservation and represents logP values under a null hypothesis of neutral evolution. PhyloP basewise conservation scores were derived from a Multiz alignment of 46 vertebrate species. (G) TSGs and OGs are enriched in genes having earlier evolutionary origin (Eukaryota). P values for the differences between indicated gene categories were calculated by the one-sided Wilcoxon rank-sum test. In boxplots and heatmap, the Fishers exact test is used to calculate P values, and gene numbers in different gene categories are normalized to 200 to make P values comparable. In this figure, dual-functional CGC genes were excluded from the CGC-TSGs/OGs.

Given that histone modification features (e.g., H3K4me3 peak length) empowered DORGE-TSG prediction, we sought experimental evidence for the novel TSGs that have broad histone modification (e.g., H3K4me3) peaks. A previous cell proliferation experiment observed increased cell growth after knocking down multiple potential TSGs whose H3K4me3 peaks have mean lengths (across ENCODE cell lines) greater than 2 kb (19), including two DORGE-predicted novel TSGs, CSRNP1 and NR3C1B. Another previous study found that Mll4 loss down-regulates potential TSG expression and weakens broad H3K4me3 peaks in mice (38). Examining the human orthologs of the six mouse potential TSGs that were down-regulated by Mll4 loss in that study, we found that four orthologs were ranked top by DORGE-TSG and have H3K4me3 peaks longer than 2 kb. These four human genes are DNMT3A (18th), BCL6 (96th), FOXO3 (222nd), and CBFA2T3 (1012th).

We first used a published Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq) dataset of TCGA pan-cancer samples (39) to characterize the DORGE-predicted novel cancer driver genes. ATAC-seq reveals gene accessibility and provides valuable information about the complex gene regulatory relationships. On the basis of this ATAC-seq dataset, we found that DORGE-predicted novel TSGs and OGs, consistent with CGC-TSGs and CGC-OGs, are significantly more accessible than NGs (all with P = 2.22 1016 by the one-sided Wilcoxon rank-sum test; Fig. 3B). This result established a connection between cancer driver genes and chromatin accessibility; both TSGs and OGs are ubiquitously accessible in cancer samples.

We then explored a possible relationship between cancer driver genes and epigenetic regulators (ERs), which are known to play fundamental roles in genome-wide gene regulation by reading or modifying chromatin states. A previous study suggested that most ERs are intolerant to LoF mutations (40), and our fig. S1E also shows that LoF mutations (reflected by the LoF o/e constraint feature) are significantly more abundant in TSGs and OGs than NGs, prompting us to explore whether ER genes have a significant overlap with cancer driver genes. By analyzing a curated list of 761 ERs, we found significant enrichment of CGC-TSGs and CGC-OGs (P = 3.14 1020 and P = 9.36 108, respectively, by the Fishers exact test; in total, 94 CGC cancer driver genes are among the ERs, with P = 2.79 1013 by the Fishers exact test; Fig. 3C). This result also shows the greater enrichment of CGC-TSGs than that of CGC-OGs in ER genes, consistent with a previous study showing that the application of cancer gene classifiers to ER genes revealed more TSGs than OGs (41). Similar to CGC genes, DORGE-predicted novel TSGs (P = 1.15 106) are also more enriched than novel OGs (P = 2.65 103) in ER genes (Fig. 3C).

We next evaluated DORGE-predicted novel TSGs using Sleeping Beauty (SB) screening data. The SB transposon is a type of synthetic DNA element that can disrupt the expression of genes near its insertion sites, a process called insertional mutagenesis. Hence, the SB transposon is a screening tool for TSGs, whose expression disruption leads to carcinogenesis. To verify the novel TSGs, we downloaded the list of inactivating pattern genes from the SB Cancer Driver Database (SBCDDB) (42). As expected, we found that both CGC-TSGs (P = 5.41 1019) and DORGE-predicted novel TSGs (P = 5.11 1024) are enriched in the list. In contrast, NGs have no enrichment. This result is consistent with our expectation that TSGs are inactivated in SB screens (Fig. 3D).

We further evaluated DORGE-predicted novel cancer driver genes using an shRNA screening dataset from the Achilles project (43), as shRNA screens for gene essentiality for cell proliferation in cell lines. On the basis of the dataset, the knockdown of DORGE-predicted novel OGs and CGC-OGs shows a greater decrease in cell proliferation rates compared to NGs (fig. S4D). In contrast, the knockdown of DORGE-predicted novel TSGs and CGC-TSGs shows nearly no decrease in cell proliferation rates compared to NGs (fig. S4D). This result is consistent with the prior knowledge that the proliferation of cell lines is dependent upon OGs (24).

Last, we evaluated DORGE-predicted novel cancer driver genes using patient survival data. In the precomputed survival data downloaded from the OncoRank website (44), every gene has a hazard ratio (HR; whose value >, =, or <1 indicates that the genes expression reduces, does not affect, or increases patients survival time, respectively). We found that CGC-TSGs and DORGE-predicted novel TSGs have significantly lower HRs than OGs (CGC-OGs and DORGE-predicted novel OGs) and NGs in three representative cancer types: rectum adenocarcinoma, colon adenocarcinoma, and uterine corpus endometrial carcinoma (Fig. 3E and fig. S4, E and F). These results are consistent with the fact that TSG expression prohibits cancer occurrence and prolongs survival, while OG expression has the opposite effects. The complete HRs and P values of DORGE-predicted novel TSGs and OGs in 21 cancer types are available in data file S1.

Previous studies have suggested that evolutionarily conserved genes are enriched with cancer driver candidates and drug targets (45). Consistent with these studies, we observed statistically significant differences in exonic sequence conservation (phastCons and phyloP scores) between CGC-TSGs/OGs and NGs, and the same conclusion holds for DORGE-predicted TSGs and OGs (Fig. 3F and fig. S4G). Compared to OGs, TSGs have slightly higher exonic sequence conservation (Fig. 3F and fig. S4G).

We next explored the conservation of noncoding regions in cancer driver genes. Noncoding regions are characterized by positive ncGERP values and negative noncoding Residual Variation Intolerance Score (ncRVIS) values. The reason is that ncGERP is a measure of nucleotide constraints and reflects conservation across the mammalian lineage (fig. S1G) (46), while ncRVIS measures human-specific constraints (46). On the basis of these two measures, we found that TSGs (CGC-TSGs and DORGE-predicted novel TSGs) are slightly more conserved than OGs (CGC-OGs and DORGE-predicted novel OGs) at noncoding regions (figs. S1G and S4H).

In summary, we found that cancer driver genes are more conserved than NGs at both exonic and noncoding regions. Between TSGs and OGs, we found that TSGs are more conserved at exons, while OGs are more conserved at non-coding regions.

Motivated by our conservation results, we investigated the phyletic ages (i.e., evolutionary origins) of cancer driver genes. Although cancer driver genes are believed to have originated from Metazoa (multicellular animals) (47), the possibility of their origination from Eukaryota, an earlier evolutionary origin, has not been explicitly investigated. On the basis of the phyletic-age gene lists (from early to late: Eukaryota, Metazoa, Chordata, and Mammalia) from the Online GEne Essentiality (OGEE) database (48), we found significant enrichment of cancer driver genes in the Eukaryota gene list (Fig. 3G; P values by the Fishers exact test: P = 1.05 103 for CGC-TSGs, P = 3.25 1013 for DORGE-predicted novel TSGs, P = 1.41 105 for CGC-OGs, and P = 2.77 105 for DORGE-predicted novel OGs), in contrast to NGs. Our results indicate that cancer driver genes may have originated earlier in the evolutionary history than previously thought. In addition, we found that cancer driver genes were not enriched in young phyletic ages (Chordata and Mammalia; Fig. 3G), consistent with a recent paper (49).

Previous studies have shown high interactivity of cancer driver genes in the BioGRID PPI network (9), and accordingly, PPI data have been used to identify cancer driver genes (50, 51). We, therefore, explored the extent to which DORGE-predicted TSGs and OGs are connected to other genes/proteins. When analyzing the whole BioGRID PPI network (Fig. 4A), we found that TSGs and OGs, including CGC genes and DORGE-predicted novel genes, exhibit significantly higher degrees, betweenness, and closeness centrality than NGs do (fig. S5, A to C). This result suggested that the removal or knockdown of cancer driver genes, as expected, will exert a critical impact on the whole PPI network, in particular, dual-functional driver genes, as both TSGs and OGs, display even higher interactivity than sole TSGs and OGs do (fig. S5, A to C). Densely connected genes tend to form modules, and cancer driver gene modules can trigger the hallmarks of cancer and confer the proliferation advantages displayed on cancer cells (52). Here, we used the Molecular Complex Detection (MCODE) algorithm to identify six densely connected network modules/backbones (Fig. 4B) from the PPI subnetwork of the 1172 DORGE-predicted cancer driver genes. The 64 genes that comprise the six identified modules are all dual-functional genes (8 CGC dual-functional genes and 56 DORGE-predicted novel dual-functional genes). This overrepresentation of dual-functional driver genes in network modules is unusual, as it is highly unlikely to obtain a 64-gene subnetwork composed of all dual-functional genes (P = 6.66 1027 by the binomial test).

(A) Complete BioGRID PPI network. (B) The Molecular Complex Detection (MCODE) algorithm was applied to DORGE-predicted novel TSGs/OGs to identify densely connected network modules (or backbones). All genes in the identified network are CGC dual-functional genes or novel dual-functional genes. Gene categories are represented as pie charts, with the colors coded based on gene categories. (C) Enrichment of CGC-TSGs/OGs and DORGE-predicted novel TSGs/OGs in hub genes in the BioGRID network. (D) Enrichment of various gene sets or epigenetic and mutational patterns in hub genes in the BioGRID network. (E) Enrichment of CGC-TSGs/OGs and DORGE-predicted novel TSGs/OGs in hub genes in the PharmacoDB gene-drug network. (F) Enrichment of various gene sets or epigenetic and mutational features in hub genes in the PharmacoDB gene-drug network. Hub genes are defined as the genes with the top 5% highest degree in the BioGRID or PharmacoDB network. To generate comparable P values, the gene numbers in different gene categories were normalized to 200. Broad H3K4me3: Genes with H3K4me3 length > 4000. P values for the differences between indicated gene categories were calculated by the right-sided Wilcoxon rank-sum test.

It was previously shown that somatic alterations often occur at PPI network hub genes in cancer (53), and these hub genes are typically essential genes. We therefore investigated the enrichment of cancer driver genes in the hub genes, the 978 genes (top 5%) with the highest degrees in the BioGRID PPI network. We found that all TSGs, OGs, and dual-functional genes (including CGC genes and DORGE-predicted novel genes) are enriched in the hub genes (Fig. 4C). The CGC and novel dual-functional genes are the most enriched (Fig. 4C). We also analyzed the enrichment of 10 functional gene sets. Among these gene sets, we found that the genes with high missense o/e constraints (highest top 5%), the essential genes from the OGEE database, and the ER genes are most enriched in the hub genes (Fig. 4D). Previous literature has not reported any connection between ERs and PPI hub genes, and our finding strengthens the critical roles of ERs. We also found that the genes with broad H3K4me3 peaks are significantly enriched, to a similar degree to the housekeeping genes (HKGs), in the hub genes (Fig. 4D).

Cancer driver gene prediction is the basis for the development of anticancer drugs and personalized cancer treatments. We therefore explored possible gene-drug relationships of DORGE-predicted cancer driver genes using the PharmacoDB, a gene-drug network constructed from comprehensive high-throughput cancer pharmacogenomic datasets. In the subnetwork containing CGC genes and DORGE-predicted novel genes, we found that these cancer driver genes are densely connected to anticancer drugs (fig. S5D). Similar to our observation from the PPI network, we found that TSGs and OGs, including CGC genes and DORGE-predicted novel genes, exhibit significantly denser connections to drugs than NGs do (fig. S5E).

We then identified the top 10 drugs with the largest numbers of connected genes in the PharmacoDB gene-drug network. Among these 10 drugs, the top one is doxorubicin, a well-known chemotherapeutic agent, and the other nine drugs are also known anticancer drugs (fig. S5F). We next identified 979 genes (top 5%) with the highest degrees in the gene-drug network as hub genes and found that DORGE-predicted novel driver genes are enriched in these hub genes (Fig. 4E). We also analyzed the enrichment of 10 functional gene sets in these hub genes. Unlike their enrichment in our previously defined PPI network hub genes (Fig. 4D), the essential genes and the HKGs are not enriched in these gene-drug network hub genes (Fig. 4F), an expected result as their expression is required for normal cells and they are unlikely to be viable drug targets for cancer treatment. In contrast, we still observed the enrichment of three functional gene setsthe genes with high missense o/e constraints (highest top 5%), the ER genes, and the genes with broad H3K4me3 peaksin the gene-drug network hub genes (Fig. 4F). Together with our PPI analysis, we conclude that the genes in these three functional gene sets may be potential actionable drug targets. To the best of our knowledge, there has been no report on the enrichment of the ER genes in gene-drug network hub genes. Our results from PPI and gene-drug network analysis emphasize the importance of studying the ER genes as potential drug targets.

A bottleneck in novel anticancer drug discovery is an efficient selection of potential molecular targets for a drug/compound or its derivatives. Ideal anticancer drugs are those that up-regulate TSGs and/or down-regulate OGs. We used the CRowd Extracted Expression of Differential Signatures (CREEDS) data (54) to explore the relationship between CGC and DORGE-predicted genes and anticancer drugs (data file S1). We identified 68 proven or potential anticancer drugs/compounds that were associated with 68 target genes meeting the filtering criteria (limma Q value < 0.05 and fold change > 2) from the CREEDS data (fig. S6). Fifty-four (79.41%) of the 68 genes are DORGE-predicted novel TSG or OG genes.

Recent pharmacological efforts suggested that drugs/compounds actionable toward more than one gene or molecular pathway are preferable for repurposing (55), and it is common for existing drugs to be later repurposed as anticancer drugs. For example, dexamethasone was previously classified as a corticosteroid but later repurposed for cancer treatment. Among the 68 drugs/compounds we identified, 30 are anticancer and chemotherapy drugs (fig. S6, bottom), 23 have only been tested in laboratories and are not yet in clinical trials, and 15 have not been tested in cell lines (fig. S6, bottom). Of the 38 drugs/compounds not yet confirmed in anticancer clinical trials, many have been proven to treat other diseases. Overall, our results indicate that they are potential drugs for cancer treatment.

In this paper, we developed a machine-learning tool DORGE for identifying cancer driver genes by integrating genetic and epigenetic features. Our development is the first effort that goes beyond the use of tumor genetic alterations for cancer driver prediction, and it was motivated by our previous studies that found specific epigenetic patterns associated with TSGs or OGs (19, 21). Although experimental validation is needed for further studies, our computational evaluation verifies that the novel cancer driver genes predicted by DORGE resemble known cancer drivers in multiple aspects and show promise as potential therapeutic targets. In particular, the top-ranked novel cancer driver genes, especially those regulated by epigenetic mechanisms, warrant further detailed investigation.

Cancer driver genes that are infrequently mutated in cancer are often indistinguishable from passenger genes with random mutations in genome sequencing data. Such random mutations may result from technical reasons including tumor DNA contamination, sequencing depth, and mutation calling failure (56). Therefore, infrequently mutated cancer driver genes are hardly detectable by the methods based on the mutational background model [MutSigCV (57)] or the functional impact model [OncodriveFML (58), OncodriveFM (34), and OncodriveCLUST (59)]. However, these genes may be identified through the integration of epigenetic, phenotypic, and genomic data.

In previous studies, various nonmutational datasets have been used in cancer driver gene identification; however, unlike DORGE, existing work only used few or several nonmutational features extracted from these datasets (7, 50, 51, 57, 60, 61). For example, MutSigCV used DNA replication timing and cell line expression data (57); ActiveDriver used phosphorylation site information (61); and 20/20+ used multispecies conservation, mutation pathogenicity scores, and replication timing (7). PPI networks and pathway knowledge have also been used to identify cancer driver genes (50, 51); however, these studies were biased toward well-studied genes/pathways and thus may overlook quite many genuine cancer driver genes. In contrast to all these studies, DORGE leverages epigenetic information without any bias toward gene selection to predict cancer driver genes, and this innovation results in DORGE overpowering these existing studies in discovering novel cancer driver genes.

We further note that the capacity of DORGE in predicting TSGs and OGs separately allows DORGE to identify novel dual-functional cancer driver genes. This is advantageous given that more and more dual-functional cancer driver genes have been identified in the literature. In this study, we found a unique property of dual-functional cancer driver genes: They have more connecting partners in PPI and drug-gene networks than other driver genes have. This property, to our knowledge, was not previously reported. Several novel dual-functional genes predicted by DORGE drew our attention. For example, PTEN (phosphatase and tensin homolog), a protein phosphatase, is commonly regarded as a TSG; however, DORGE predicted it as an OG as well. We found that oncogenic roles were reported for PTEN in a few studies. One explanation for the dual-functional roles of PTEN is that its oncogenic effect depends on the positions of mutations (62). We confirmed this by analyzing the mutation patterns of PTEN and found one pattern as the classic OG mutation pattern with most substitutions in p.R130 (63). In DORGE, further updates can quantify the dual-functional roles (i.e., the relative chance of being TSGs or OGs) of dual-functional genes.

While we have already found dozens of nonmutational features that contribute significantly to the predictive power of DORGE, many CGC genes remain undetected by DORGE (Fig. 2, G and H). A possible reason is the missingness of other factors or mechanisms that regulate cancer driver genes. Fortunately, the continuing increase in functional genetic and epigenetic data will provide a lasting opportunity to improve and fine-tune cancer driver gene prediction methods. In future studies, we can perform lineage-specific rather than pan-cancer prediction and extend DORGE to predicting long noncoding genes, as many features used in DORGE are not restricted to protein-coding genes. In addition, further work is needed for a better understanding of phenomena such as ancient phyletic ages of cancer driver genes and enrichment of cancer driver genes at PPI and gene-drug network hubs.

In summary, this study highlights the integration of epigenetic data to achieve a more comprehensive prediction of cancer driver genes. DORGE will serve as an essential resource for cancer biology, particularly in the development of targeted therapeutics and personalized medicine for cancer treatment.

In this paper, we propose DORGE, a machine-learning framework incorporating a large number of features to discover TSGs/OGs (fig. S2). First, we used CGC v.87 genes and NGs as the training genes to predict TSGs and OGs separately from 75 candidate features by LR with the elastic net penalty, and the resulting two classifiers are DORGE-TSG and DORGE-OG. Next, we used fivefold cross-validation to evaluate DORGE. We also analyzed the benefit of introducing epigenetic features based on KEGG enrichment and evaluated DORGE based on several genomic and functional genomic datasets. Last, we showed the enrichment of dual-functional genes predicted by DORGE in hub genes in PPI and gene-compound networks.

All gene annotations, genomic, and functional genomic datasets were downloaded from hg19 genome version or processed to hg19 if they were from other genome versions. Genome version conversion was done using the LiftOver program (https://genome.ucsc.edu/cgi-bin/hgLiftOver). HUGO Gene Nomenclature Committee annotation (www.genenames.org/) was used for gene annotation. The gene annotation can be found in data file S1. Promoters were defined as the regions from the upstream 1000 base pairs (bp) to downstream 500 bp of transcription start sites (TSSs), while gene-body regions were defined as the regions from downstream 500 bp of TSSs to transcription termination sites (TTSs).

Somatic mutation datasets. The somatic mutation dataset used in this study was derived from the TCGA (6) website (https://portal.gdc.cancer.gov/) and COSMIC, v86 (5). These two datasets were combined to help increase the mutational information of infrequently mutated genes. Duplicate tumor samples present in more than one dataset were excluded. The final dataset used for the calculation of mutation-related features contained 5,700,484 mutations from more than 30 tumor types. Hypermutated tumor samples with >2000 mutations were excluded from this dataset. The population genetic dataset (pLoF Metrics by Gene TSV file) for evaluating features, such as LoF intolerance, was downloaded from gnomAD v2 (https://gnomad.broadinstitute.org/downloads/) (64). Additional details regarding features calculation can be found in data file S1.

Epigenetic datasets. We downloaded all peak BED files (hg19) for H3K4me3 and other representative histone modifications from the ENCODE project (www.encodeproject.org/). The full file names and download links are listed in data file S1. The gene-body canyon annotation file (65), including DNA methylation information, was obtained from a previous study (21), which were based on TCGA whole-genome bisulfite sequencing (WGBS) data. The data for calculating promoter and gene-body cancernormal methylation difference were also downloaded from the level 3 methylation data from the COSMIC website (v.90). Repli-seq Binary Alignment Map (BAM) datasets were downloaded from the ENCODE project website, and the featureCounts program (http://subread.sourceforge.net/) was used to assign BAM reads to gene bodies. Read counts were normalized on the basis of the sequencing depth of the BAM files, and the normalized read numbers were used to calculate the replication timing S50 score (66). This score, which determines the median replication timing, was calculated by a tool available from a previous study (66). The super enhancer annotation was downloaded from the dbSUPER database (23).

Other datasets. The level 3 TCGA data, which include the processed somatic CNA and gene expression data, were downloaded from the COSMIC website (v.90) and used without processing. The processed cell proliferation (dependency) scores from 436 CRISPR-treated cell line samples were obtained from the DepMap website (Avana-17Q2-Public_v2) (24). For each gene, gene expression was aggregated across samples to obtain the median Z score. The phastCons scores were downloaded from the University of California, Santa Cruz (UCSC) (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/). The dataset including the gene damage index (GDI), Primate dN/dS score, RVIS percentile, ncRVIS, ncGERP, family member count, and gene age features were downloaded from https://github.com/RausellLab/NCBoost (22). The dataset is gene-centric, and no further processing was done.

The training set contained 242 high-confidence TSGs and 240 high-confidence OGs without overlapping from the v.87 CGC database on the COSMIC website, as well as 4058 NGs obtained as follows. The initial set of NGs was obtained from Davoli et al. (9). However, this initial set is likely to include mislabeled genes. To address this, those that overlap with the following gene lists (18 June 2020) were excluded from this initial NG set: (i) Candidate Cancer Gene Database (67), (ii) CancerMine (32), (iii) a cancer gene list compiled by Chiu et al. (68), (iv) the genes (OncoScore > 21.09) in the OncoScore database (69), and (v) allOnco Cancer Gene List (v3 February 2017; http://www.bushmanlab.org/links/genelists). The final training gene sets are available in data file S2.

The candidate mutational features were previously defined by Davoli et al. (9) and Tokheim et al. (7). In addition to these features, other gene-centric features were also collected. Features were categorized into the following classes: Genomics, Mutation, Epigenetics, and Phenotype, and additional details regarding these features can be found in data file S1. The feature IDs mentioned below correspond to data file S1.

Features 1 to 20 were quantified on the basis of the combined mutation data using the script provided by Davoli et al. (9). Further information for these features can be found in their paper (9). For features 1, 5, and 6 in data file S1 that quantify the density of different categories of mutations within genes, only the coding sequence (CDS) length (per kilobase) of each gene is considered. For mutational features 8 to 15 and 28 that include ratios, a pseudocount estimated as the median of each feature across all genes was added, as described by Davoli et al. (9).

Features 11 to 15 rely on the functional effects of missense mutations, including high functional impact (HiFI) or low functional impact (LoFI) (9). The PolyPhen-2 Hum-Var prediction model was used to estimate the functional effects of missense mutations and to classify them as either HiFI or LoFI (9), based on the probability of functional damage as estimated by the PolyPhen-2 HumVar algorithm. Features based on HiFI and LoFI include the following: (i) benign mutations: silent and LoFI missense mutations; (ii) LoF mutations: nonsense and frameshift mutations; and (iii) HiFI missense mutations (damaging missense mutations). PolyPhen-2 scores (feature 16) were calculated by the PolyPhen-2 web server (http://genetics.bwh.harvard.edu/pph2/) (70). The missense MGAentropy scores (feature 33), which also measure the multispecies conservation of missense mutation sites, were also calculated by the Cancer-Related Analysis of Variants Toolkit (CRAVAT) tool (71).

Other mutation types include splicing/total mutations (feature 19) and inactivating fraction (feature 27). Splicing mutations are those that affect splicing sites; >95% of splicing mutations are in the first two positions at donor or acceptor sites. Inactivating mutations include indel frameshift, splice site, translation start site, and nonstop mutations. Features 21 to 29 that were introduced in Tokheim et al.s paper were quantified based on our revised version of the script provided by Davoli et al. (9), given that these features can be quantified in a similar way to that for features 1 to 20. The lost start and stop fraction (feature 26) was defined as the fraction of the translation start site and nonstop mutations in total mutations. The recurrent missense fraction (feature 23) was defined by missense mutations occurring in more than one patient sample.

Features 42 to 46 are population genetics-based mutational features. For LoF constraints, three categories of tolerance to LoF mutations were defined by gnomAD: null (LoF mutations are fully tolerant), recessive (heterozygous LoF mutations are tolerant), and haploinsufficient (heterozygous LoF mutations are intolerant). The probability of the three types of mutations can also be obtained from the dataset (features 42 and 43) or be derived based on simple calculation (sum of the probability of three categories of intolerance equals 1). A pLI score was initially introduced to determine the likelihood that a given gene is intolerant of LoF mutations. The difference between LoF o/e and pLI is explained at https://blog.limbus-medtec.com/how-to-use-gnomad-v21-for-variant-filtering-d7d2a7ee710a. For synonymous, missense, and LoF mutations (features 44 to 46), a signed Z score to describe the o/e was obtained from the gnomAD dataset. Higher Z scores indicate intolerance to variation or increased constraint, whereas lower Z scores indicate tolerance to variants.

In addition to genetic data, epigenetic data have been shown to be associated with cancer driver genes. Here, we used the peak length and height to characterize histone modifications. We also used cancernormal methylation difference to characterize gene promoter and gene-body methylation in cancer and normal samples. These potentially useful features (features 39 and 40 and 54 to 75) were previously used in epigenetics studies, but to what extent these features are useful in predicting cancer driver genes is not systematically evaluated. The histone modification BED files were processed based on our previously published procedures (19). Briefly, adjacent peaks were merged when peaks are within 3 kb by the merge command from bedtools (https://bedtools.readthedocs.io/). Peaks overlapping with the longest transcript of a gene with at least 50% of peak length were assigned to that gene by bedmap function in the BEDOPS tool (https://bedops.readthedocs.io/) with the following parameters: --max-element --echo --fraction-map 0.5 --delim 't' --skip-unmapped. Features of mean peak length were calculated based on the merged peaks. For features of height of peaks, the maximum signal values (seventh column in BED 6 + 4 narrow peak files used in ENCODE) were used. Promoter and gene-body cancernormal methylation difference features (features 39 and 40) were defined by the mean methylation level in cancer samples (Beta Value column in the dataset) minus that in normal samples (Avg Beta Value Normal column in the dataset) based on COSMIC 450 K methylation data. 450K probes were mapped to genes according to genomic coordinates (hg19). The gene-body canyon cancer/normal methylation ratio feature (feature 41) was inspired from a previous study (21). The ratio value was determined by the mean methylation level in cancer samples divided by that in normal samples in TCGA WGBS methylation data. To make gene-body cancernormal methylation difference (feature 40) and gene-body canyon cancer/normal methylation ratio (feature 41) available to all genes, genes without applicable feature values were imputed as 0. Genes were linked to gene-body canyons by BEDOPS with the same parameters as shown above. The difference between features 40 and 41 is that feature 41 is only available to genes with gene-body methylation canyons defined by a previous study using TCGA WGBS data (21), while feature 40 is available for all genes with 450K probes. We previously used TCGA WGBS data to define feature 41 because WGBS methylation data have a significantly higher resolution than 450K methylation data, while we were unable to identify large DNA methylation canyons using COSMIC 450K data. For feature 34, Repli-seq BAM datasets were quantified by the featureCounts program (http://subread.sourceforge.net/) to assign BAM reads to gene bodies. Read counts were normalized on the basis of the sequencing depth of the BAM files, and the normalized read numbers were used to calculate the S50 score (66). Early replication timing (feature 34) was quantified by the S50 score. All bam data are assigned to different cell cycle stages (G1, S1, S2, S3, and S4) for the S50 score calculation. This score, which determines the median replication timing (from 0 to 1), was calculated based on the algorithm proposed by a previous study (66). An S50 score that closes to 0 means early replication timing, whereas an S50 score that closes to 1 means late replication timing. Super enhancer percentage (feature 38) was calculated as the percentage of cell lines in which super enhancers are associated with any transcripts of genes.

Feature 29 (log gene length) was defined as the log2-transformed length of the maximum transcript of a specific gene based on the ENSEMBL GTF annotation file. Feature 30 (log CDS length) was obtained from the COSMIC mutation files supplemented by the ENSEMBL GTF annotation file and then log2-transformed. Features 31 is CNA deletion percentage that was calculated based on column 17 (Mut type: gain or loss) in the original dataset (CNA amplification percentage can be calculated by 1 CNA deletion percentage). The VEST scores (feature 35), which indicate missense pathogenicity for each mutation, were calculated by CRAVAT. Gene expression Z score (feature 36) was used to quantify the gene expression based on the regulation column in the original data. The exon conservation (phastCons) score (feature 32) that is based on the average phastCons score for maximum transcripts of genes was also calculated by CRAVAT. Feature 37 (increase of cell proliferation by CRISPR knockdown) was calculated on the basis of the cell proliferation scores in the CRISPR-screening data. A lower cell proliferation for a gene in a cell line means that the gene is more likely to be essential to the cell line. A score of 0 means nonessential, whereas a score of 1 means essential.

Features 47 to 53 are evolution-based features, including GDI (mutational damage that has accumulated in the general population), Primate dN/dS score (ratio between the number of nonsynonymous substitutions and the number of synonymous substitutions), RVIS percentile (high RVIS percentiles reflect genes highly tolerant to variation), ncRVIS, ncGERP, family member count (number of human paralogs for each gene), and the gene age (time of evolutionary origin based on the presence/absence of orthologs in vertebrates). Genes with higher GERP scores are more constrained. ncRVIS is a measure of deviation from the genome-wide variants found in non-CDSs of genes (46). A negative ncRVIS score indicates less common noncoding variation than predicted. In ncRVIS and ncGERP, the non-coding regions were defined as the untranslated regions as well as nonexonic 250 bp upstream of TSSs.

The elastic net is a penalized regression method that can select a limited number of features that contribute to the response. Similar to the lasso, the elastic net selects features by shrinking some of the coefficients to be zero; the remaining features with nonzero coefficients are considered to have larger effects on the response and thus are selected and kept in the model. The main advantage of the elastic net over the lasso is that, in case of collinearity, the elastic net simultaneously selects a group of collinear features whereas the lasso tends to select only one feature from the group. (The simultaneous selection of collinear features is desired because, in the extreme situation where these collinear features are exactly identical, the regression method should assign equal coefficients to these features.) Therefore, we chose the elastic net over the lasso because we observed high collinearity among the original list of 75 features.

Specific to DORGE, we used LR with the elastic net penalty to train two binary classifiers for predicting TSGs and OGs, and these classifiers were referred to as DORGE-TSG and DORGE-OG. We used the R function glmnet from the R package glmnet (https://cran.r-project.org/web/packages/glmnet/index.html). The tuning parameter was selected by fivefold cross-validation using the function cv.glmnet from the same R package, while the parameter, which balances the lasso and ridge penalties, was set to the default value 0.5.

For every gene, DORGE-TSG predicted it with a probability of being a TSG, and this probability is defined as the genes TSG score. The OG scores are defined similarly by DORGE-OG for all genes. Having two separate binary classifiers, one for detecting TSG and the other for detecting OG, DORGE is able to detect dual-functional genes.

The codes for training DORGE-TSG and DORGE-OG and obtaining predicted TSGs and OGs are available at https://github.com/biocq/DORGE. An online video that explains the code is available at http://www.youtube.com/watch?v=Pk8ZqoHK8zk.

PRC analyses were performed using the R PRROC. The AUPRCs were calculated using TSG scores and OG scores by the pr.curve function in the package.

We used the in-house code available in our DORGE GitHub repository to find the cutoffs on TSG scores and OG scores such that the population FPRs (type I errors; for TSG prediction, the FPR is the conditional probability of misclassifying an NG as a TSG) were controlled under 1%. The code was an implementation of the Neyman-Pearson classification umbrella algorithm (25).

The gene lists and datasets that we used to evaluate our DORGE-predicted novel TSGs/OGs are as follows: (i) CGC gold-standard gene list: The CGC is a widely used gold-standard list of cancer-related genes. We used the CGC v.87 gene list as the testing gene set while excluding those in the CGC v.77 gene list to evaluate the performance of our prediction. (ii) ATAC-seq data: ATAC-seq data were taken from pan-cancer peak calls from data file S2 in Corces et al.s paper (39). (iii) ERs: The ER gene list comes from a recent study focused on the characterization of ERs (40) and the EpiFactors database (72), after removing the genes that function only as TFs. (iv) Candidate TSGs identified by SB insertional mutagenesis. The inactivating pattern gene list was downloaded from the SBCDDB (42). This database contains cancer driver genes that were identified by SB insertional mutagenesis in tumor models. For the evaluation of DORGE-predicted novel TSGs, only genes with an inactivating pattern in the SBCDDB were kept, resulting in 1211 genes. (v) Survival data: Survival data were downloaded from the OncoLnc website (44). (vi) shRNA screening data. The gene-centric shRNA screening data were taken from the Achilles project (43). (vii) Evolutionary conservation data: For evolutionary conservation, we used phyloP scores that measure nonneutral substitution rates based on multispecies alignments. The phyloP data were downloaded from the UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phyloP46way). We computed the average log(phyloP) and phastCons score for each gene by averaging the base pairlevel conservation values for every position in each gene. (viii) Phyletic age: We downloaded the precomputed phyletic age gene lists in human and measured enrichment of our predicted genes within the gene sets from different phyletic ages (i.e., Eukaryota, Metazoa, Chordata, and Mammalia) from the OGEE database (48). (ix) The BioGRID v3.5.183 data were downloaded from the website https://thebiogrid.org/. Biological network-related metrics can be calculated by the Cytoscape software (73). Additional information on the network metrics can be found in the Supplementary Text. (x) The PharmacoDB (74) gene-drug network data were downloaded from https://pharmacodb.pmgenomics.ca/. (xi) HKGs: We downloaded an HKG gene list from http://www.tau.ac.il/~elieis/HKG/, which includes 3804 HKGs. (xii) Essential genes: The essential and nonessential gene lists were also downloaded from the OGEE database. To shorten this list, we limited our essential gene set to those with >2 in entries of the OGEE database, resulting in 2340 definitive essential genes. Nonessential genes that overlap with essential genes were removed, resulting in 11,990 nonessential genes. (xiii) The drug response data were downloaded from Drug Gene Budger (54). Only significant drug-gene relationships (Q value < 0.05 and fold change > 2) were selected from the CREEDS data collections downloaded from the Drug Gene Budger database (54).

KEGG enrichment analyses were performed using Enrichr (75) for DORGE and DORGE variant predicted novel genes.

For DORGE-predicted novel genes and CGC genes, PPI module analysis was performed by Metascape (76). Networks contain proteins that display physical interactions with at least one other protein in the list. For networks containing 3 to 500 proteins, the MCODE algorithm (77) was applied to identify densely connected network modules.

One-sided Wilcoxon rank-sum test was used when comparing different categories of genes. Gene enrichment analyses were performed in R, using one-sided Fishers exact test (fisher.test function in R). P values of Spearman correlation were calculated by test for association/correlation between paired samples (cor.test function in R). Binomial test was used to test the enrichment of dual-functional genes in network hub genes.

Go here to read the rest:
DORGE: Discovery of Oncogenes and tumoR suppressor genes using Genetic and Epigenetic features - Science Advances

What’s on TV Thursday: ‘Station 19’ and ‘Grey’s Anatomy’ – Los Angeles Times

During the coronavirus crisis, the Los Angeles Times is making some temporary changes to our print sections. The prime-time TV grid is on hiatus in print but an expanded version is available in your daily Times eNewspaper. You can find a printable PDF online at: latimes.com/whats-on-tv.

Jeopardy! (N) 7 p.m. ABC

Young Sheldon Sheldon (Iain Armitage) gets a summer job at the local train museum while Missy (Raegan Revord) enters a new stage of womanhood. Lance Barber, Zoe Perry and Montana Jordan also star in this new episode of the family comedy. 8 p.m. CBS

Superstore With the floor supervisor position available, Jonah (Ben Feldman) decides to throw his hat in the ring, much to Dinas (Lauren Ash) chagrin. Also, Mateo and Glenn (Nico Santos, Mark McKinney) struggle to find the right balance in their working relationship in this new episode of the workplace comedy. 8 p.m. NBC

Supernatural Everything is on the line as Sam and Dean (Jared Padalecki, Jensen Ackles) prepare to end their high-stakes battle against God (guest star Rob Benedict), setting up next weeks series finale. 8 p.m. CW

Station 19/Greys Anatomy The series set at a Seattle fire station and a hospital launch new seasons with a three-hour crossover event, set against the backdrop of the COVID-19 pandemic. Jaina Lee Ortiz, Grey Damon and Okieriete Onaodowan star in Station 19" and Ellen Pompeo and Chandra Wilson in Greys Anatomy. 8 and 9 p.m. ABC

B Positive When Drews (Thomas Middleditch) condition worsens, he wants to make sure Gina (Annaleigh Ashford) is serious about donating her kidney, so he makes her an offer she cant refuse in this new episode of the comedy. 8:30 p.m. CBS

Law & Order: Special Victims Unit Capt. Olivia Benson (Mariska Hargitay) and her team grapple with the COVID-19 pandemic as this procedural drama returns for a 22nd season. Chris Meloni, who starred as Bensons partner Elliot Stabler, guest stars in advance of his 2021 spinoff series, Law & Order: Organized Crime. 9 p.m. NBC

The Outpost Talon (Jessica Green) sets up a sanctuary to protect the infected, while Janzo and Wren (Anand Desai-Barochia, Izuka Hoyle) continue their research in hopes of finding a cure. Jake Stormoen, Reece Ritchie and Adam Johnson also star in this new episode. 9 p.m. CW

Total Bellas (season premiere) 9 p.m. E!

Flip or Flop Laguna Hills. 9 p.m. HGTV

Mom Bonnie and Adam (Allison Janney, William Fichtner) regret spending their first wedding anniversary with Marjorie and her new boyfriend (Mimi Kennedy, Peter Onorati). Jaime Pressly also stars. 9 p.m. CBS

The Unicorn Wade (Walton Goggins) meets a woman (new cast member Natalie Zea) and feels he has an instant connection with her in the comedys season premiere. Maya Lynne Robinson and Makenzie Moss also star. 9:30 p.m. CBS

Star Trek: Discovery (N) 10 p.m. CBS

The Paley Center Presents Law & Order: Before They Were Stars This new special showcases the appearances of some of todays top stars in the Law & Order franchise when they were up-and-coming actors. 10 p.m. NBC

2020 Masters Tournament The championship was postponed in April due to the COVID-19 pandemic, and there will be no fans attending for the first time in the events history. First round, 10 a.m. and 5 p.m. ESPN; highlights, 11:35 p.m., CBS

Soccer International friendly: Wales versus United States, 11:30 a.m. FS1

NFL Football The Indianapolis Colts visit the Tennessee Titans, 5 p.m. Fox

College Football Colorado State visits Boise State, 5 p.m. FS1

CBS This Morning Sportscaster Jim Nantz. (N) 7 a.m. KCBS

Today Olivia Colman, Emma Corrin, Josh OConnor and Gillian Anderson (The Crown). (N) 7 a.m. KNBC

KTLA Morning News (N) 7 a.m. KTLA

Good Morning America Henry Golding; Laurence Fishburne. (N) 7 a.m. KABC

Good Day L.A. (N) 7 a.m. KTTV

Live With Kelly and Ryan Chandra Wilson (Greys Anatomy). (N) 9 a.m. KABC

The View Sarah Cooper. (N) 10 a.m. KABC

Rachael Ray (N) 10 a.m. KTTV

Home & Family Catherine Bell and Robert Buckley. (N) 10 a.m. Hallmark

The Wendy Williams Show Iyanla Vanzant. (N) 11 a.m. KTTV

The Talk Walton Goggins (The Unicorn). (N) 1 p.m. KCBS

Tamron Hall (N) 1 p.m. KABC

The Drew Barrymore Show Chris ODonnell (NCIS: Los Angeles); Chef Geoffrey Zakarian. (N) 2 p.m. KCBS

The Kelly Clarkson Show Billy Gardell; Los Angeles teacher of the year. (N) 2 p.m. KNBC

The Doctors A look back at some of the shows most dramatic episodes in which lives were changed or saved. (N) 2 p.m. KCOP

Dr. Phil A man disabled in a work accident says his family doesnt understand what he is going through. (N) 3 p.m. KCBS

The Ellen DeGeneres Show Vince Vaughn (Freaky); Mario Lopez (Saved by the Bell). (N) 3 p.m. KNBC

The Real Jeezy; Jerry Springer. (N) 3 p.m. KCOP

Amanpour and Company (N) 11 p.m. KCET; midnight KVCR

The Daily Show With Trevor Noah (N) 11 p.m. Comedy Central

Conan David Oyelowo. (N) 11 p.m. TBS

The Tonight Show Starring Jimmy Fallon James Spader; Chris Paul; 21 Savage; Metro Boomin. (N) 11:34 p.m. KNBC

Jimmy Kimmel Live! Ellen Pompeo; Chris Stapleton performs. (N) 11:35 p.m. KABC

The Late Show With Stephen Colbert Michael Moore; Sara Bareilles performs. (N) 11:50 p.m. KCBS

Late Night With Seth Meyers Dan Aykroyd; Anya Taylor-Joy; Carter McLean. (N) 12:36 a.m. KNBC

Nightline (N) 12:37 a.m. KABC

The Late Late Show With James Corden Sen. Cory Booker (D-N.J.); Jack Harlow performs. (N) 12:52 a.m. KCBS

Great Expectations (1974) 8:17 a.m. Cinemax

Upgrade (2018) 8:30 a.m. FXX

The Lusty Men (1952) 8:45 a.m. TCM

Whats Love Got to Do With It (1993) 9:30 a.m. VH1

Punch-Drunk Love (2002) 10:22 a.m. Cinemax

Tootsie (1982) 10:30 a.m. TMC

The Snows of Kilimanjaro (1952) 10:45 a.m. TCM

The Amazing Spider-Man (2012) 11:20 a.m. and 11:01 p.m. Encore

Trainwreck (2015) 11:30 a.m. FX

The Avengers (2012) 11:55 a.m. Epix

Statue of Liberty (1985) Noon KVCR

The Preachers Wife (1996) Noon Freeform

Purple Rain (1984) 12:10 p.m. VH1

Hellboy (2004) 12:15 p.m. Starz

The Spectacular Now (2013) 12:30 p.m. TMC

Smash-Up: The Story of a Woman (1947) 12:45 p.m. TCM

Green Book (2018) 1 and 11:20 p.m. Showtime

The Wolf of Wall Street (2013) 2 p.m. IFC

Zathura (2005) 2 p.m. Syfy

71 (2014) 2:20 p.m. Epix

Harriet (2019) 2:40 p.m. HBO

Ill Cry Tomorrow (1955) 2:45 p.m. TCM

Fatal Attraction (1987) 3:26 p.m. Encore

Darkest Hour (2017) 3:45 p.m. Cinemax

The Green Mile (1999) 4 p.m. AMC

Sky Captain and the World of Tomorrow (2004) 4 p.m. TMC

The SpongeBob Movie: Sponge Out of Water (2015) 4:30 p.m. Nickelodeon

The Hunt for Red October (1990) 5 p.m. BBC America

The Natural (1984) 5 p.m. MLB

The Gilded Lily (1935) 5 p.m. TCM

The Best Exotic Marigold Hotel (2011) 5:15 p.m. HBO

While Were Young (2014) 5:25 p.m. Showtime

The Wife (2017) 5:29 p.m. Encore

Layer Cake (2004) 6 p.m. TMC

Tangled (2010) 7 p.m. Freeform

Pretty Woman (1990) 8 p.m. AMC

The Long Kiss Goodnight (1996) 8 p.m. Epix

True Romance (1993) 9 p.m. Encore

Mulan (1998) 9 p.m. Freeform

Puss in Boots (2011) 9 p.m. Nickelodeon

Frozen (2013) 9 p.m. Starz

The Mob (1951) 9:30 p.m. TCM

Executive Decision (1996) 10 p.m. Epix

Under the Skin (2013) 10 p.m. TMC

Dope (2015) 10 p.m. VH1

Premium Rush (2012) 10:46 p.m. Starz

Read more from the original source:
What's on TV Thursday: 'Station 19' and 'Grey's Anatomy' - Los Angeles Times

‘Grey’s Anatomy’ powered his ambitions. Now, this doctor is helping the show tackle coronavirus – CNN

"I have to tell you, episode one, I watched it and the voice of Ellen Pompeo [as] Meredith Grey...it was like an aria. It was a solemn aria about life and death," he recalled via phone recently.

When his sister, Safa Alazari, now an aspiring architect, handed him DVD sets for the first three seasons of "Grey's Anatomy," he had initially resisted her efforts to get him to watch the show about doctors in the United States.

Now 36 and a senior surgical adviser for the series, he's grateful for his sister's insistence. And, it is likely, so is "Grey's Anatomy" because Alazari's first-hand experience of working during the current global health pandemic is helping power a season unlike any other.

"I was a proponent of starting in a universe of Covid because I feel our show has such an incredible accessibility to people and relatability to people that we need to be there with them," Alazari said. "We're going to share your frustration. We're going to share your loss...and also, who knows, maybe we'll be there with you when this is over."

If there's a reason to share Alazari's sense of hope, it's that he's someone who knows a lot about the wonderful twists life can deliver after a period of struggle.

How to save a life

Born and raised in Iraq, Alazari knew he wanted to become a doctor when he was young -- around 18. Having lived through multiple wars, he says he felt a sense of "obligation" to be a helper in a world of human suffering, poverty and illness (his father died of pancreatic cancer).

"I wanted to understand more about this science that is fascinating. It's so empowering," he said. "But I didn't understand the impact of it back then -- to be able to change people's lives."

"Grey's Anatomy" opened his eyes in many ways in 2007.

He'd gone to the school of medicine at Al-Nahrain University in Baghdad, but upon watching the show, decided to work hard so he could practice medicine the United States. He said he was drawn to the cutting-edge technology in biotech, surgical advancements, access to new pharmaceuticals and other systems that empower doctors to offer higher quality care. (He is, however, critical of the inflated charges that have "interrupted and challenged affordability and accessibility.")

"Remember, I lived in Iraq during the stern embargo years, when UN Sanctions limited medical supply access to the country which traumatized our health care system there," he said. "That, alongside corruption, [and] we were left with a dysfunctional health care in that country."

Five years into his time in the US, then a general surgery resident, he received an email soliciting applications for a program created by Dr. Zoanne Clack, a "Grey's Anatomy" executive producer and ER doctor. Those selected for the Grey's AnatomySurgical Communication Fellowship would serve as consultants for a period of time and help the writers guide the direction of the medical stories told on the show.

"I remember I said, 'There's no way they're going to take me. This show really got me to this point. There's no way life is so good that it's taking you back to the show,'" he said.

Life was that good.

Essential entertainment

Seven years later, "good" isn't the word most would use to describe 2020.

Particularly recalling the early days of the pandemic, Alazari called his experience "dystopian."

For the past three years, he's been practicing medicine while also working in the writer's room to help keep his skills sharp.

Early on, when little was known about coronavirus or its transmission, frustration mounted for him and other medical professionals as they dealt with having no protocols, personal protective equipment or tests for those who came in with the tell-tale flu-like symptoms. That time was all about improvisation, he remembered.

"But, in the end, I've been there before, as you know," he said. "For me, it was eerily very reminiscent of my wars back in Iraq. Like, I felt this was a war zone. This is the first time I'd seen America in such a way -- where you feel that people are scared, the enemy is invisible, we don't have enough information of what this is and we don't have resources for this."

Still, he felt it was important to meet the need for doctors, even if there was, at times, little they could provide in the way of answers.

"I felt this was a calling for me as a doctor, and I guarantee you, a lot of doctors and nurses felt the same way -- that we have to be there for our people," he said. "It's incumbent upon me to be a voice of reason and science and reassurance to everybody that I can get access to."

Even with the weight of the task, he said, "it absolutely was an honor to be of service."

There was a brief moment when "Grey's Anatomy" showrunner Krista Vernoff considered not tackling the pandemic on the show.

"We didn't want to retell the story of Covid as much as make Covid the universe that we're living in," he said. "We will preserve the spirit of 'Grey's Anatomy' as it is, and at the same time, be present for telling stories about the pandemic."

As a show that has earned praise for its handling of and storylines about many medical issues and topics -- including maternal mortality, sexual assault and human trafficking -- they will tackle coronavirus with a sense of responsibility.

They have to walk a "thin line," Alazari said. They want to provide up-to-date information but not, say, advocate for an unproven treatment of some kind. For that and other reasons, Alazari, an essential worker himself, sees television as essential in his own right at this time.

"I'm seeing how the set has transformed, and what the cast and crew has to go through in order to produce these episodes is really heroic, in my opinion," he said. "In the end, you're creating this entertainment to go back to the homes of these people who have been watching the show for 17 years because you want to be there with them."

"The cast and crew are putting such a tremendous effort, and I salute their courage for that."

Read the original post:
'Grey's Anatomy' powered his ambitions. Now, this doctor is helping the show tackle coronavirus - CNN

Sarah Drew on ‘Grey’s Anatomy’ Possibly Ending and If She’d Ever Return to the Show (Exclusive) – Entertainment Tonight

Sarah Drew on 'Grey's Anatomy' Possibly Ending and If She'd Ever Return to the Show (Exclusive) | Entertainment Tonight arrow-left-mobilearrow leftarrow-right-mobilearrow rightGroup 7Gallery Icon Copy 2Video Play Button Copy 5Hamburger MenuInstagramYoutubeShare Button7C858890-6955-48EA-B871-66CE1E33590CVideo-Playbutton Copy Skip to main content

Read more here:
Sarah Drew on 'Grey's Anatomy' Possibly Ending and If She'd Ever Return to the Show (Exclusive) - Entertainment Tonight

‘Grey’s Anatomy’ star Kim Raver on how Teddy and Owen are coping in season 17 – Entertainment Weekly

Grey's Anatomy's Kim Raver on how Teddy and Owen are coping | EW.com Skip to content Top Navigation Close View image

Grey's Anatomy's Kim Raver on how Teddy and Owen are coping in season 17

this link is to an external site that may or may not meet accessibility guidelines.

Excerpt from:
'Grey's Anatomy' star Kim Raver on how Teddy and Owen are coping in season 17 - Entertainment Weekly

‘Grey’s Anatomy’ Star Jesse Williams Challenges Kehlani to Sing Show’s Theme Song on ‘Quizzed’: Exclusive Vide – Billboard

She starts off answering a little hesitantly, but hits her stride when it came to a question about the show's instantly recognizable song. Says Williams, "'How to Save a Life' by The Fray is the show's theme song." Kehlani quickly answers that it's false.

"Do you know what it is?" the actor asks.

"I know what it sounds like ...," the singer-songwriter offers.

At that, Williams brings in a challenge: "Let's hear it!"

After laughing and waving her hand in the negative, Kehlani responds, "No! I can't make that little ding sound!" Though the "Take You Back" singer turns down the opportunity to sing the tune (it's Psapp's "Cosy in the Rocket"), she says she might just sample it instead.

Williams gave Kehlani even more of a challenge when round three -- the quotes section -- rolled around. "I'm terrible at this!" cries the singer at the start of the round. "I don't even know quotes from ..."

"You know your own lyrics?" Williams presses.

"Barely!" admits Kehlani. "On stage, it takes me forever if they're from more projects older than a project ago."

See how well Kehlani knows Grey's Anatomy by watching the full episode of "Quizzed" above!

Original post:
'Grey's Anatomy' Star Jesse Williams Challenges Kehlani to Sing Show's Theme Song on 'Quizzed': Exclusive Vide - Billboard

Clinical Evidence Shows that the FlareHawk Multiplanar Expandable Cage Delivers Favorable Fusion and Patient Outcomes Demonstrating Conformity to…

PALM BEACH GARDENS, Fla., Nov. 12, 2020 (GLOBE NEWSWIRE) -- Integrity Implants Inc., a privately held medical device company dedicated to delivering innovative solutions for spine surgery, today announced the publication of a series of articles in the International Journal of Spine Surgery (www.ijssurgery.com) that demonstrate fusion efficacy of the FlareHawk expandable interbody device without the use of costly processed biologics. Additionally, there were no observed reports of device subsidence. Moreover, the study set provides confirmation of the cages ability to conform to patients endplate geometry.

The October Special Issue features several studies on FlareHawk expandable cage technology, including:

The articles present favorable clinical outcomes, as well as a discussion of some of the unique design features of the multidirectional expandable FlareHawk device that may contribute to its clinical efficacy.

I.

The lumbar Interbody fusion study substantiating the safety and efficacy of the FlareHawk biplanar expandable cage is presented by principal investigator Dom Coric, M.D., of Carolina Neurosurgery & Spine Associates, Chief of Neurosurgery at Carolinas Medical Center and Spine Division Chief at Atrium Musculoskeletal Institute in Charlotte, North Carolina. The study followed strict inclusion criteria including the allowance of only allograft and/or autograft to facilitate fusion and its participants represented patients with noteworthy comorbidities, including high BMI, diabetes, and current/former smokers. Among subjects with radiographs at 12 3 months, nearly all (56 of 58 patients, or 96.6%, and 75 of 77 levels, or 97.4%) achieved fusion based on Bridwell-Lenke grading. Among 45 evaluable subjects, 71% (32 patients) achieved clinically significant improvements in VAS leg pain, and 76% (34 patients) achieved clinically significant improvements in VAS back pain. Additionally, there were no (0%) reported device-related adverse events (AEs). Common device-related AEs associated with interbody fusion devices include, but are not limited to, subsidence, displacement, and nerve injury. The incidence rate for non-device-related AEs was consistent with other PLIF/TLIF studies. There were no (0%) observations of cage subsidence (defined as an overlap between the vertebral endplates and the device exceeding 25% of the device height) and only one case (1.7%) of observed device migration (defined as displacement of the device relative to the position within intra-operative or immediate post-operative images). Analysis of that case reveals that the cage moved only slightly within the disc space and the patient went on to fuse.

Dr. Coric notes, This study adds to the evidence base supporting the safe and effective use of expandable interbody spacers in the treatment of lumbar spine diseases. The FlareHawk implants ability to expand in both cephalad-caudal and lateral-medial planes is especially advantageous for decreasing neural retraction while maximizing vertebral body endplate coverage and fusion area.

Raphael Roybal, M.D., M.B.A, co-author of the paper and Director of The Spine Institute at Chatham Orthopaedics in Savannah, Georgia, notes, Popular minimally invasive TLIF techniques, typically involving a unilateral approach to the disc space, may limit the amount of disc space preparation and/or bone graft delivery, thereby impeding fusion rates. Biplanar expandable spacers that support minimally invasive surgery, safer implantation, and optimal patient outcomes will be relevant in delivering value-based spine care. The FlareHawk device has transformed my practice by allowing me to transition cases to an outpatient setting that is better for my patients and staff.

Mark Grubb, M.D., co-author and minimally invasive spine surgeon at Northeast Ohio Spine Center in Akron, Ohio, adds, Most expandable spacers use complex articulation mechanisms to expand the implant profile in either in height or width, but typically do not achieve multidirectional expansion. Furthermore, these complex mechanisms can limit the space available for bone grafting. I appreciate that the FlareHawk cage features a small insertion profile and expands in footprint, height, and lordosis via an unobtrusive mechanism that allows me to deliver bone graft through the expanded cage and into the intervertebral body space. Additionally, the ability to treat my patients via a single-position, posterior surgery allows me to improve both the quality and efficiency of my practice.

II.

Similarly, the work from Boyle Cheng, Ph.D., Professor at Drexel University College of Medicine in Pennsylvania and Director of Research at the Allegheny Health Network (AHN) Neuroscience Institute, assesses the feasibility of a bidirectional expandable interbody cage to achieve interbody fusion and discusses two novel aspects of the FlareHawk device: (1) its multimaterial, open-architecture design that provides a combined spring effect for a modulus of elasticity similar to that of bone while maintaining sufficient support and stiffness from the titanium shim, and (2) implant geometry that allows for the naturally occurring deformation of the PEEK shell to conform to each patients endplate configuration.

Dr. Chengs study hypothesizes that utilizing two components, each with the appropriate intrinsic material stiffness, provides both flexibility and stability resulting in a device with a composite construct stiffness more favorable to load transfer over a large contact area with the endplates. Reducing the construct stiffness is essential, not only to reduce the risk of subsidence but also to increase the load sharing and improve bone formation as a function of Wolffs Law. As such, the multimaterial construct, open architecture, and bi-directionally expanding design of the FlareHawk cage may contribute to the positive clinical outcomes observed.

Additionally, the shim-in-a-shell design conforms to patient-specific endplate anatomy. Measurement of the relative position of the tantalum markers revealed that 16 of the 18 devices showed coronal plane deformation of 1.82mm (18.55%) anteriorly and 1.41mm (15.49%) posteriorly. This measured deformation correlates well with values of endplate concavity reported in the literature (1.37 - 1.90mm), suggesting that the PEEK shell is conforming to the natural endplate anatomy. This material behavior, along with bidirectional expansion, serves to increase the surface area of the bone-implant interface and may better distribute the loads across the endplate.

Dr. Cheng notes, This represents yet another design aspect potentially contributing to the favorable fusion rates seen with the FlareHawk device. In the study, all 18 devices (100%) were determined to have fused based on demonstrated bone growth evidence (average volume of 586.42 mm3) and Bridwell-Lenke classification. An interbody device with this unique combination of compliant and rigid components has the potential to conform to the interbody space while maintaining sufficient stability to achieve fusion. This represents a significant advancement not just in expandable cage technology, but in the larger context of achieving successful lumbar interbody fusion. The cage appears to conform in shape obliquely, sagittally, and coronally, resulting in adaptive implant geometry that surgeons may desire. Additionally, the lack of endplate violation suggests that the FlareHawk cage is an atraumatic implant.

The FlareHawk spinal implant is the flagship product for Integrity Implants and represents the first of its kind in the expandable cage market. Much like coronary stents that offer patients a less-invasive alternative to open-heart procedures, the FlareHawk expandable cage features a PEEK shell that is inserted in a compressed form that can be effectively passed through small neural pathways and, once within the intervertebral disc space, expanded to a larger footprint and height. A titanium shim inserted within the PEEK shell produces the expansion and creates a solid-state construct that is resistant to collapse yet has shown the potential to conform to endplate anatomy to increase surface contact area and lower stresses. The Adaptive Geometry and advanced multimaterial composition embodied in the FlareHawk device respect patient anatomy both during insertion and for long-term stability. To date, approximately 8,500 FlareHawk cages have been implanted in more than 6,000 patients.

Chris Walsh, Integrity Implants CEO, shares, The data suggests that the FlareHawk cage changes the PLIF/TLIF algorithm, enabling a single-position, reproducible, facility-friendly interbody solution that can provide surgeons significant efficiencies and latitude for their practices. There may be profound medical and economic ramifications to study in the future. Before founding Integrity Implants, Wyatt Geist and I observed, as distributors, the power of the lateral approach to transform the modern spine practice through a less-invasive technique. What we are observing with the FlareHawk device, we believe, is even greater in scale. The simple design works in inpatient and outpatient facilities, as well as international markets. This supports our thesis that Adaptive Geometry the ability of an implant to change shape intraoperatively to respect the patients anatomy reconciles the desire for a minimally invasive approach that yields a maximum clinical result through a variety of surgical approaches to the spine, and in a variety of clinical settings. In addition to these papers, IJSS has published Dr. Lee Tans Clinical and Radiographic Outcomes After Minimally Invasive Transforaminal Lumbar Interbody Fusion Early Experience Using a Biplanar Expandable Cage for Lumbar Spondylolisthesis. We are excited about the promise of the FlareHawk expandable cage for MIS TLIF and will discuss those findings in a future release.

About FlareHawk Expandable Lumbar Interbody Fusion System

The FlareHawk Interbody Fusion System is indicated for spinal intervertebral body fusion with autogenous bone graft and/or allogeneic bone graft composed of cancellous and/or corticocancellous bone in skeletally mature individuals with degenerative disc disease (DDD) at one or two contiguous levels from L2 to S1, following discectomy. DDD is defined as discogenic back pain with degeneration of the disc confirmed by history and radiographic studies. These patients should have at least six (6) months of non-operative treatment. Additionally, these patients may have up to Grade 1 spondylolisthesis or retrolisthesis at the involved level(s). FlareHawk system spacers are intended to be used with supplemental fixation instrumentation, which has been cleared for use in the lumbar spine.

About Integrity Implants Inc.

Integrity Implants, founded in 2016 by seasoned business partners and spine leaders Chris Walsh and Wyatt Geist, is a privately held medical device company headquartered in Palm Beach Gardens, Florida. The Company is dedicated to delivering innovative spine products and solutions to surgeons and their patients around the globe. Its proprietary Adaptive Geometry technology fundamentally respects a patients neural, vascular, bony, and soft tissue anatomy, both during and after implantation.

For more information, please visit the Companys website at http://www.integrityimplants.com.

Media Contact:

Mark Richards512-913-9572mrichards@integrityimplants.com

See the article here:
Clinical Evidence Shows that the FlareHawk Multiplanar Expandable Cage Delivers Favorable Fusion and Patient Outcomes Demonstrating Conformity to...

Collecting Vinyl Records: A Hobby that Will Never Go Out of Style – 9 & 10 News – 9&10 News

With the colder weather slowly pushing us inside for the winter, chances are youre looking for new ways to stay entertained. One hobby that will never go out of style is collecting and playing vinyl records. Studio Anatomy, a recording studio and event space in downtown Traverse City opened a record shop called Eugenes Record Co-Op. Its a consignment shop where people can buy and sell records, turntables, and accessories.

Brian Chamberlain the owner of both Eugenes and Studio Anatomy says, records are definitely the comeback. I think a lot of younger people are interested in something thats tangible.

Eugenes is located within Studio Anatomy, a recording space for some of Northern Michigans talented artists and performers. Anyone and everyone is welcome to use the recording studio for a reasonable fee.

For more information on Eugenes Record Co-Op, click here.

For more information on Studio Anatomy, click here.

See original here:
Collecting Vinyl Records: A Hobby that Will Never Go Out of Style - 9 & 10 News - 9&10 News

Coronary anatomy and comorbidities impact on elective PCI outcomes in left main and multivessel coronary artery disease – DocWire News

This article was originally published here

Catheter Cardiovasc Interv. 2020 Nov 11. doi: 10.1002/ccd.29368. Online ahead of print.

ABSTRACT

INTRODUCTION: The effects of coronary anatomy, lesion complexity, and comorbidities on outcomes of elective percutaneous coronary intervention (PCI) in high-risk patients with left main (LM) and/or multivessel coronary artery disease (CAD) are not well studied, as these patients are typically underrepresented in the clinical trials.

METHODS: This cohort study involved 33,568 consecutive elective PCI cases, excluding patients with prior coronary artery bypass graft, acute coronary syndrome within 24 hr of index PCI, or shock. All data were obtained from the New York States PCI Reporting System from the calendar year 2015. In-hospital mortality was the primary outcome of study. Logistic regression models were built to calculate odds ratios (OR) with 95% confidence intervals (CI) for in-hospital mortality after adjustment for coronary anatomy and significant clinical comorbidities.

RESULTS: In this cohort of elective PCI cases all cause in-hospital mortality was low (0.3%), with a clear mortality gradient according to the extent of CAD: 0.1% in 1 vessel disease, 0.4% in 2 vessel, 0.5% in 3 vessel disease, and 3.2% in patients with LM CAD (p < .001). Mortality was also significantly increased in patients with multiple comorbidities: 0.1% in patients with 1 comorbidity, 0.7% with 2, 2.5% with 3, and 7.4% with 4 or more studied comorbidities (p < .0001). When adjusted for coronary anatomy and lesion complexity, having any 4 or more comorbidities was associated with significantly increased odds of dying after elective PCI (OR 25.9, 95% CI 8.152-82.063, p < .0001). Furthermore, when compared to patients with 3-vessel CAD, and accounted for comorbidities, the patients with LM disease still had significantly increased (OR 5.254, 95% CI 3.104-8.891, p < .0001) odds of dying after elective PCI.

CONCLUSIONS: In patients undergoing elective PCI, multivessel CAD and particularly LM disease are associated with significantly increased all-cause mortality. Furthermore, when adjusted for the extent of CAD and lesion complexity, comorbidity burden remains an important predictor of mortality.

PMID:33174681 | DOI:10.1002/ccd.29368

Read the rest here:
Coronary anatomy and comorbidities impact on elective PCI outcomes in left main and multivessel coronary artery disease - DocWire News