Summary



Association


Hit stats


Linkage Disequilibrium


Annotation


Prostate OncoArray Fine-mapping: annotation

Prostate OncoArray Fine-mapping: TCGA eQTL

ENCODE


ENCODE Histone track files

H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE

Note

As GitHub limits the size of repositories and files, the histone bigWig files are not included in LocusExplorer/Data/EncodeBigWig/. These files are public and can be downloaded from the UCSC golden path (about 2.5 GB in total). Downloaded bigWig files must be saved in the LocusExplorer/Data/EncodeBigWig/ folder.

The UCSC golden path contains the publicly downloadable files associated with the ENCODE track.

wgEncodeBroadHistoneGm12878H3k27acStdSig.bigWig 27-Jan-2011 10:33  250M  
wgEncodeBroadHistoneH1hescH3k27acStdSig.bigWig  27-Jan-2011 11:57  651M  
wgEncodeBroadHistoneHsmmH3k27acStdSig.bigWig    27-Jan-2011 17:04  307M  
wgEncodeBroadHistoneHuvecH3k27acStdSig.bigWig   27-Jan-2011 19:55  296M  
wgEncodeBroadHistoneK562H3k27acStdSig.bigWig    27-Jan-2011 21:18  292M  
wgEncodeBroadHistoneNhekH3k27acStdSig.bigWig    28-Jan-2011 00:47  280M  
wgEncodeBroadHistoneNhlfH3k27acStdSig.bigWig    28-Jan-2011 01:51  258M
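
A quick way to confirm that all seven files are in place before launching the app is sketched below in base R. The folder path comes from the note above and is assumed to be relative to the current working directory; the variable names are illustrative and not part of LocusExplorer.

```r
# File names as listed above (seven ENCODE cell lines)
cell_lines <- c("Gm12878", "H1hesc", "Hsmm", "Huvec", "K562", "Nhek", "Nhlf")
bigwigs <- paste0("wgEncodeBroadHistone", cell_lines, "H3k27acStdSig.bigWig")

# Assumption: the working directory is the folder that contains LocusExplorer/
bigwig_dir <- file.path("LocusExplorer", "Data", "EncodeBigWig")

# Report any file that has not been downloaded yet
missing_files <- bigwigs[!file.exists(file.path(bigwig_dir, bigwigs))]
if (length(missing_files) > 0) {
  message("Missing bigWig files:\n", paste(missing_files, collapse = "\n"))
} else {
  message("All histone bigWig files found")
}
```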

Description

Chemical modifications (e.g. methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K27Ac histone mark across the genome as determined by a ChIP-seq assay. The H3K27Ac histone mark is the acetylation of lysine 27 of the H3 histone protein, and it is thought to enhance transcription possibly by blocking the spread of the repressive histone mark H3K27Me3. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page.

Credits

This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium.

Input File Format


Input File Specifications and Format:

  • Input files must be delimited flat text files
  • Headers are required and should use the exact column names described below and used in the example input files
  • Example files are available at: LocusExplorer/Data/CustomDataExample

1. Association File
Association File is mandatory for plot generation. All other files are optional but enhance plot aesthetics and interpretation.

  • CHR - Chromosome on which variant is located preceded by “chr”. e.g. chr2, chrX
  • SNP - Variant ID. e.g. rs12345, chr10:104329988:D
  • BP - Start coordinate of variant (does not include chromosome or end coordinate for in/del variants). e.g. 104356185
  • P - P-value for specified variant
  • TYPED - Use code 2 for typed and 1 for imputed variants
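
Before uploading, the column rules above can be checked with a few lines of base R. This is a minimal sketch: `check_association` is a hypothetical helper, not a LocusExplorer function, and the P-values in the toy table are invented for illustration only.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_association <- function(assoc) {
  required <- c("CHR", "SNP", "BP", "P", "TYPED")
  missing_cols <- setdiff(required, colnames(assoc))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  problems <- character(0)
  if (!all(grepl("^chr", assoc$CHR))) {
    problems <- c(problems, "CHR must start with 'chr'")
  }
  if (!is.numeric(assoc$BP)) {
    problems <- c(problems, "BP must be numeric")
  }
  if (!is.numeric(assoc$P) || anyNA(assoc$P) ||
      any(assoc$P < 0 | assoc$P > 1)) {
    problems <- c(problems, "P must be a number between 0 and 1")
  }
  if (!all(assoc$TYPED %in% c(1, 2))) {
    problems <- c(problems, "TYPED must be 1 (imputed) or 2 (typed)")
  }
  problems
}

# Toy tab-delimited input; P-values are made up for this example
assoc <- read.table(
  text = paste(
    "CHR\tSNP\tBP\tP\tTYPED",
    "chr2\trs13410475\t173309618\t1e-12\t2",
    "chr2\trs148800555\t172827293\t0.03\t1",
    sep = "\n"),
  header = TRUE, sep = "\t", stringsAsFactors = FALSE)

print(check_association(assoc))
```

An empty result means the table passed these checks; it does not guarantee that the app will accept the file.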

2. LD File
LD File is not mandatory but is recommended for more informative plots. If user-supplied LD data is not available, see the Make LD file tab for instructions on how LD data relative to the index SNP(s) can be obtained from the 1000 Genomes Project Phase 3 Dataset.

  • CHR_A - Chromosome on which Index SNP is located (n.b. do not include “chr”). e.g. 2, 23
  • BP_A - Index SNP start coordinate (Hg19, do not include chromosome or end coordinate for in/del variants). e.g. 104356185
  • SNP_A - Index SNP ID
  • CHR_B - Chromosome for SNP in LD with Index SNP (SNP_A)
  • BP_B - Start coordinate (Hg19, do not include chromosome or end coordinate for in/del variants) of SNP in LD with Index SNP (SNP_A). e.g. 104315667
  • SNP_B - ID of SNP in LD with Index SNP (SNP_A). e.g. rs10786679, chr10:104329988:D
  • R2 - LD score between SNP_A and SNP_B (0 to 1). e.g. 1, 0.740917

Note: Lead SNP must be defined relative to itself for plotting purposes, e.g.:

CHR_A   BP_A    SNP_A   CHR_B   BP_B    SNP_B   R2
2   173309618   rs13410475  2   173309618   rs13410475  1
2   173309618   rs13410475  2   172827293   rs148800555 0.0906124

When using plink or LDlink method this does not need to be manually added.
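
For LD files built by other means, the self-referencing row can be added programmatically. The base R sketch below assumes the LD table uses the column names listed above; `add_self_rows` is a hypothetical helper, not part of LocusExplorer.

```r
# Hypothetical helper: add an R2 = 1 row for every index SNP that lacks one
add_self_rows <- function(ld) {
  index <- unique(ld[, c("CHR_A", "BP_A", "SNP_A")])
  self <- data.frame(index,
                     CHR_B = index$CHR_A,
                     BP_B  = index$BP_A,
                     SNP_B = index$SNP_A,
                     R2    = 1,
                     stringsAsFactors = FALSE)
  # Index SNPs that already have a row against themselves
  already <- ld$SNP_A[ld$SNP_A == ld$SNP_B]
  out <- rbind(self[!(self$SNP_A %in% already), ], ld)
  rownames(out) <- NULL
  out
}

# Example: only the second row of the table above is present
ld <- read.table(
  text = paste(
    "CHR_A\tBP_A\tSNP_A\tCHR_B\tBP_B\tSNP_B\tR2",
    "2\t173309618\trs13410475\t2\t172827293\trs148800555\t0.0906124",
    sep = "\n"),
  header = TRUE, sep = "\t", stringsAsFactors = FALSE)

ld_fixed <- add_self_rows(ld)
print(ld_fixed)
```

Running the helper on a table that already contains the self row leaves it unchanged.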

3. [! Disabled !] Custom bedGraph Track
Note: This feature is currently disabled, and will be available in version 0.8. See related GitHub issue.

The first four required bedGraph fields are:

chrom - The name of the chromosome (e.g. chr3, chrY).
chromStart - The starting position of the feature in the chromosome.
chromEnd - The ending position of the feature in the chromosome.
score - A score, any number.

See BedGraph Track Format for more details.

File is tab separated and has no header. This file will be used to create a bar chart. Score is the height, e.g.:

chr2    173292313   173371181   -100
chr2    173500000   173520000   1000

SNP Filters:

Use sliders to set required threshold for P-value and LD.

Zoom region:






Manhattan


LD-Heatmap


LD-Network

Note: this work is still experimental and will be improved in future versions.

Use PDF or SVG format for further image editing, e.g.: Photoshop, Inkscape.
Download Plot

About


An interactive graphical illustration of genetic associations and their biological context.

Disclaimer

LocusExplorer should be used for illustrative purposes only. Any results provided by LocusExplorer should be used with caution.

Availability

The source code and installation instructions for LocusExplorer are available at https://github.com/oncogenetics/LocusExplorer.

LocusExplorer is made available under the MIT license.

Required Software

LocusExplorer runs in the R environment but is designed to be an easy to use interface that does not require familiarity with R as a prerequisite. LocusExplorer is platform agnostic and able to run on any operating system for which R is available.

LocusExplorer requires R version 3.2.2 or later, as some required packages are not available for earlier versions of R. R can be downloaded by following the instructions at https://www.r-project.org/.

After installation of the R software, R packages used by LocusExplorer must be installed prior to use. This may take a few minutes, but is only required on the first occasion. To install packages, open the R program, copy the following code into the R console and hit Return:

#install CRAN packages, if missing
packages <- c("shiny","dplyr","tidyr","lazyeval","data.table","ggplot2","ggrepel","knitr","markdown","DT","lattice","acepack","cluster","DBI","colourpicker","igraph","visNetwork", "devtools")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())), dependencies = TRUE)  
} else { print("All required CRAN packages installed")}

#install Bioconductor packages if missing
source("https://bioconductor.org/biocLite.R")
bioc <- c("ggbio","GenomicRanges","TxDb.Hsapiens.UCSC.hg19.knownGene","org.Hs.eg.db","rtracklayer")
if (length(setdiff(bioc, rownames(installed.packages()))) > 0) {
  biocLite(setdiff(bioc, rownames(installed.packages())))  
} else { print("All required Bioconductor packages installed")}

#install GitHub packages:
devtools::install_github("oncogenetics/oncofunco")
  • If the user does not have admin rights, a pop-up window will prompt to set a personal library location for installing packages; please click yes.
  • If using R GUI, the user might be prompted to choose a CRAN mirror for package downloads; please choose the city nearest to your location.
  • If prompted to “Update packages all/some/none [a/s/n]”, type “n” and hit Return.

Launch LocusExplorer

LocusExplorer runs through a web browser and uses an intuitive interface that does not require high level computational skills to operate.

1. Using runGitHub() within RStudio

Open RStudio (start a new R session), copy the following code into the console and hit Return:

library(shiny)  
runGitHub("LocusExplorer", "oncogenetics", launch.browser = TRUE)

2. Using Download as Zip (Recommended)

Click on the Download as ZIP button; this downloads the repository locally as a zip file, LocusExplorer-master.zip. Unzip the folder. Open the ui.R file in RStudio (start a new R session) and click on the Run App button at the top right corner (please ensure the Run External option is selected for full functionality), or run the code below.

library(shiny)  
runApp(launch.browser = TRUE)

Cite LocusExplorer

LocusExplorer: a user-friendly tool for integrated visualisation of genetic association data and biological annotations
Tokhir Dadaev1, Daniel A Leongamornlert1, Edward J Saunders1, Rosalind Eeles1,2 , Zsofia Kote-Jarai1

1Department of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
2Royal Marsden NHS Foundation Trust, London, UK

Bioinformatics first published online November 20, 2015 doi:10.1093/bioinformatics/btv690

Abstract

Summary: In this article we present LocusExplorer, a data visualisation and exploration tool for genetic association data. LocusExplorer is written in R using the Shiny library, providing access to powerful R-based functions through a simple user interface. LocusExplorer allows users to simultaneously display genetic, statistical and biological data for humans in a single image and allows dynamic zooming and customisation of the plot features. Publication quality plots may then be produced in a variety of file formats.
Availability and implementation: LocusExplorer is open source and runs through R and a web browser. It is available at www.oncogenetics.icr.ac.uk/LocusExplorer/ or can be installed locally and the source code accessed from https://github.com/oncogenetics/LocusExplorer.

Publications: LocusExplorer plots

Frequently asked questions

See FAQ.

Contact

Questions, suggestions, and bug reports are welcome and appreciated.

To-do List

https://github.com/oncogenetics/LocusExplorer/issues


R Session Info


Session info Windows

Application tested on 14/06/2018 09:56

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2                          oncofunco_0.0.0.9000                   
 [3] rtracklayer_1.36.6                      org.Hs.eg.db_3.4.1                     
 [5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.28.5                 
 [7] AnnotationDbi_1.38.2                    Biobase_2.36.2                         
 [9] GenomicRanges_1.28.6                    GenomeInfoDb_1.12.3                    
[11] IRanges_2.10.5                          S4Vectors_0.14.7                       
[13] ggbio_1.24.1                            BiocGenerics_0.22.1                    
[15] visNetwork_2.0.3                        igraph_1.2.1                           
[17] colourpicker_1.0                        DBI_1.0.0                              
[19] cluster_2.0.7-1                         acepack_1.4.1                          
[21] lattice_0.20-35                         DT_0.4                                 
[23] markdown_0.8                            knitr_1.20                             
[25] ggrepel_0.8.0                           ggplot2_2.2.1                          
[27] data.table_1.11.2                       lazyeval_0.2.1                         
[29] tidyr_0.8.0                             dplyr_0.7.4                            
[31] shiny_1.1.0                            

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.8.0            bitops_1.0-6                  matrixStats_0.53.1           
 [4] bit64_0.9-7                   httr_1.3.1                    RColorBrewer_1.1-2           
 [7] tools_3.4.2                   backports_1.1.2               R6_2.2.2                     
[10] rpart_4.1-11                  Hmisc_4.1-1                   colorspace_1.3-2             
[13] nnet_7.3-12                   gridExtra_2.3                 GGally_1.4.0                 
[16] curl_3.2                      bit_1.1-13                    compiler_3.4.2               
[19] graph_1.54.0                  htmlTable_1.11.2              DelayedArray_0.2.7           
[22] scales_0.5.0.9000             checkmate_1.8.5               RBGL_1.52.0                  
[25] stringr_1.3.1                 digest_0.6.15                 Rsamtools_1.28.0             
[28] foreign_0.8-69                XVector_0.16.0                base64enc_0.1-3              
[31] dichromat_2.0-0               pkgconfig_2.0.1               htmltools_0.3.6              
[34] ensembldb_2.0.4               BSgenome_1.44.2               htmlwidgets_1.2              
[37] rlang_0.2.0.9000              rstudioapi_0.7                RSQLite_2.1.1                
[40] BiocInstaller_1.26.1          bindr_0.1.1                   jsonlite_1.5                 
[43] crosstalk_1.0.0               BiocParallel_1.10.1           VariantAnnotation_1.22.3     
[46] RCurl_1.95-4.10               magrittr_1.5                  GenomeInfoDbData_0.99.0      
[49] Formula_1.2-3                 Matrix_1.2-11                 Rcpp_0.12.16                 
[52] munsell_0.4.3                 stringi_1.1.7                 yaml_2.1.19                  
[55] SummarizedExperiment_1.6.5    zlibbioc_1.22.0               AnnotationHub_2.8.3          
[58] plyr_1.8.4                    grid_3.4.2                    blob_1.1.1                   
[61] promises_1.0.1                miniUI_0.1.1                  Biostrings_2.44.2            
[64] splines_3.4.2                 pillar_1.2.2                  reshape2_1.4.3               
[67] biomaRt_2.32.1                XML_3.98-1.11                 glue_1.2.0                   
[70] biovizBase_1.24.0             latticeExtra_0.6-28           httpuv_1.4.3                 
[73] gtable_0.2.0                  purrr_0.2.4                   reshape_0.8.7                
[76] assertthat_0.2.0              mime_0.5                      xtable_1.8-2                 
[79] AnnotationFilter_1.0.0        later_0.7.2                   survival_2.41-3              
[82] OrganismDbi_1.18.1            tibble_1.4.2                  GenomicAlignments_1.12.2     
[85] memoise_1.1.0                 interactiveDisplayBase_1.14.0
Download LD file

LDlink file


Processed LDlink file


LD Tutorial


Make LD file from 1000 Genomes Data

1. Using LDlink website

  1. Go to LDlink website.
  2. Select LDproxy tab.
  3. Enter SNP rs number (must be in 1000 Genomes phase 3 dataset / dbSNP 142).
  4. Select population(s) for LD calculation and hit Calculate.
  5. Scroll down and right click Download all proxy SNPs and click Save link as… to save as a text file.
  6. Return to the Make LD file tab of LocusExplorer and upload unprocessed LDlink file as input.
  7. Download processed LD File for use at Input Data tab in LocusExplorer.

Note:

  • This procedure will calculate LD relative to one top index SNP only. For regions with multiple index SNPs, the process can be performed separately for each individual index SNP and the individual processed LD files combined to make a single input LD file.

  • LD relative to the index SNP cannot be calculated from publicly available data for variants that are not present in the 1000 Genomes phase 3 dataset (dbSNP 142); these will therefore always appear to be uncorrelated on the plot when generating LD information in this way.
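
For regions with several index SNPs, the individual processed LD files can be stacked in base R. The sketch below assumes each processed file is tab-delimited with a header row and identical columns; `combine_ld` is a hypothetical helper and the second demo row uses made-up values.

```r
# Hypothetical helper: stack processed LD files and drop exact duplicate rows
combine_ld <- function(files) {
  ld_list <- lapply(files, read.delim, stringsAsFactors = FALSE)
  unique(do.call(rbind, ld_list))
}

# Demo with two temporary files standing in for processed LD files
header <- "CHR_A\tBP_A\tSNP_A\tCHR_B\tBP_B\tSNP_B\tR2"
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c(header, "2\t173309618\trs13410475\t2\t173309618\trs13410475\t1"), f1)
# Toy values, for illustration only
writeLines(c(header, "1\t1000\trsToyA\t1\t1000\trsToyA\t1"), f2)

combined <- combine_ld(c(f1, f2))

# Write a single LD file for upload at the Input Data tab
out_file <- tempfile(fileext = ".txt")
write.table(combined, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```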

2. Using tabix and plink

We will need to install tabix and plink 1.9. Download and install the HTSlib package, which includes tabix. Then download and install PLINK 1.9.

Then we download a 1000 Genomes VCF file using tabix and calculate LD using PLINK.

Example:

2.1 Download 1000 Genomes VCF

Download vcf for region of interest 16:56995835-57017756 from 1000 genomes ftp site using tabix.

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 16:56995835-57017756 > genotypes.vcf
2.2 Use plink to calculate LD
2.2.1 Calculate LD for 2 SNPs
plink --vcf genotypes.vcf --ld rs9935228 rs1864163

#output
--ld rs9935228 rs1864163:

   R-sq = 7.20831e-05    D' = 0.0355584

   Haplotype     Frequency    Expectation under LE
   ---------     ---------    --------------------
          GA      0.005359                0.004853
          AA      0.249013                0.249519
          GG      0.013719                0.014225
          AG      0.731909                0.731403

   In phase alleles are GA/AG
2.2.2 Calculate LD for a list of SNPs against all SNPs within a 1000 kb region.

We will need a list of SNPs file, one SNP per row.

# Example file snplist.txt
> cat snplist.txt
# rs9935228
# rs1864163

Now we pass snplist to plink. To learn more about plink options selected below see here and here.

plink --vcf  genotypes.vcf \
--r2 \
--ld-snp-list snplist.txt \
--ld-window-kb 1000 \
--ld-window 99999 \
--ld-window-r2 0 \
--out LD_rs9935228_rs1864163
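
PLINK writes the `--r2` results to a whitespace-aligned table named after the `--out` prefix with a `.ld` extension. The sketch below rewrites such a table as a tab-delimited file with the LD File columns described at the Input Data tab. It assumes the app accepts tab-delimited input; `plink_ld_to_tab` is a hypothetical helper, and the demo file only mimics PLINK's layout using the rows shown in the Input File Format section.

```r
# Hypothetical helper: convert PLINK's space-aligned .ld table to tab-delimited
plink_ld_to_tab <- function(ld_file, out_file) {
  # Default sep = "" splits on any run of whitespace
  ld <- read.table(ld_file, header = TRUE, stringsAsFactors = FALSE)
  expected <- c("CHR_A", "BP_A", "SNP_A", "CHR_B", "BP_B", "SNP_B", "R2")
  stopifnot(all(expected %in% colnames(ld)))
  write.table(ld[, expected], out_file, sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(ld[, expected])
}

# Demo input mimicking the layout of a PLINK .ld file
demo_ld <- tempfile(fileext = ".ld")
writeLines(c(
  " CHR_A         BP_A        SNP_A  CHR_B         BP_B        SNP_B           R2",
  "     2    173309618   rs13410475      2    173309618   rs13410475            1",
  "     2    173309618   rs13410475      2    172827293  rs148800555    0.0906124"
), demo_ld)

demo_out <- tempfile(fileext = ".txt")
ld_tab <- plink_ld_to_tab(demo_ld, demo_out)
```

With the command above, the real call would be `plink_ld_to_tab("LD_rs9935228_rs1864163.ld", "LD_rs9935228_rs1864163.txt")`.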

3. Create linkage disequilibrium file, where SNP IDs are duplicated

https://www.biostars.org/p/315219/

Prostate raw data



PRACTICAL Data

Prostate summary data can be downloaded at The Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium website:


Contact PRACTICAL

Frequently Asked Questions


Q1: Can you tell me the naming convention for the genomic features?

I don't know what uc031tcg.1 is (but I do understand PCAT1, I think that naming convention is RefSeq, are the others also?).

A: The app rebuilds gene symbols by collapsing transcripts into genes. When a transcript does not overlap a gene symbol, it is labelled with its transcript name, in this case a UCSC ID such as uc031tcg.1. See udf_GeneSymbol for details. This part of the script is quite heavy and we are working on it.

Q2: Can you tell me how to interpret the wavy lines that cross the plot (for text in the figure legend)?

So far I am using this sentence: “The colored lines spanning the plotting region indicate the extent of LD for the lead SNPs with the same color designation, where the height of the line represents __ and the length of the line represents ___.” Can you send me a better sentence to describe how these lines should be interpreted if I am not on the right track here?

A: Each wavy line is a loess smoothing of the LD values for the matching hit SNP. If there are 2 hit SNPs marked with red and green shape and fill, there will be 2 matching loess lines, red and green. Smoothing uses LD values from the 1000G phase EUR subset. Different LD cut-offs can be used (LD = 0, LD > 0.1, LD >= 0.2, etc.); usually LD = 0, i.e. including all SNP LDs, works best.

The Y axis runs from 0 to 1, the minimum and maximum values of LD (R2). When wavy lines have a similar shape, we can safely assume that those SNPs are the same signal. There is also an “R2” track: the darker the lines, the higher the LD. This track also helps to see visually how hit SNPs overlap.

See Manhattan.R.

# LD smooth per hit SNP - optional
if("LDSmooth" %in% input$ShowHideTracks) {
  gg_out <- gg_out +
    geom_smooth(data=plotDatLD(),aes(x=BP,y=R2_Adj,col=LDSmoothCol),
                method="loess",se=FALSE)}
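
To see what the smoothing does outside the app, the base R sketch below fits `stats::loess`, the function that ggplot2's `geom_smooth(method = "loess")` relies on, to simulated R2 values. The data are invented for illustration, and the default loess settings used here may differ from those in Manhattan.R.

```r
set.seed(1)

# Toy data: 50 SNP positions around a hypothetical lead SNP
bp <- seq(173200000, 173400000, length.out = 50)
lead_bp <- 173309618

# Simulated R2: decays with distance from the lead SNP, clipped to [0, 1]
r2 <- exp(-abs(bp - lead_bp) / 30000) + rnorm(length(bp), sd = 0.05)
r2 <- pmin(1, pmax(0, r2))

# Loess fit of R2 against position, evaluated at the observed positions
fit <- loess(r2 ~ bp)
smoothed <- predict(fit, newdata = data.frame(bp = bp))

# Base graphics: points are per-SNP R2, the line is the smoothed LD profile
plot(bp, r2, ylim = c(0, 1), xlab = "BP", ylab = "R2")
lines(bp, smoothed, col = "red")
```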

Q3: Can you tell me what is included (maybe even just the data source) for the histone and DNase panels?

What the colour interpretation is for the histone panel?

A: Data from ENCODE project, see links for more info.

Q4: Which operating system is it tested on?

A: See RSessionInfo.md

Q5: “Please download Histone bigWig files” message on the plot?

A: As GitHub has limitations on size of the repositories and files, Histone BigWig files are not included in LocusExplorer/Data/EncodeBigWig/. These files are public and can be downloaded from UCSC golden path - total ~2.5GB. Downloaded bigWig files must be saved in LocusExplorer/Data/EncodeBigWig/ folder.

We are working on a server version of LocusExplorer, expected to be live by December 2015. Keep an eye on the https://github.com/oncogenetics/LocusExplorer page. This will resolve issues with different R versions and R packages, and remove limits on annotation data such as bigWig files.

Q6: RStudio crashes when I click on Plot Settings

After successfully uploading data using the Input Data tab, RStudio crashes when Plot Settings is clicked. This happens even when running R GUI.

A: It is hard to guess the cause of the crash; it could be package dependencies with different versions. We can try to re-install packages.

!!! WARNING: Below steps will remove all of your installed packages !!!

Try below steps:

  1. Run .libPaths() to get the library paths and choose one where the user has write access, e.g. on my machine I will choose the first of the two listed folders.
> .libPaths()
[1] "C:/Users/tdadaev/Documents/R/win-library/3.2" "C:/Program Files/R/R-3.2.2/library"  

myLibraryLocation <- .libPaths()[1]
  2. Make list of installed and required packages.
# all packages excluding base (base and recommended packages have a non-NA Priority)
allPackages <- installed.packages()
allPackages <- allPackages[ is.na(allPackages[, "Priority"]), "Package"]

# CRAN packages required by this Application.
cranPackages <- c("shiny", "shinyjs", "data.table", "dplyr", "tidyr", 
                  "ggplot2", "knitr", "markdown", "stringr","DT","seqminer",
                  "lattice","cluster")
# Bioconductor packages required by this Application.
bioPackages <-  c("ggbio","GenomicRanges","TxDb.Hsapiens.UCSC.hg19.knownGene",
                  "org.Hs.eg.db","rtracklayer")
  3. Remove all packages, excluding base.
# remove all packages
sapply(allPackages, remove.packages)
  4. Reinstall only required packages in myLibraryLocation folder.
#reinstall CRAN packages
sapply(cranPackages, install.packages, lib = myLibraryLocation)

#reinstall Bioconductor packages
source("http://bioconductor.org/biocLite.R")
biocLite("BiocInstaller")
biocLite(bioPackages)