Title: | Consensus Clustering for Different Sample Coverage Data |
---|---|
Description: | Consensus clustering, also called meta-clustering or cluster ensembles, has been increasingly used in clinical data. Current consensus clustering methods tend to ensemble a number of different clusterings from mathematical replicates with similar sample coverage. Because sample coverage commonly varies in real-world data, a new consensus clustering strategy that handles such biological replicates is required. This package implements a two-step consensus clustering that takes as input multiple predictive labels with different sample coverage (missing labels). |
Authors: | Chuanxing Li [aut, cre], Meng Zhou [aut] |
Maintainer: | Chuanxing Li <[email protected]> |
License: | GPL-2 |
Version: | 1.4.0 |
Built: | 2024-11-12 06:13:32 UTC |
Source: | https://github.com/pulmonomics-lab/ccml |
Calculate the normalized consensus weight (NCW) matrix based on permutation.
callNCW( title = "", label, nperm = 10, ncore = 1, seedn = 100, stability = TRUE, plot = NULL )
title |
A character value for the output directory. The directory is created only if it does not already exist. This title can be an absolute or relative path. |
label |
A matrix or data frame of input labels; columns are different clustering results and rows are samples. |
nperm |
An integer value for the number of permutations, nperm=10 (default), which means |
ncore |
An integer value for the number of cores used for parallel computation in this package, ncore=1 (default). |
seedn |
A numerical value setting the starting random seed for reproducible results, seedn=100 (default). The seed is incremented by 1 after every 1000 iterations so that results are repeatable. |
stability |
A logical value. Whether to estimate the stability of the normalized consensus weights based on the number of permutations (default stability=TRUE). |
plot |
character value. NULL(default) - print to screen, 'pdf', 'png', 'pngBMP' for bitmap png, helpful for large datasets. Input for |
A matrix of normalized consensus weights.
# load data
data(example_data)
label=example_data

# if plot is not NULL, results will be saved in "result_output" directory
title="result_output"

# run ncw
ncw<-callNCW(title=title,label=label,stability=TRUE,nperm=4,ncore=1)
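To make the expected shape of `label` concrete, the following base R sketch builds a small label matrix with different sample coverage. The toy values, the object names, and the use of `NA` to encode a sample that a clustering did not cover are all assumptions made for illustration; this page does not state how missing labels are encoded.

```r
# Hypothetical toy input: 6 samples (rows) x 3 clustering results (columns).
# NA marks a sample not covered by a clustering (an assumption).
label <- matrix(
  c(1, 1, 2, 2, NA, NA,    # clust1 covers s1..s4
    1, 1, 1, 2, 2,  NA,    # clust2 covers s1..s5
    NA, 2, 2, 1, 1,  1),   # clust3 covers s2..s6
  nrow = 6,
  dimnames = list(paste0("s", 1:6), paste0("clust", 1:3))
)

# Number of non-missing labels in each clustering (column).
n_valid <- colSums(!is.na(label))

# Numeric 0/1 coverage indicator, then the number of clusterings
# that cover both samples of every pair.
covered <- (!is.na(label)) * 1
co_appear <- tcrossprod(covered)

print(n_valid)
print(co_appear)
```

Pairs such as `s1` and `s6` are never covered by the same clustering in this toy input, which is the situation the package description refers to as different sample coverage.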
A two-step consensus clustering that takes as input multiple predictive labels with different sample coverage (missing labels).
ccml( title, label, output = "rdata", nperm = 10, ncore = 1, seedn = 100, stability = TRUE, maxK = 15, reps = 1000, pItem = 0.9, plot = NULL, clusterAlg = "spectralClusteringAffinity", innerLinkage = "complete", ... )
title |
A character value for the output directory. The directory is created only if it does not already exist. This title can be an absolute or relative path. Input for |
label |
A matrix or data frame of input labels, or a character value giving the input file name; columns are different clustering results and rows are samples. |
output |
A character value for the output format. "rdata" (default) saves the result to an .rdata file when both output and plot are not NULL; in other cases the result is returned to the workspace. |
nperm |
An integer value for the number of permutations, nperm=10 (default), which means |
ncore |
An integer value for the number of cores used for parallel computation in this package, ncore=1 (default). |
seedn |
A numerical value setting the starting random seed for reproducible results, seedn=100 (default). The seed is incremented by 1 after every 1000 iterations so that results are repeatable. Input for |
stability |
A logical value. Whether to estimate the stability of the normalized consensus weights based on the number of permutations (default stability=TRUE). Input for |
maxK |
integer value. maximum cluster number to evaluate. Input for |
reps |
integer value. number of subsamples. Input for |
pItem |
numerical value. proportion of items to sample. Input for |
plot |
character value. NULL(default) - print to screen, 'pdf', 'png', 'pngBMP' for bitmap png, helpful for large datasets. Input for |
clusterAlg |
character value. cluster algorithm. 'spectralClusteringAffinity' for spectral clustering of similarity/affinity matrix (default), other methods for clustering of distance matrix: 'hc' hierarchical (hclust), 'pam' for partitioning around medoids, 'km' for k-means upon data matrix, 'kmdist' for k-means upon distance matrices (former km option), or a function that returns a clustering. Input for |
innerLinkage |
hierarchical linkage method for subsampling, or "complete" (default). Input for |
... |
Other input arguments for |
A list of three items:
ncw - A matrix of normalized consensus weights. Output from callNCW.
fcluster - A list of length maxK. Each element is a list containing consensusMatrix (numerical matrix), consensusTree (hclust), consensusClass (consensus class assignments). ConsensusClusterPlus also produces images. Output from ConsensusClusterPlus::ConsensusClusterPlus.
icl - A list of two elements, clusterConsensus and itemConsensus, corresponding to cluster-consensus and item-consensus. Output from ConsensusClusterPlus::ConsensusClusterPlus.
# load data
data(example_data)
label=example_data

# if plot is not NULL, results will be saved in "result_output" directory
title="result_output"

# not estimate stability of permutation numbers.
res_1=ccml(title=title,label=label,nperm = 3,ncore=1,stability=FALSE,maxK=5,pItem=0.8)

# other methods for clustering of distance matrix
res_2<-ccml(title=title,label=label,nperm = 10,ncore=1,stability=TRUE,maxK=3,
            pItem=0.9,clusterAlg = "hc")

# set the start random seed
res_3<-ccml(title=title,label=label,output=FALSE,nperm = 5,ncore=1,seedn=150,stability=TRUE,maxK=3,
            pItem=0.9)
In this matrix, columns represent different clustering results and rows are samples.
example_data
A matrix with 10 rows and 5 columns.
Plot of original consensus weights vs. normalized consensus weights, grouped by the percentage of clusterings in which a sample pair co-appears (non-missing).
plotCompareCW(title, label, ncw, plot = NULL)
title |
A character value for output directory. |
label |
A matrix or data frame of input labels; columns are different clustering results and rows are samples. |
ncw |
A sample-by-sample matrix of normalized consensus weights, with samples in the same order as the samples (rows) in |
plot |
character value. NULL(default) - print to screen, 'pdf', 'png', 'pngBMP' for bitmap png, helpful for large datasets. |
A ggplot point plot in PDF format with x-axis: original consensus weights; y-axis: normalized consensus weights; color: percentage of clusterings in which the pair co-appears; size: number of duplicate samples.
# load data
data(example_data)
label=example_data

# if plot is not NULL, results will be saved in "result_output" directory
title="result_output"

ncw<-callNCW(title=title,label=label,stability=TRUE)
plotCompareCW(title=title,label=label,ncw=ncw)
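The quantities compared in this plot can be illustrated with a small base R sketch. It assumes one common definition of an unnormalized consensus weight: for each sample pair, the fraction of clusterings covering both samples that also assign them the same label. Whether ccml computes its original consensus weight exactly this way is not confirmed by this page, and the toy matrix, the `NA` encoding of missing labels, and all object names are hypothetical.

```r
# Hypothetical toy input: NA marks a sample not covered by a clustering.
label <- matrix(
  c(1, 1, 2, 2, NA, NA,
    1, 1, 1, 2, 2,  NA,
    NA, 2, 2, 1, 1,  1),
  nrow = 6,
  dimnames = list(paste0("s", 1:6), paste0("clust", 1:3))
)

n_s  <- nrow(label)
same <- matrix(0, n_s, n_s, dimnames = list(rownames(label), rownames(label)))
both <- same

for (j in seq_len(ncol(label))) {
  lab <- label[, j]
  ok  <- !is.na(lab)
  # Pairs for which clustering j covers both samples.
  both <- both + outer(ok, ok, "&")
  # Pairs that clustering j puts in the same cluster (NA counts as "no").
  eq <- outer(lab, lab, "==")
  eq[is.na(eq)] <- FALSE
  same <- same + eq
}

# Assumed "original" consensus weight; NaN where a pair never co-appears.
cw <- same / both
# Fraction of clusterings in which each pair co-appears.
pct_co_appear <- both / ncol(label)
```

Pairs with a low co-appearance fraction have weights estimated from few clusterings, which is the motivation the package description gives for normalizing the weights.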
callNCW
Calculate consensus weight matrix based on the permuted input label matrix. Internal function used by callNCW
randConsensusMatrix( l.seed, l.label = label, l.ns = ns, l.nc = nc, l.nv = nv, l.index = index, l.pair.ind = pair.ind, l.ppath = ppath, l.plot = plot )
l.seed |
A numerical value setting the random seed for reproducible results; 1000 random label matrices are generated based on this seed number. |
l.label |
A matrix or data frame of input labels; columns are different clustering results and rows are samples. |
l.ns |
An integer value for the number of samples, = |
l.nc |
An integer value for the number of clusterings (columns), = |
l.nv |
An integer vector of the number of non-missing values for each column in |
l.index |
A list of indices with length of |
l.pair.ind |
An n-by-2 index matrix of array indices of the upper triangle of |
l.ppath |
A character value for output directory. |
l.plot |
character value. NULL(default) - print to screen, 'pdf', 'png', 'pngBMP' for bitmap png, helpful for large datasets. |
A character of finished seed.
Writes a binary file of 1000 random consensus weight matrices (each as an n-by-1 vector, n = nrow(l.pair.ind)) with the seed l.seed; output file name: paste0("s", l.seed, "rcw").
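The bookkeeping described for this internal function can be sketched in base R. The sketch is not the package's implementation and rests on several assumptions: that labels are permuted within each clustering while missing entries (`NA`, an assumed encoding) stay in place, that seeds run from `seedn` upward with one seed per block of 1000 permutations, and that the pair index comes from `upper.tri()`. The helper `permute_labels()` and all object names are hypothetical; only the file name pattern `paste0("s", seed, "rcw")` is taken from this page.

```r
# Hypothetical toy input: NA marks a sample not covered by a clustering.
label <- matrix(
  c(1, 1, 2, 2, NA, NA,
    1, 1, 1, 2, 2,  NA,
    NA, 2, 2, 1, 1,  1),
  nrow = 6,
  dimnames = list(paste0("s", 1:6), paste0("clust", 1:3))
)

# Hypothetical helper: shuffle the non-missing labels of every column,
# leaving the missing positions where they are.
permute_labels <- function(label, seed) {
  set.seed(seed)
  out <- label
  for (j in seq_len(ncol(label))) {
    idx <- which(!is.na(label[, j]))
    out[idx, j] <- label[idx[sample.int(length(idx))], j]
  }
  out
}

# Assumed seed schedule: one seed per block of 1000 permutations.
seedn <- 100
nperm <- 3
seeds <- seedn + seq_len(nperm) - 1

# File name pattern stated above for each seed.
files <- paste0("s", seeds, "rcw")

# n-by-2 array indices of the upper triangle of a sample-by-sample matrix.
n_s <- nrow(label)
pair_ind <- which(upper.tri(matrix(0, n_s, n_s)), arr.ind = TRUE)

# A symmetric matrix can be stored as a vector over those pairs.
m <- tcrossprod((!is.na(label)) * 1)
m_vec <- m[pair_ind]

perm <- permute_labels(label, seeds[1])
```

Storing only the upper-triangle pairs keeps each random matrix to n values instead of a full square matrix, which is consistent with the n-by-1 vector format described above.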
Perform spectral clustering algorithms for an affinity matrix, using SNFtool::spectralClustering.
spectralClusteringAffinity(affi_matrix, k, type = 3)
affi_matrix |
A numerical similarity or affinity matrix. |
k |
A numerical value for the number of clusters. |
type |
The variants of spectral clustering to use. See |
A vector consisting of cluster labels of each sample.
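For readers unfamiliar with spectral clustering on an affinity matrix, the following base R sketch shows the general idea. It is a swapped-in textbook variant (normalized affinity, leading eigenvectors, row normalization, then k-means) and not the SNFtool routine that this function wraps, so its labels need not match the package's output. The function name `spectral_sketch()` and the toy data are hypothetical.

```r
# Hypothetical helper: a simple normalized spectral clustering sketch.
spectral_sketch <- function(affinity, k) {
  d_inv_sqrt <- 1 / sqrt(rowSums(affinity))
  # Symmetrically normalized affinity: D^(-1/2) W D^(-1/2).
  norm_aff <- affinity * outer(d_inv_sqrt, d_inv_sqrt)
  # eigen() returns eigenvalues in decreasing order; keep the first k vectors.
  u <- eigen(norm_aff, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
  # Scale every row of the embedding to unit length.
  u <- u / sqrt(rowSums(u^2))
  # Cluster the embedded samples with k-means (several random starts).
  kmeans(u, centers = k, nstart = 20)$cluster
}

# Toy affinity: two well-separated groups of three points on a line.
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2)
affinity <- exp(-as.matrix(dist(x))^2)

set.seed(1)
cl <- spectral_sketch(affinity, k = 2)
print(cl)
```

The cluster numbering returned by k-means is arbitrary, so only the grouping of samples is meaningful, not which group is labeled 1 or 2.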