Tuesday, June 28, 2011

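Linux archives: compressing and decompressing tar/gz/bz/gz2/bz2...

Extracting on Linux: the tar command

tar command: tar [-cxtzjvfpPN] files and directories ... Options:
-c : create an archive (c for "create").
-x : extract an archive.
-t : list the files inside a tar file.
Note that only one of c/x/t may be given at a time,
because you cannot create and extract in the same invocation.
-z : filter the archive through gzip, that is, compress or decompress with gzip.
-j : filter the archive through bzip2, that is, compress or decompress with bzip2.
-v : show file names while processing. Commonly used, but not recommended for background jobs.
-f : archive file name. The file name must come immediately after f, with no other option in between.
     For example, "tar -zcvfP tfile sfile" is wrong and must be written as
     "tar -zcvPf tfile sfile".
-p : preserve the original permissions of the files (they do not change with the user running tar).
-P : keep absolute paths in the archive.
-N : only files newer than the date that follows (yyyy/mm/dd) are added to the new archive.
--exclude FILE : do not include FILE in the archive.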
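Examples:
Example 1: pack everything under /etc into /tmp/etc.tar
[root@linux ~]# tar -cvf /tmp/etc.tar /etc <== pack only, no compression
[root@linux ~]# tar -zcvf /tmp/etc.tar.gz /etc <== pack, then compress with gzip
[root@linux ~]# tar -jcvf /tmp/etc.tar.bz2 /etc <== pack, then compress with bzip2
# The file name after f is your own choice; by convention .tar is used to identify a tar file.
# With the z option, use .tar.gz or .tgz to mark a gzip-compressed tar file.
# With the j option, use .tar.bz2 as the extension.
# When these commands run, they print a warning:
# "tar: Removing leading `/' from member names". This relates to the special handling of absolute paths.
Example 2: list the files inside /tmp/etc.tar.gz
[root@linux ~]# tar -ztvf /tmp/etc.tar.gz
# Because the archive was compressed with gzip, listing its contents
# also requires the z option. This is important.
Example 3: extract /tmp/etc.tar.gz under /usr/local/src
[root@linux ~]# cd /usr/local/src
[root@linux src]# tar -zxvf /tmp/etc.tar.gz
# By default an archive can be extracted anywhere. In this example
# the working directory is first changed to /usr/local/src and /tmp/etc.tar.gz is extracted there,
# so the extracted directory is /usr/local/src/etc. If you look inside /usr/local/src/etc,
# you may find that the file attributes differ from those in /etc/.
Example 4: under /tmp, extract only etc/passwd from /tmp/etc.tar.gz
[root@linux ~]# cd /tmp
[root@linux tmp]# tar -zxvf /tmp/etc.tar.gz etc/passwd
# Use tar -ztvf to look up the member names in the tar file; to extract a single file,
# pass its name this way. Note that the leading / has been removed from the names in etc.tar.gz.
Example 5: back up all files in /etc/ and preserve their permissions
[root@linux ~]# tar -zcvpf /tmp/etc.tar.gz /etc
# The -p option matters whenever you need to keep the original file attributes.
Example 6: in /home, back up only files newer than 2005/06/01
[root@linux ~]# tar -N "2005/06/01" -zcvf home.tar.gz /home
Example 7: back up /home and /etc, but not /home/dmtsai
[root@linux ~]# tar --exclude /home/dmtsai -zcvf myfile.tar.gz /home/* /etc
Example 8: pack /etc/ and unpack it directly under /tmp without creating an archive file
[root@linux ~]# cd /tmp
[root@linux tmp]# tar -cvf - /etc | tar -xvf -
# This is a bit like cp -r /etc /tmp, and it still has its uses.
# Note that the output file is - and the input file is also -, with a | between them.
# These stand for standard output, standard input, and a pipe, respectively.
# This command comes up again, with more explanation, in the Bash shell chapter.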
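
The options above can be tried safely in a throwaway directory. The following is only a minimal sketch: the directory and file names (demo, single, piped) are made up for the illustration, and it assumes a tar with gzip support is installed.

```shell
#!/bin/sh
# Sketch: exercise create / list / extract in a scratch directory.
# All names below are invented for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p demo/conf
echo "alpha" > demo/conf/a.txt
echo "beta"  > demo/conf/b.txt

# -c create, -z gzip, -f archive name (the name follows f directly)
tar -czf demo.tar.gz demo

# -t lists the members without extracting anything
tar -tzf demo.tar.gz

# Extract a single member into a separate directory (-C changes directory first)
mkdir single
tar -xzf demo.tar.gz -C single demo/conf/a.txt

# Pack to standard output and unpack from standard input: no archive file is written
mkdir piped
tar -cf - demo | (cd piped && tar -xf -)

ls single/demo/conf piped/demo/conf
# Remove the scratch directory with: rm -rf "$workdir"
```

Only a.txt should appear under single/, while piped/ receives a full copy of demo/.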

source: http://www.21andy.com/blog/20060820/389.html

For *.gz2 files, use gunzip2 *.gz2.
For example, gunzip2 *.tar.gz2 produces a *.tar file,
which can then be unpacked with tar -vxf *.tar.

.rar format

Extract: $ rar e FileName.rar

Compress: $ rar a FileName.rar

Download rar from: http://www.rarsoft.com/download.htm

After unpacking, copy rar_static to /usr/bin (or any other directory listed in the $PATH environment variable): $ cp rar_static /usr/bin/rar

.tar
Unpack: tar xvf FileName.tar
Pack: tar cvf FileName.tar DirName
(Note: tar packs files into one archive; it does not compress them.)
———————————————
.gz
Decompress 1: gunzip FileName.gz
Decompress 2: gzip -d FileName.gz
Compress: gzip FileName
.tar.gz and .tgz
Extract: tar zxvf FileName.tar.gz
Compress: tar zcvf FileName.tar.gz DirName
———————————————
.bz2
Decompress 1: bzip2 -d FileName.bz2
Decompress 2: bunzip2 FileName.bz2
Compress: bzip2 -z FileName
.tar.bz2
Extract: tar jxvf FileName.tar.bz2
Compress: tar jcvf FileName.tar.bz2 DirName
———————————————
.bz
Decompress 1: bzip2 -d FileName.bz
Decompress 2: bunzip2 FileName.bz
Compress: unknown
.tar.bz
Extract: tar jxvf FileName.tar.bz
Compress: unknown
———————————————
.Z
Decompress: uncompress FileName.Z
Compress: compress FileName
.tar.Z
Extract: tar Zxvf FileName.tar.Z
Compress: tar Zcvf FileName.tar.Z DirName
———————————————
.zip
Extract: unzip FileName.zip
Compress: zip -r FileName.zip DirName
———————————————
.rar
Extract: rar x FileName.rar
Compress: rar a FileName.rar DirName

Download rar from: http://www.rarsoft.com/download.htm
After unpacking, copy rar_static to /usr/bin (or any other directory listed in the $PATH environment variable):
[root@www2 tmp]# cp rar_static /usr/bin/rar
———————————————
.lha
Extract: lha -e FileName.lha
Compress: lha -a FileName.lha FileName

Download lha from: http://www.infor.kanazawa-it.ac.jp/~ishii/lhaunix/
After unpacking, copy lha to /usr/bin (or any other directory listed in the $PATH environment variable):
[root@www2 tmp]# cp lha /usr/bin/
———————————————
.rpm
Unpack: rpm2cpio FileName.rpm | cpio -div
———————————————
.deb
Unpack: ar p FileName.deb data.tar.gz | tar zxf -
———————————————
.tar .tgz .tar.gz .tar.Z .tar.bz .tar.bz2 .zip .cpio .rpm .deb .slp .arj .rar .ace .lha .lzh .lzx .lzs .arc .sda .sfx .lnx .zoo .cab .kar .cpt .pit .sit .sea
Extract: sEx x FileName.*
Compress: sEx a FileName.* FileName

Note that sEx only calls the relevant programs; it has no compression or decompression capability of its own.
Download sEx from: http://sourceforge.net/projects/sex
After unpacking, copy sEx to /usr/bin (or any other directory listed in the $PATH environment variable):
[root@www2 tmp]# cp sEx /usr/bin/
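
The same "dispatch on the extension" idea can be sketched as a small shell function. This is an illustration only, not how sEx itself is implemented: the function name extract is hypothetical, it covers just a few of the formats listed above, and each branch assumes the corresponding tool is installed.

```shell
#!/bin/sh
# extract: hypothetical helper that picks a command from the file extension.
# More specific patterns (*.tar.gz) must come before general ones (*.gz).
extract() {
    case "$1" in
        *.tar.gz|*.tgz) tar -xzf "$1" ;;
        *.tar.bz2)      tar -xjf "$1" ;;
        *.tar.Z)        tar -xZf "$1" ;;
        *.tar)          tar -xf "$1" ;;
        *.gz)           gunzip "$1" ;;
        *.bz2)          bunzip2 "$1" ;;
        *.Z)            uncompress "$1" ;;
        *.zip)          unzip "$1" ;;
        *.rar)          rar x "$1" ;;
        *)              echo "extract: unknown archive type: $1" >&2
                        return 1 ;;
    esac
}

# Usage example in a scratch directory (file names are made up)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir src
echo "hello" > src/note.txt
tar -czf bundle.tgz src
rm -r src
extract bundle.tgz
cat src/note.txt
```

Unknown extensions return a non-zero status instead of guessing, so the function is safe to use in scripts that check exit codes.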

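The gzip command

Reducing file size has two obvious benefits: it saves storage space, and it shortens transfer time when files are sent over a network. gzip is a command frequently used on Linux to compress and decompress files; it is both convenient and easy to use.

Syntax: gzip [options] file-to-compress-or-decompress

The options have the following meanings:

-c Write the output to standard output and keep the original file.
-d Decompress the compressed file.
-l For each compressed file, list the following fields:
     compressed size; uncompressed size; compression ratio; uncompressed file name
-r Recursively walk the given directory and compress (or decompress) every file in it.
-t Test whether the compressed file is intact.
-v Show the file name and compression ratio for every file compressed or decompressed.
-num Set the compression speed with the digit num: -1 or --fast is the fastest method (lowest compression ratio),
-9 or --best is the slowest method (highest compression ratio). The default is 6.

Examples:

gzip *
% Compress every file in the current directory into a .gz file.

gzip -dv *
% Decompress every compressed file in the current directory and print details.

gzip -l *
% Show detailed information about each file compressed in the first example, without decompressing.

gzip usr.tar
% Compress the tar backup usr.tar; the compressed file then has the extension .tar.gz.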
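
A short round trip shows how -c, -l, -t and -d fit together. This is a sketch under assumptions: the file names (usr.log, restored.log) are invented, and the sample data is deliberately repetitive so that it compresses well.

```shell
#!/bin/sh
# Sketch: compress, inspect, test and restore a file with gzip.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Repetitive sample data (hypothetical content)
yes "sample line" | head -n 1000 > usr.log

# -c writes to standard output, so the original file is kept
gzip -c usr.log > usr.log.gz

# -l lists compressed size, uncompressed size, ratio and name
gzip -l usr.log.gz

# -t checks integrity without writing any output file
gzip -t usr.log.gz && echo "archive OK"

# -d decompresses; combined with -c the .gz file is kept as well
gzip -dc usr.log.gz > restored.log
cmp usr.log restored.log && echo "round trip identical"
```

Without -c, gzip replaces usr.log by usr.log.gz, which is why the examples above leave only .gz files behind.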

source: http://hi.baidu.com/koomo007/blog/item/4904bb2642928c09918f9d02.html

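Partial extraction

Extracting only the files you need from a tar archive avoids slow extraction and excessive disk usage when the archive is very large.

Steps:
List the files in the archive (skip this step if you already know the names):
tar -tzvf u2file.tar.gz
-rw-r--r-- user/user 45489156 2008-08-04 23:59:46 foder/access.log.20080804
-rw-r--r-- user/user 37469223 2008-08-05 23:59:46 foder/access.log.20080805

# Extract a single file
tar -zxvf u2file.tar.gz foder/access.log.20080805

# Extract several files (GNU tar needs --wildcards; quote the pattern so the shell does not expand it)
tar -zxvf u2file.tar.gz --wildcards 'foder/access.log.*'

# Extract into a specific directory
tar -xzvf u2file.tar.gz -C /new/dir/ foder/access.log.20080805 # -C sets the directory to extract into.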
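
The steps above can be rehearsed on a small made-up archive before touching a large one. In this sketch the names (logs, one, many) are hypothetical, and the --wildcards option is assumed to be available, which is the case for GNU tar.

```shell
#!/bin/sh
# Sketch: list an archive, then extract one member and a group of members.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small sample archive (invented file names)
mkdir logs
for day in 20080804 20080805 20080806; do
    echo "entries for $day" > "logs/access.log.$day"
done
echo "other" > logs/error.log
tar -czf logs.tar.gz logs
rm -r logs

# Step 1: look up the exact member names
tar -tzf logs.tar.gz

# Step 2: extract a single member into its own directory
mkdir one many
tar -xzf logs.tar.gz -C one logs/access.log.20080805

# Step 3: extract every member matching a pattern (GNU tar; pattern is quoted)
tar -xzf logs.tar.gz -C many --wildcards 'logs/access.log.*'

ls one/logs many/logs
```

The directory one/ should contain a single log file and many/ the three access logs, with error.log left inside the archive.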

source: http://angeldog.blog.51cto.com/1108908/292903

Thursday, June 23, 2011

Scatterplot with marginal histograms

The scatterplot is one of the most ubiquitous and useful graphics. It's also very basic. One of its shortcomings is that it can hide important aspects of the marginal distributions of the two variables. To address this weakness, you can add a histogram of each margin to the plot. We demonstrate using the SF-36 MCS and PCS subscales in the HELP data set.

scatterhist = function(x, y, xlab="", ylab=""){
  # Layout: x histogram on top (2), scatterplot bottom-left (1), y histogram on the right (3)
  zones=matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
  layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
  xhist = hist(x, plot=FALSE)
  yhist = hist(y, plot=FALSE)
  # Common count scale so the two marginal histograms are comparable
  top = max(c(xhist$counts, yhist$counts))
  par(mar=c(3,3,1,1))
  plot(x,y)
  par(mar=c(0,3,1,1))
  barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
  par(mar=c(3,0,1,1))
  barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
  # Axis labels go in the outer margin
  par(oma=c(3,3,0,0))
  mtext(xlab, side=1, line=1, outer=TRUE, adj=0, at=.8 * (mean(x) - min(x))/(max(x)-min(x)))
  mtext(ylab, side=2, line=1, outer=TRUE, adj=0, at=(.8 * (mean(y) - min(y))/(max(y) - min(y))))
}

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
with(ds, scatterhist(mcs, pcs, xlab="MCS", ylab="PCS"))

SOURCE: http://www.r-bloggers.com/example-8-41-scatterplot-with-marginal-histograms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29

An Example of ANOVA using R

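In its simplest form ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. The t-test tells us if the variation between two groups is "significant".

(Analysis of variance, ANOVA, also called the F-test, is used to infer whether the population means of two or more groups are equal, that is, to test whether the differences among two or more sample means are statistically significant.)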


by EV Nordheim, MK Clayton & BS Yandell, November 11, 2003

In class we handed out "An Example of ANOVA". Below we redo the example using R.
There are three groups with seven observations per group. We denote group i values by y_i:
> y1 = c(18.2, 20.1, 17.6, 16.8, 18.8, 19.7, 19.1)
> y2 = c(17.4, 18.7, 19.1, 16.4, 15.9, 18.4, 17.7)
> y3 = c(15.2, 18.8, 17.7, 16.5, 15.9, 17.1, 16.7)

Now we combine them into one long vector, with a second vector, group, identifying group
membership:

> y = c(y1, y2, y3)
> n = rep(7, 3)
> n
[1] 7 7 7
> group = rep(1:3, n)
> group
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3

Here are summaries by group and for the combined data. First we show stem-leaf diagrams.

> tmp = tapply(y, group, stem)
The decimal point is at the |
16 | 8
17 | 6
18 | 28
19 | 17
20 | 1
The decimal point is at the |
15 | 9
16 | 4
17 | 47
18 | 47
19 | 1
The decimal point is at the |
15 | 29
16 | 57
17 | 17
18 | 8
> stem(y)
The decimal point is at the |
15 | 299
16 | 4578
17 | 14677
18 | 24788
19 | 117
20 | 1

Now we show summary statistics by group and overall. We locally define a temporary
function, tmpfn, to make this easier.

> tmpfn = function(x) c(sum = sum(x), mean = mean(x), var = var(x),n = length(x))
> tapply(y, group, tmpfn)
$`1`
       sum       mean        var          n
130.300000 18.614286   1.358095   7.000000
$`2`
       sum       mean        var          n
123.600000 17.657143   1.409524   7.000000
$`3`
       sum       mean        var          n
117.900000 16.842857   1.392857   7.000000
> tmpfn(y)
       sum       mean        var          n
371.800000 17.704762   1.798476 21.000000

While we could show you how to use R to mimic the computation of SS by hand, it is
more natural to go directly to the ANOVA table. See Appendix 11 for other examples of the
use of R commands for ANOVA.

> data = data.frame(y = y, group = factor(group))
> fit = lm(y ~ group, data)
> anova(fit)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
group      2 11.007 5.5033 3.9683 0.03735 *
Residuals 18 24.963 1.3868
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The anova(fit) object can be used for other computations on the handout and in class.
For instance, the tabled F values can be found by the following. First we extract the treatment
and error degrees of freedom. Then we use qf to get the tabled F values.

> df = anova(fit)[, "Df"]
> names(df) = c("trt", "err")
> alpha = c(0.05, 0.01)
> qf(alpha, df["trt"], df["err"], lower.tail = FALSE)
[1] 3.554557 6.012905

A confidence interval on the pooled variance can be computed as well using the anova(fit)
object. First we get the residual sum of squares, SSErr, then we divide by the appropriate
chi-square tabled values.

> anova(fit)["Residuals", "Sum Sq"]
[1] 24.96286
> anova(fit)["Residuals", "Sum Sq"]/qchisq(c(0.025, 0.975), 18,lower.tail = FALSE)
[1] 0.7918086 3.0328790

Five statistical things I wished I had been taught 20 years ago

These are the pieces of hard won statistical knowledge I wish someone had taught me 20 years ago rather than my meandering, random walk approach.

1. Non-parametric statistics. These are statistical tests which make a bare minimum of assumptions about underlying distributions; in biology we are rarely confident that we know the underlying distribution, and hand waving about the central limit theorem can only get you so far. Wherever possible you should use a non-parametric test. This is Mann-Whitney (or Wilcoxon if you prefer) for testing "medians" (medians is in quotes because this is not quite true: they test something which is closely related to the median) of two distributions, Spearman's rho (rather than Pearson's r) for correlation, and the Kruskal-Wallis test rather than ANOVA (though if I get this right, with Kruskal-Wallis you can't do the more sophisticated nested models you can do with ANOVA). Finally, don't forget the rather wonderful Kolmogorov-Smirnov (I always think it sounds like really good vodka) test of whether two sets of observations come from the same distribution. All of these methods have a basic theme of doing things on the rank of items in a distribution, not the actual level. So, if in doubt, do things on the rank of the metric rather than the metric itself.

2. R (or I guess S). R is a cranky, odd statistical language/system with a great scientific plotting package. It's a package written mainly by statisticians for statisticians, and is rather unforgiving the first time you use it. It is definitely worth persevering. It's basically a combination of Excel spreadsheets on steroids (with no data entry; an R data frame is really the same logical set as an Excel workbook, but able to handle millions of points, not thousands), a statistical methods compendium (it's usually the case that statistical methods are written first in R, and you can almost guarantee that there are no bugs in the major functions, unlike many other scenarios) and a graphical data exploration tool (in particular the lattice and ggplot packages). The syntax is inconsistent, the documentation sometimes wonderful, often awful, and the learning curve is like the face of the Eiger. But once you've met p.adjust(), xyplot() and apply(), you can never turn back.

3. The problem of multiple testing, and how to handle it, either with the Expected value, or FDR, and the backstop of many a piece of bioinformatics: large scale permutation. Large scale permutation is sometimes frowned upon by more maths/distribution purists but often is the only way to get a sensible sense of whether something is likely "by chance" (whatever the latter phrase means; it's a very open question) given the complex, heterogeneous data we have. 10 years ago perhaps the lack of large scale compute resources meant this option was less open to people, but these days basically everyone should be working out how to appropriately permute the data to allow a good estimate of the "surprisingness" of an observation.

4. The relationship between Pvalue, Effect size, and Sample size. This needs to be drilled into everyone: we're far too trigger happy quoting Pvalues, when we should often be quoting Pvalues and Effect size. Once a Pvalue is significant, its higher significance is sort of meaningless (or rather it compounds Effect size things with Sample size things, the latter often being about relative frequency). So, if something is significantly correlated/different, then you want to know how much of an effect this observation has. This is not just about GWAS-like statistics: in genomic biology we're all too happy about quoting some small Pvalue, not realising that with a million or so points, even very small deviations will often be significant. Quote your r2, Rhos or proportion of variance explained...

5. Linear models and PCA. There is often a tendency to jump to quite complex models (networks, or biologically inspired combinations) when our first instinct should be to crack out the well established lm() (linear model) for prediction and princomp() (PCA) for dimensionality reduction. These are old school techniques, and often if you want to talk about statistical fits one needs to make gaussian assumptions about distributions, but most of the things we do could be done well in a linear model, and most of the correlation we look at could have been found with a PCA biplot. The fact that these are 1970s bits of statistics doesn't mean they don't work well.

sources: http://genomeinformatician.blogspot.com/2011/06/five-statistical-things-i-wished-i-had.html

Sunday, June 19, 2011

Side-by-side histograms

In R, the lattice package provides a similarly direct approach.
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds$gender = ifelse(ds$female==1, "female", "male")
library(lattice)
histogram(~ cesd | gender, data=ds)

sources: http://sas-and-r.blogspot.com/2011/06/example-840-side-by-side-histograms.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+SASandR+%28SAS+and+R%29

Computing Odds Ratios in R

For two binary variables, x and y, each taking the values 0 and 1, the odds ratio is defined on the basis of the following four numbers:

N₀₀ = the number of data records with x = 0 and y = 0

N₀₁ = the number of data records with x = 0 and y = 1

N₁₀ = the number of data records with x = 1 and y = 0

N₁₁ = the number of data records with x = 1 and y = 1

Specifically, the odds ratio is given by the following expression:

OR = (N₀₀ N₁₁) / (N₀₁ N₁₀)

Similarly, confidence intervals for the odds ratio are easily constructed by appealing to the asymptotic normality of log OR, which has a limiting standard deviation given by the square root of the sum of the reciprocals of these four numbers. The R procedure oddsratioWald.proc available from the companion website for Exploring Data computes the odds ratio and the upper and lower confidence limits at a specified level alpha from these four values:

oddsratioWald.proc <- function(n00, n01, n10, n11, alpha = 0.05){
  # Compute the odds ratio between two binary variables, x and y,
  # as defined by the four numbers nij:
  #   n00 = number of cases where x = 0 and y = 0
  #   n01 = number of cases where x = 0 and y = 1
  #   n10 = number of cases where x = 1 and y = 0
  #   n11 = number of cases where x = 1 and y = 1
  OR <- (n00 * n11)/(n01 * n10)
  # Compute the Wald confidence intervals:
  siglog <- sqrt((1/n00) + (1/n01) + (1/n10) + (1/n11))
  zalph <- qnorm(1 - alpha/2)
  logOR <- log(OR)
  loglo <- logOR - zalph * siglog
  loghi <- logOR + zalph * siglog
  ORlo <- exp(loglo)
  ORhi <- exp(loghi)
  oframe <- data.frame(LowerCI = ORlo, OR = OR, UpperCI = ORhi, alpha = alpha)
  oframe
}

Including “alpha = 0.05” in the parameter list fixes the default value for alpha at 0.05, which yields the 95% confidence intervals for the computed odds ratio, based on the Wald approximation described above. An important practical point is that these intervals become infinitely wide if any of the four numbers N_ij are equal to zero; also, note that in this case, the computed odds ratio is either zero or infinite. Finally, it is worth noting that if the numbers N_ij are large enough, the procedure just described can encounter numerical overflow problems (i.e., the products in either the numerator or the denominator become too large to be represented in machine arithmetic). If this is a possibility, a better alternative is to regroup the computations as follows:

OR = (N₀₀ / N₀₁) x (N₁₁ / N₁₀)

To use the routine just described, it is necessary to have the four numbers defined above, which form the basis for a two-by-two contingency table. Because contingency tables are widely used in characterizing categorical data, these numbers are easily computed in R using the table command. As a simple example, the following code reads the UCI mushroom dataset and generates the two-by-two contingency table for the EorP and GillSize attributes:

> mushrooms <- read.csv("mushroom.csv")
> table(mushrooms$EorP, mushrooms$GillSize)

       b    n
  e 3920  288
  p 1692 2224

(Note that the first line reads the csv file containing the mushroom data; for this command to work as shown, it is necessary for this file to be in the working directory. Alternatively, you can change the working directory using the setwd command.)
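
As a quick cross-check outside R, the four counts in the table above can be plugged into the regrouped formula from the shell with awk. This is only a sketch: the variable names are made up, and the normal quantile is hard-coded as 1.959964 (approximately qnorm(0.975), i.e. alpha = 0.05) because awk has no quantile function.

```shell
#!/bin/sh
# Sketch: odds ratio and Wald interval for the EorP x GillSize counts.
n00=3920; n01=288; n10=1692; n11=2224

# z is a hard-coded assumption: about qnorm(0.975) for a 95% interval
result=$(awk -v n00="$n00" -v n01="$n01" -v n10="$n10" -v n11="$n11" -v z=1.959964 'BEGIN {
    ratio = (n00 / n01) * (n11 / n10)            # regrouped form, avoids large products
    se    = sqrt(1/n00 + 1/n01 + 1/n10 + 1/n11)  # standard error of log(OR)
    lo    = exp(log(ratio) - z * se)
    hi    = exp(log(ratio) + z * se)
    printf "%.4f %.4f %.4f", lo, ratio, hi
}')

echo "LowerCI OR UpperCI: $result"
```

The three printed values can be compared against the LowerCI, OR and UpperCI columns that oddsratioWald.proc reports for the same table.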

To facilitate the computation of odds ratios, the following preliminary procedure combines the table command with the oddsratioWald.proc procedure, allowing you to compute the odds ratio and its level-alpha confidence interval from the two-level variables directly:

TableOR.proc00 <- function(x,y,alpha=0.05){
  xtab <- table(x,y)
  n00 <- xtab[1,1]
  n01 <- xtab[1,2]
  n10 <- xtab[2,1]
  n11 <- xtab[2,2]
  oddsratioWald.proc(n00,n01,n10,n11,alpha)
}

The primary disadvantage of this procedure is that it doesn’t tell you which levels of the two variables are being characterized by the computed odds ratio. In fact, this characterization describes the first level of each of these variables, and the following slight modification makes this fact explicit:

TableOR.proc <- function(x,y,alpha=0.05){
  xtab <- table(x,y)
  n00 <- xtab[1,1]
  n01 <- xtab[1,2]
  n10 <- xtab[2,1]
  n11 <- xtab[2,2]
  outList <- vector("list",2)
  outList[[1]] <- paste("Odds ratio between the level [",dimnames(xtab)[[1]][1],"] of the first variable and the level [",dimnames(xtab)[[2]][1],"] of the second variable:",sep=" ")
  outList[[2]] <- oddsratioWald.proc(n00,n01,n10,n11,alpha)
  outList
}

Specifically, I have used the fact that the dimension names of the 2x2 table xtab correspond to the levels of the variables x and y, and I have used the paste command to include these values in a text string displayed to the user. (I have enclosed the levels in square brackets to make them stand out from the surrounding text, particularly useful here since the levels are coded as single letters.) Applying this procedure to the mushroom characteristics EorP and GillSize yields the following results:

> TableOR.proc(mushrooms$EorP, mushrooms$GillSize)
[[1]]
[1] "Odds ratio between the level [ e ] of the first variable and the level [ b ] of the second variable:"

[[2]]
   LowerCI       OR  UpperCI alpha
1 15.62615 17.89073 20.48349  0.05

Almost certainly, the formatting I have used here could be improved – probably a lot – but the key point is to provide a result that is reasonably complete and easy to interpret.

Finally, I noted in my last post that if we are interested in using odds ratios to compare or rank associations, it is useful to code the levels so that the computed odds ratio is larger than 1. In particular, note that applying the above procedure to characterize the relationship between edibility and the Bruises characteristic yields:

> TableOR.proc(mushrooms$EorP, mushrooms$Bruises)
[[1]]
[1] "Odds ratio between the level [ e ] of the first variable and the level [ f ] of the second variable:"

[[2]]
     LowerCI        OR   UpperCI alpha
1 0.09014769 0.1002854 0.1115632  0.05

It is clear from these results that both Bruises and GillSize exhibit odds ratios with respect to mushroom edibility that are significantly different from the neutral value 1 (i.e., the 95% confidence interval excludes the value 1 in both cases), but it is not obvious which variable has the stronger association, based on the available data. The following procedure automatically restructures the computation so that the computed odds ratio is larger than or equal to 1, allowing us to make this comparison:

AutomaticOR.proc <- function(x,y,alpha=0.05){
  xtab <- table(x,y)
  n00 <- xtab[1,1]
  n01 <- xtab[1,2]
  n10 <- xtab[2,1]
  n11 <- xtab[2,2]
  rawOR <- (n00*n11)/(n01*n10)
  if (rawOR < 1){
    n01 <- xtab[1,1]
    n00 <- xtab[1,2]
    n11 <- xtab[2,1]
    n10 <- xtab[2,2]
    iLevel <- 2
  } else {
    iLevel <- 1
  }
  outList <- vector("list",2)
  outList[[1]] <- paste("Odds ratio between the level [",dimnames(xtab)[[1]][1],"] of the first variable and the level [",dimnames(xtab)[[2]][iLevel],"] of the second variable:",sep=" ")
  outList[[2]] <- oddsratioWald.proc(n00,n01,n10,n11,alpha)
  outList
}

Note that this procedure first constructs the 2x2 table on which everything is based and then computes the odds ratio in the default coding: if this value is smaller than 1, the coding of the second variable (y) is reversed. The odds ratio and its confidence interval are then computed and the levels of the variables used in computing it are presented as before. Applying this procedure to the Bruises characteristic yields the following result, from which we can see that GillSize appears to have the stronger association, as noted last time:

> AutomaticOR.proc(mushrooms$EorP, mushrooms$Bruises)
[[1]]
[1] "Odds ratio between the level [ e ] of the first variable and the level [ t ] of the second variable:"

[[2]]
   LowerCI       OR  UpperCI alpha
1 8.963532 9.971541 11.09291  0.05

sources: http://exploringdatablog.blogspot.com/2011/05/computing-odds-ratios-in-r.html

Saturday, June 18, 2011

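English

What are you trying to say? (What exactly do you mean?)
Don't be silly. (Stop fooling around.)
How strong are your glasses? (What is your prescription?)
Just because. (No particular reason.)
It isn't the way I hoped it would be. (This is not what I was hoping for.)
You will never guess. (You could never guess it.)
No one could do anything about it. (Everyone was helpless.)
I saw something deeply disturbing. (I had a strong sense that something was wrong.)
Money is a good servant but a bad master. (Be the master of money, not its slave.)
I am not available. (I am busy right now.)
Wisdom in the mind is better than money in the hand. (Knowledge in your head matters more than money in your hand.)
Never say die. It's a piece of cake. (Don't lose heart; it is easy.)
Don't worry. You'll get used to it soon. (You will soon be accustomed to it.)
I know how you feel. (I understand your feelings.)
You win some, you lose some. (Winning and losing are both part of life.)
Don't bury your head in the sand. (Don't hide from reality.)
I didn't expect you to do such a good job. (I had no idea you would do so well.)
You are coming along well. (You are making good progress.)
She is well-built. (She has a great figure.)
You look neat and fresh. (You look fresh and innocent.)
You have a beautiful personality. (You have a lovely temperament.)
You flatter me immensely. (You are too kind.)
You should be slow to judge others. (Don't be quick to criticize other people.)
I hope you will excuse me if I make any mistakes. (Please forgive any errors.)
It was most careless of me. (I was very careless.)
It was quite by accident. (It was completely unexpected.)
I wish I had all the time I'd ever wasted, so I could waste it all over again. (I wish the time I wasted would come back so that I could waste it once more.)
I like you the way you were. (I liked how you used to be.)
You two go ahead to the movie without me; I don't want to be a third wheel. (I don't want to be the odd one out on your date.)
Do you have anyone in mind? (Is there someone you are interested in?)
How long have you known her?
It was love at first sight.
I'd better hit the books. (I need to get studying.)
a piece of one's mind: a blunt, frank opinion
He gave me a piece of his mind: "Don't shift responsibility onto others." (He scolded me for passing the blame.)
a cat and dog life: a life of constant quarrelling
The husband and his wife are always quarrelling, and they are leading a cat and dog life. (This couple argues all the time and cannot get along.)
a dog's life: a miserable life
The man lived a dog's life. (This man lived in misery.)
A to Z: from beginning to end
I know that from A to Z. (I know this matter thoroughly.)
above somebody: too difficult for somebody to understand
Well, this sort of talk is above me. (I cannot follow what you are saying.)
all ears: listening attentively
When you tell Mary some gossip, she is all ears. (Mary listens to every detail of any gossip.)
all the more: even more, all the better
You'll be all the better for a holiday. (A holiday will do you good.)
all dressed up: dressed very smartly
She is all dressed up with nowhere to go. (She is dressed smartly but has nowhere to show it off.)
all in all: on the whole; also, what one cherishes most
The daughter is all in all to him. (His daughter means everything to him.)
all out: with maximum effort
They went all out. (They gave it everything they had.)
all over: completely finished; all over the body, everywhere
I'm glad it is all over. (It is a relief that the matter is finished.)
I'm wet all over. (I am soaked from head to toe.)
all set: ready
He is all set for an early morning start. (He has prepared everything for setting out early in the morning.)
all you have to do: the only thing you need to do
All you have to do is calm down and wait for the good news.
as easy as falling off a log / as easy as snapping your fingers / as easy as ABC: very easy
To me, a good storyteller, it would be as easy as falling off a log.
(For me, telling a story comes effortlessly.)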
as busy as a bee: very busy
Mum is always as busy as a bee in the morning. (Mum is rushed off her feet every morning.)
at one's fingertips: known thoroughly
How to get to that little island is at his fingertips. (He knows exactly how to reach that island.)
at one's wit's end: out of ideas
Don't ask him. He is also at his wit's end. (Don't ask him; he does not know either.)
big shot: an important or powerful person
He is a big shot in our little town.
black sheep: the disgrace of a family or group
Every family has a black sheep. (Every family has a member who lets it down.)
black and blue: covered in bruises
The thief was caught red-handed and beaten black and blue. (The thief was caught in the act and badly bruised.)
black and white: in writing
The proof is in black and white and the murderer has no excuse left. (The evidence is conclusive, and the murderer has nothing more to say.)
blind alley: dead end
You are heading into a blind alley. (You are going down a dead end.)
blow hot and cold: waver, keep changing one's mind
This guy seemed to have no ideas of his own. He always blew hot and cold.
blow one's own trumpet: boast
Don't blow your own trumpet. Let us see what on earth you can do. (Stop boasting and show us what you can actually do.)
born with a silver spoon in one's mouth: born into a wealthy family
He was born with a silver spoon in his mouth.
brand new: completely new
a brand new coat
break the ice: break the silence
The couple hadn't spoken to each other for a week. They were both waiting for the other one to break the ice. (Each was waiting for the other to speak first.)
by a blow: with a single blow
He was knocked to the ground by a blow.
can't stand it any longer: cannot bear it any more
I can't stand it any longer. I quit.
carry something too far: go too far
You are carrying your joke too far. (Your joke has gone too far.)
castle in the sky: a pipe dream
Your plan is nearly a castle in the sky. (Your plan is little more than a daydream.)
cat's got one's tongue: speechless
chain smoker: a heavy smoker
come up with: produce, think of
Let me come up with some ideas. (Let me think.)
come easily: be easy
Languages come easily to some people. (Some people pick up languages without effort.)
cup of tea: something one likes
Movies are not my cup of tea. (I do not care for movies.)
cut it out: stop it, be quiet
Cut it out! I can't stand you any longer.
call it a day: stop working on something
Let us call it a day. (That is enough work for today; let's stop.)
dark horse: an unexpected winner
Nobody considered that John would win the game. He was a dark horse in the final.
Dear John letter: a break-up letter
Jack received a Dear John letter from his girlfriend because he had broken her heart.
do someone good: be good for someone
Doing some morning exercises does you good.
Do you get me? (Do you understand what I mean?)
doesn't count: does not count this time
It doesn't count this time; try again.
doesn't make sense: incomprehensible; meaningless
The sentence you made doesn't make any sense to me.
down and out: destitute
Being down and out, he couldn't support his family.
drive at: mean, intend
What's he driving at? (What is he getting at?)
drop in: pay a casual visit
I dropped in on him on my way to the hospital.
drop me a line: write to me
On arriving at the university, please drop me a line.
early bird: an early riser
The early bird catches the worm. (Those who act first get the advantage.)
easy come, easy go: what is gained quickly is lost quickly
eat my words: take back what I said, admit I was wrong
I said something bad to my mum. Although I wanted to eat my words, it didn't work, for I had hurt my mum's feelings.
face the music: face the consequences
He knew he'd never get away with it so he decided to face the music and give himself up to the police. (He knew he could not escape, so he took responsibility and turned himself in.)
face up to: confront something bravely
You must learn to face up to your responsibilities.
fed up: tired of something
I am rather fed up with your complaints.
feel free to do something: do not hesitate
Please feel free to make suggestions.
few and far between: scarce
Human beings are few and far between in this zone.
French leave: leaving without saying goodbye
give me a headache: be a nuisance to me
The naughty boy gave me a headache.
give me a hand: help me out
go Dutch: split the bill
God bless you
God bless you in your examinations.
God knows: heaven knows
Got it? (Understood?)
green thumb / green fingers: gardening skill
hands are full: very busy
have a ball: have a great time
have had it: have had enough
I have had it with all your excuses. (I am sick of your excuses.)
hold water: stand up to scrutiny
None of his arguments seems to hold water.
in every sense of the word: in every respect
It's a lie in every sense of the word. (It is an out-and-out lie.)
keep an eye on: watch closely
kill time: pass the time
lazy bones: a lazy person
Get up, lazy bones!
leave it to me: let me handle it
leave me alone: do not bother me
like father, like son: the son takes after the father
like it or not: whether you like it or not
make a fool of oneself: embarrass oneself
make big money: earn a lot of money
make both ends meet: balance income and expenses
We have to cut our expenses to make both ends meet.
make waves: cause a stir; stir up trouble
His achievement made waves in his country.
make yourself at home: make yourself comfortable
no good: no good outcome
A bad man comes to no good.
no kidding: seriously, I am not joking
none of your business: it does not concern you
not really: not exactly
old hand: a veteran
He is an old hand at stealing.
old story: the same old thing
I am tired of it, same old story.
on one's word of honor: on one's honor
on occasion: now and then
of one's own accord: voluntarily
packed like sardines: crowded
During the holidays, people in the trains are packed like sardines.
pass away: die
pay the price: suffer the consequences
You are playing with fire and you must pay the price one day.
put up with: tolerate
I can't put up with your rudeness any more; leave my room.
red-letter day: an important or memorable day
red tape: excessive bureaucracy
red carpet: a ceremonial welcome
run into: meet by chance
I ran into an old friend in the shop yesterday.
run out of: use up, be short of
Quick, quick, we are running out of time.
show up: appear, arrive
small potatoes: an insignificant person or thing
so what? (What does it matter?)
stand up for: defend, support
suit one's taste: be to one's liking
Sunday dress: one's best clothes
sure thing: a certainty
take one's time: do not hurry
Take your time and enjoy it.
take the words out of one's mouth: say what someone was about to say
that's it: exactly; that is all
that is really something: that is impressive
there is nothing I can do: I cannot help
there you go: here you are
there is nothing wrong with me: I am fine
under the table: secretly, in private
under the weather: unwell
what's going on: what is happening
what a man: what a brave man
walking dictionary: a person who knows a great many words
what is up: how are things
Hi, I haven't seen you for a long time. What's up?
world class: first-rate
sources: http://blog.renren.com/share/82015165/7033023640

Friday, June 17, 2011

R, Perl and Bioinformatics Resources

R
1. Statistic on aiR: http://statistic-on-air.blogspot.com/

2. Using R: a guide for complete beginners: http://bioinformatics.knowledgeblog.org/2011/06/21/using-r-a-guide-for-complete-beginners/

3. R Programming: http://en.wikibooks.org/wiki/R_Programming

4. 5000 R questions: http://stackoverflow.com/

5. R for Data Mining: http://www.rdatamining.com/

6. K-Means Clustering on Big Data: http://blog.revolutionanalytics.com/2011/06/kmeans-big-data.html
Artificial intelligence in trading: k-means clustering: http://www.r-bloggers.com/artificial-intelligence-in-trading-k-means-clustering/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29

7. Add confidence intervals to dotchart: http://industrialengineertools.blogspot.com/2011/05/r-tutorial-add-confidence-intervals-to.html

Perl

Bioinformatics
1. The Bioinformatics Knowledgeblog: http://bioinformatics.knowledgeblog.org/

Tuesday, June 14, 2011

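Stomach-soothing congee: lily bulb, peanut and millet congee

The name is this long mainly because I don't know what the dish is actually called, so I listed the ingredients and made a name up.
I did not find this congee online. A friend told me about it, and because it is simple I started cooking it. It is not actually for me: my stomach is in great shape and I can eat anything happily. I cook it for my boyfriend, whose stomach is weak.
This congee is meant to condition the stomach, not to treat stomach disease. It suits people whose stomach often feels uncomfortable, gets upset by cold, or reacts badly to something they ate. If you have a serious stomach condition, you should still see a doctor.
Ingredients: millet, shelled peanuts, lily bulbs, red dates (optional)
Method:
1. Rinse the millet. I used to rinse it in warm water for fear it would not get clean, but I later read online that cold water is better because it avoids losing nutrients. Rice-rinsing water is also said to fade spots if you wash your face with it; I have not tried that, but I do use it to soak freshly bought vegetables, since it is said to be weakly acidic and to neutralize pesticide residue on them. I am rambling again: all that talk just to rinse the millet.
2. Wash the lily bulbs and red dates and set them aside. The dates can be ordinary red dates or packaged donkey-hide-gelatin honey dates, preferably pitted so they are easier to eat; lazy people will probably all choose those. As for the lily bulbs, I always buy dried ones, because fresh ones are hard to find in winter and do not keep well.
3. Wash the peanuts and set them aside too. They can be skinned or not. The red skin is said to be quite nutritious, and I used to cook the congee with the skins on, but I noticed the congee slowly turned red after it was served. At first I thought the red dates were the cause; later I learned it was the peanut skins. So, for a nicer color, I now remove them. There are many ways to do this. Shop-bought peanuts are usually not very dry, so rubbing the skins off by hand is hard; I usually soak the peanuts in hot water for about ten minutes, after which the skins come off easily.
4. Fill the pot. Put the millet in a clay pot, add water, then add the washed lily bulbs, red dates and skinned peanuts. The quantities depend on how many people you are serving. We usually cook for two and do not use much: one small spoonful of millet, about 20 peanuts, 6 red dates, a small pinch of lily bulbs, and 5 to 6 bowls of water (not oversized bowls, of course).
5. Cook the congee. Bring the water to a boil over high heat, then turn the heat down and simmer slowly. How long depends on the situation. More than once I started the congee, went off to play on the computer, and suddenly remembered it only when nothing but the dregs were left. Poor clay pot; it probably will not last long in my hands. So do check on the congee from time to time and do not follow my example.
That is the basic method. I have been making it for about two months now, and my boyfriend says it is not bad and that his stomach has rarely bothered him lately. I cannot tell whether that is the truth or just meant to comfort me, but either way it is simple enough to be worth a try.
source: http://hi.baidu.com/tonghua8/blog/item/173f464ad62b8d2009f7efd5.html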

Friday, June 10, 2011

Perl and Linux

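Task: in every .java file under the directory named dir, replace the text "Linux" with "Linuxidc":
(1) sed approach (GNU grep's --include limits the file list to .java files):
sed -i "s/Linux/Linuxidc/g" $(grep -rl --include="*.java" Linux /home/dir)
(2) perl approach (run inside dir; covers only the .java files directly in the current directory):
perl -p -i -e "s/Linux/Linuxidc/g" *.java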

This article comes from the Linux公社 website (http://www.linuxidc.com/). Original link: http://www.linuxidc.com/Linux/2008-02/10954.htm
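
A minimal sketch of the same replacement task, not taken from the linked article. It assumes GNU sed (for `-i` without a backup suffix) and works in a throwaway directory created with `mktemp` instead of /home/dir; the file names A.java, sub/B.java and notes.txt are made up for the demo. It uses `find` to pick out the .java files, which is one alternative to building the file list with grep.

```shell
#!/bin/sh
# Hypothetical demo tree; the original task targets /home/dir instead.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'String os = "Linux";\n' > "$demo_dir/A.java"
printf 'String os = "Linux";\n' > "$demo_dir/sub/B.java"
printf 'Linux notes\n' > "$demo_dir/notes.txt"

# Edit only *.java files, recursively. "-exec ... {} +" passes the file
# names straight to sed, so names containing spaces are handled safely.
find "$demo_dir" -type f -name '*.java' -exec sed -i 's/Linux/Linuxidc/g' {} +

# Show the result: both .java files are rewritten, notes.txt is untouched.
cat "$demo_dir/A.java" "$demo_dir/sub/B.java" "$demo_dir/notes.txt"
```

Running the replacement a second time would turn "Linuxidc" into "Linuxidcidc", because the pattern "Linux" also matches the text it has just written, so it is worth running it once on a copy first.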

File::Copy

use File::Copy;
copy("file1","file2") or die "Copy failed: $!";
copy("Copy.pm",\*STDOUT);
move("/dev1/fileA","/dev2/fileB");

The copy function takes two parameters: a file to copy from and a file to copy to.

The move function also takes two parameters: the current name and the intended name of the file to be moved. If the destination already exists and is a directory, and the source is not a directory, then the source file will be renamed into the directory specified by the destination.

If possible, move() will simply rename the file. Otherwise, it copies the file to the new location and deletes the original.
sources: http://perldoc.perl.org/File/Copy.html

Sunday, June 5, 2011

How do you wash all kinds of stains out of clothes?

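1. Red ink stains: wash fresh stains in cold water first, soak for a while in warm soapy water, then rinse in clean water. For old stains, wash with detergent first, then rub with a 10% alcohol solution.
2. Ink stains: mix cooked rice grains with detergent, spread the mixture on the stain and rub, then rinse clean with water. Alternatively, rub repeatedly with a solution made of one part alcohol to two parts soap, which also works well.
3. Ballpoint pen ink: first check what fabric the garment is made of. The usual approach is to place a towel under the stain and brush gently along the grain with a small bristle brush dipped in alcohol. Once the stain has dissolved and spread, soak the garment in cold water, rub on soap and brush gently. Repeating this two or three times removes most of the ink. If a slight trace remains after washing, soaking in hot soapy water or boiling will remove it; this method suits cotton and cotton-polyester fabrics. If a woolen garment is stained with ballpoint ink, first soak the stained area for 10 minutes in a mixture of trichloroethylene and alcohol (ratio 2:3), brushing gently with a soft brush all the while; once most of the ink has dissolved, wash with cool soapy water or neutral washing powder.
4. Mildew: wipe with a 2% soap-and-alcohol solution, then wipe with bleach (3%-5% sodium hypochlorite) or hydrogen peroxide, and finally wash.
5. Sweat stains: sweat left on clothes for a long time tends to leave yellow marks. Soak the garment in 5% salt water for 1 hour, then rub it clean gently.
6. Soiled wool sweaters: dissolve neutral soap in boiling water and add a tablespoon of borax. When the temperature drops to 60 °C, soak the sweater in the soapy solution for three to four hours, then knead it gently in warm water at 40-50 °C and finally rinse in warm water. If stubborn spots remain, add two tablespoons of turpentine to the soapy solution, mix it into an emulsion and wash with that. Let the washed sweater dry naturally; when it is more than half dry, press it dry and flat through a cloth with an iron that is not too hot, then air it briefly.
7. Blood stains: soak a fresh blood stain in cold water for a few minutes, then wash with soap or alcohol. Old stains can be removed with lemon juice and salt water, or rubbed with white radish, but never wash them with hot water. You can also wash the stain with clean water until it turns light brown, then wash with glycerin soap and finally rinse clean in warm water.
8. Stovepipe tar stains: wipe immediately with gasoline (working from the outside of the stain inward so that it does not spread) or wash with warm soapy water.
9. Marks from wax paper or carbon paper: wash with washing powder first, then with gasoline, and finally wipe with alcohol.
10. Coffee stains: light coffee stains can be washed out with soap or washing powder in hot water. For stronger stains, stir a little glycerin into egg yolk, apply the mixture to the stain, let it dry slightly, then wash with soap and hot water.
11. Egg stains on clothes and tablecloths: egg stains are common on clothes and especially on dining tablecloths. Soak the cloth in cold water; once it is soaked through, wipe the stain with a cotton cloth dipped in a little table salt, and finally wash in warm water.
12. Lipstick: for lipstick on light-colored clothing, first saturate the stain with gasoline, then scrub with soapy water.
13. Chewing gum: for gum residue on clothes, walls or other objects, soak cotton wool or a cloth in white vinegar and scrub the spot with it.
14. Tung oil: soften the stain with gasoline first, then scrub with tofu dregs.
15. Medicated plaster: wash out with warm baijiu (Chinese white liquor).
16. Iodine tincture: wash out with flour.
17. Mercurochrome: wash out with vinegar.
18. Vinegar stains: for vinegar or soy sauce stains on clothing, sprinkle on a little white sugar and rub, then wash with warm water.

Additional tips:

Oil stains
① Oil stains on clothes can be wiped with white spirit, banana oil (a lacquer thinner), gasoline and the like; then soak the garment in 3% salt water for a few minutes and rinse in clean water.
② Oil stains on silk items can be removed by rubbing gently with acetone solution.
③ On dark clothes, rubbing with used tea leaves removes oil stains.
④ Mix a little toothpaste with washing powder and rub it into the oil stain to remove it.
⑤ Mix a little flour into a paste, spread it on both sides of the stain, dry it in the sun, then peel off the crust to lift the oil stain.

Paint stains
① For fresh oil paint or spray paint, dab a little cooling balm (清凉油) on both sides of the stain, wait a few minutes, then wipe a few times along the weave with a cotton ball and the paint comes off. Old paint stains can be removed the same way: apply a little cooling balm and the paint film will wrinkle by itself and can be peeled off. Wash the garment once afterwards and the stain will be gone.
② Wipe fresh stains with turpentine or banana oil, then wipe with gasoline. Soak old stains in a 10-20% ammonia or borax solution to dissolve the hardened paint, then brush clean.

Alcohol stains can be wiped with boiled milk, which is very effective, especially on white cotton and linen.

Ink and lipstick stains: after washing in water, wash with detergent, or with glutinous rice mixed with detergent; you can also rub toothpaste into the stain and then rinse. For silk, lay the stained side face down on clean paper, apply dry-cleaning fluid or alcohol and rub the back of the stain until it disappears, then wash and rinse.

Mildew
① For mildew spots on clothes, rub a few mung bean sprouts on the spots, then rinse with clean water.
② Brush fresh mildew off with a soft brush, then clean with alcohol. For old mildew, apply dilute ammonia, leave it for a while, then apply potassium permanganate solution, and finally treat with sodium bisulfite solution and wash with water.
③ For mildew on leather garments, wipe with a towel dipped in soapy water, wipe clean with water as soon as the dirt is gone, and apply leather jacket oil once the garment is dry.
④ Mildew on white silk clothes can be wiped with 5% baijiu, which removes it very well.
⑤ Mildew on silk clothes can usually be brushed off in water with a soft brush. For heavier mildew, apply 5% salt water to the spot, leave it for 3-5 minutes, then rinse with clean water.

Chewing gum stuck to clothes is hard to remove. If you put the garment in the freezer for a while, the gum becomes brittle and can be scraped off easily with a blade.

Dyed clothes often fade with washing. After washing, rinse the garment in clean water with two cups of beer added and the faded areas will regain their color.

White clothes often yellow with wear. Rinse them in a solution with indigo laundry bluing added and they will come out looking like new.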

source: http://wenwen.soso.com/z/q2006577134.htm?pid=mail.wen8