
Showing posts with the label "operations research, PhD, research, phd".

Monday, September 21, 2015

Decision Tree using R -- Implementing Decision Trees with R

Tree-Based Models

Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a set of data while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. This section briefly describes CART modeling, conditional inference trees, and random forests.
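To make "recursive partitioning" concrete, here is a minimal C++ sketch of the step a classification tree repeats at every node: scan one numeric predictor for the threshold that best separates two classes. It scores each candidate with the Gini index. I believe Gini is also rpart's default for method="class", but this code is a simplified illustration, not rpart's implementation, and the names gini, Split and bestSplit are hypothetical. Growing a full tree means running the same search over every predictor, keeping the best split, and recursing into the two children.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Gini impurity of a node with `pos` cases of class 1 among `n` cases (two classes).
double gini(std::size_t pos, std::size_t n) {
    if (n == 0) return 0.0;
    double p = static_cast<double>(pos) / static_cast<double>(n);
    return 2.0 * p * (1.0 - p);  // equals 1 - p^2 - (1 - p)^2
}

struct Split {
    double threshold;     // cases with x < threshold go to the left child
    double weightedGini;  // size-weighted impurity of the two children
};

// Exhaustive search over one numeric predictor: try the midpoint between each
// pair of consecutive distinct sorted values and keep the lowest weighted Gini.
Split bestSplit(const std::vector<double>& x, const std::vector<int>& y) {
    Split best{std::numeric_limits<double>::quiet_NaN(),
               std::numeric_limits<double>::infinity()};
    const std::size_t n = x.size();
    if (n < 2 || y.size() != n) return best;  // nothing to split

    // Sort case indices by x so labels stay aligned with their x values.
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    std::size_t totalPos = 0;
    for (int label : y) totalPos += (label == 1);

    std::size_t leftPos = 0;  // class-1 cases among the first k sorted cases
    for (std::size_t k = 1; k < n; ++k) {
        leftPos += (y[idx[k - 1]] == 1);
        const double lo = x[idx[k - 1]], hi = x[idx[k]];
        if (lo == hi) continue;  // no threshold separates equal values
        const std::size_t nl = k, nr = n - k;
        const double w = (nl * gini(leftPos, nl) + nr * gini(totalPos - leftPos, nr)) / n;
        if (w < best.weightedGini) best = Split{(lo + hi) / 2.0, w};
    }
    return best;
}
```

For example, bestSplit({1.0, 2.0, 3.0, 4.0, 5.0, 6.0}, {0, 0, 0, 1, 1, 1}) returns a threshold of 3.5 with a weighted Gini of 0, because that cut separates the two classes perfectly.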

CART Modeling via rpart

Classification and regression trees (as described by Breiman, Friedman, Olshen, and Stone) can be generated through the rpart package. Detailed information on rpart is available in An Introduction to Recursive Partitioning Using the RPART Routines. The general steps are provided below, followed by two examples.

1. Grow the Tree

To grow a tree, use
rpart(formula, data=, method=, control=) where

formula is in the format outcome ~ predictor1+predictor2+predictor3+etc.
data= specifies the data frame
method= "class" for a classification tree, "anova" for a regression tree
control= optional parameters for controlling tree growth. For example, control=rpart.control(minsplit=30, cp=0.001) requires that a node contain at least 30 observations before a split is attempted, and that a split decrease the overall lack of fit by a factor of 0.001 (the cost complexity factor) for it to be attempted.
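Here is a minimal sketch of how the two control parameters gate each candidate split. It follows my reading of the rpart.control documentation, in which the improvement is measured relative to the root node's lack of fit. splitAllowed and its parameters are hypothetical names; rpart performs these checks internally.

```cpp
#include <cstddef>

// Hypothetical helper (not part of rpart) restating the two rpart.control()
// rules described above for a single candidate split.
//   nodeN       : number of observations in the node
//   improvement : decrease in lack of fit that the split would achieve
//   rootRisk    : lack of fit of the root node, used to express the gain relatively
bool splitAllowed(std::size_t nodeN, double improvement, double rootRisk,
                  std::size_t minsplit, double cp) {
    if (nodeN < minsplit) return false;    // too few observations to attempt a split
    if (rootRisk <= 0.0) return false;     // a perfect root leaves nothing to improve
    return improvement / rootRisk >= cp;   // relative gain must reach cp
}
```

With minsplit=30 and cp=0.001, a node of 25 cases is never split. A node of 40 cases is split only if the split removes at least 0.1% of the root's lack of fit.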

2. Examine the results

The following functions help us to examine the results.
printcp(fit) display cp table
plotcp(fit) plot cross-validation results
rsq.rpart(fit) plot approximate R-squared and relative error for different splits (2 plots). Labels are only appropriate for the "anova" method.
print(fit) print results
summary(fit) detailed results including surrogate splits
plot(fit) plot decision tree
text(fit) label the decision tree plot
post(fit, file=) create postscript plot of decision tree
In trees created by rpart( ), move to the LEFT branch when the stated condition is true (as in the tree plots produced by the examples below).
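The left-branch convention amounts to a small tree walk. This C++ sketch uses a hypothetical Node structure in which every internal node tests value < threshold. A case goes LEFT when that condition is true and RIGHT otherwise, until it reaches a leaf holding the predicted class. Real rpart trees also carry surrogate splits for missing values, which this sketch ignores.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy binary tree node: internal nodes test `row[var] < threshold`,
// leaves carry a predicted class label.
struct Node {
    std::string var;        // predictor tested at this node (unused at a leaf)
    double threshold = 0;   // the stated condition is `value < threshold`
    std::string label;      // prediction stored at a leaf
    std::unique_ptr<Node> left, right;
};

// Walk from the root to a leaf: LEFT when the condition is true, RIGHT otherwise.
std::string predict(const Node& root, const std::map<std::string, double>& row) {
    const Node* cur = &root;
    while (cur->left && cur->right) {
        const bool conditionTrue = row.at(cur->var) < cur->threshold;
        cur = conditionTrue ? cur->left.get() : cur->right.get();
    }
    return cur->label;
}
```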

3. Prune the tree

Prune back the tree to avoid overfitting the data. Typically, you will want to select a tree size that minimizes the cross-validated error, the xerror column printed by printcp( ).
Prune the tree to the desired size using
prune(fit, cp= )
Specifically, use printcp( ) to examine the cross-validated error results, select the complexity parameter associated with minimum error, and place it into the prune( ) function. Alternatively, you can use the code fragment
     fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"]
to automatically select the complexity parameter associated with the smallest cross-validated error. Thanks to HSAUR for this idea.
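The code fragment above picks the row of the cp table with the smallest xerror and returns its CP value. Here is the same selection written out in C++ as a sketch. CpRow keeps only the columns needed here, and the numbers in the test are made up for illustration, not taken from a real rpart fit. Like which.min, it returns the first row when several share the minimum.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One row of an rpart-style cp table (only the columns used here).
struct CpRow {
    double cp;      // complexity parameter
    int nsplit;     // number of splits
    double xerror;  // cross-validated relative error
};

// C++ analogue of fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"]:
// return the CP of the first row with the smallest xerror.
double cpAtMinXerror(const std::vector<CpRow>& table) {
    if (table.empty()) throw std::invalid_argument("empty cp table");
    std::size_t best = 0;
    for (std::size_t i = 1; i < table.size(); ++i)
        if (table[i].xerror < table[best].xerror) best = i;  // strict < keeps the first minimum
    return table[best].cp;
}
```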


Classification Tree example

Let's use the data frame kyphosis to predict a type of deformation (kyphosis) after surgery, from age in months (Age), number of vertebrae involved (Number), and the highest vertebrae operated on (Start).
# Classification Tree with rpart
library(rpart)

# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start,
   method="class", data=kyphosis)

printcp(fit) # display the results
plotcp(fit) # visualize cross-validation results
summary(fit) # detailed summary of splits

# plot tree
plot(fit, uniform=TRUE,
   main="Classification Tree for Kyphosis")
text(fit, use.n=TRUE, all=TRUE, cex=.8)

# create attractive postscript plot of tree
post(fit, file = "c:/tree.ps",
   title = "Classification Tree for Kyphosis")

(Figures: cp plot, classification tree, and classification tree in PostScript.)
# prune the tree
pfit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"])

# plot the pruned tree
plot(pfit, uniform=TRUE,
   main="Pruned Classification Tree for Kyphosis")
text(pfit, use.n=TRUE, all=TRUE, cex=.8)
post(pfit, file = "c:/ptree.ps",
   title = "Pruned Classification Tree for Kyphosis")

(Figures: pruned classification tree, and pruned classification tree in PostScript.)

Regression Tree example

In this example we will predict car mileage from price, country, reliability, and car type. The data frame is cu.summary.
# Regression Tree Example
library(rpart)

# grow tree
fit <- rpart(Mileage~Price + Country + Reliability + Type,
   method="anova", data=cu.summary)

printcp(fit) # display the results
plotcp(fit) # visualize cross-validation results
summary(fit) # detailed summary of splits

# create additional plots
par(mfrow=c(1,2)) # two plots on one page
rsq.rpart(fit) # plot approximate R-squared and relative error vs. number of splits

# plot tree
plot(fit, uniform=TRUE,
   main="Regression Tree for Mileage ")
text(fit, use.n=TRUE, all=TRUE, cex=.8)

# create attractive postscript plot of tree
post(fit, file = "c:/tree2.ps",
   title = "Regression Tree for Mileage ")

(Figures: cp plot and R-squared plots for the regression tree, regression tree, and regression tree in PostScript.)
# prune the tree
pfit<- prune(fit, cp=0.01160389) # from cptable

# plot the pruned tree
plot(pfit, uniform=TRUE,
   main="Pruned Regression Tree for Mileage")
text(pfit, use.n=TRUE, all=TRUE, cex=.8)
post(pfit, file = "c:/ptree2.ps",
   title = "Pruned Regression Tree for Mileage")

It turns out that this produces the same tree as the original.

Conditional inference trees via party

The party package provides nonparametric regression trees for nominal, ordinal, numeric, censored, and multivariate responses. party: A laboratory for recursive partitioning provides details.
You can create a regression or classification tree via the function
ctree(formula, data=)
The type of tree created will depend on the outcome variable (nominal factor, ordered factor, numeric, etc.). Tree growth is based on statistical stopping rules, so pruning should not be required.
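As a rough illustration of a statistical stopping rule, the sketch below assumes each predictor has already been tested for association with the response and given a p-value. It applies a Bonferroni adjustment to the smallest p-value by multiplying it by the number of predictors. It then splits only if 1 minus the adjusted p-value exceeds mincriterion (0.95 here, which I believe matches ctree's default). party's actual tests and adjustment are more involved, so treat this as a simplified sketch with hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct SplitDecision {
    bool split;             // whether the node should be split
    std::size_t variable;   // index of the selected predictor (meaningful if split)
};

// Simplified ctree-style stopping rule: pick the predictor with the smallest
// p-value, Bonferroni-adjust it, and split only if 1 - adjusted p > mincriterion.
SplitDecision ctreeStoppingRule(const std::vector<double>& pValues,
                                double mincriterion = 0.95) {
    if (pValues.empty()) return {false, 0};
    std::size_t best = 0;
    for (std::size_t j = 1; j < pValues.size(); ++j)
        if (pValues[j] < pValues[best]) best = j;
    const double adjusted = std::min(1.0, pValues[best] * pValues.size());
    return {1.0 - adjusted > mincriterion, best};
}
```

With three predictors and a smallest raw p-value of 0.01, the adjusted value is 0.03 and the node is split. With a smallest raw p-value of 0.02 it becomes 0.06, and growth stops there, which is why no separate pruning step is needed.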
The previous two examples are re-analyzed below.
# Conditional Inference Tree for Kyphosis
library(party)
fit <- ctree(Kyphosis ~ Age + Number + Start,
   data=kyphosis)
plot(fit, main="Conditional Inference Tree for Kyphosis")

(Figure: Conditional Inference Tree for Kyphosis.)
# Conditional Inference Tree for Mileage
library(party)
fit2 <- ctree(Mileage~Price + Country + Reliability + Type,
   data=na.omit(cu.summary))
plot(fit2, main="Conditional Inference Tree for Mileage")

(Figure: Conditional Inference Tree for Mileage.)

Random Forests

Random forests improve predictive accuracy by growing a large number of trees, each on a bootstrap sample of the cases and using a random subset of the variables at each split. A case is classified with each tree in this "forest", and the final predicted outcome is decided by combining the results across all of the trees (an average in regression, a majority vote in classification). Breiman and Cutler's random forest approach is implemented in the randomForest package.
Here is an example.
# Random Forest prediction of Kyphosis data
library(randomForest)
fit <- randomForest(Kyphosis ~ Age + Number + Start, data=kyphosis)
print(fit) # view results
importance(fit) # importance of each predictor
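The combination step described above (majority vote for classification, average for regression) is simple enough to sketch directly. These two C++ helpers are hypothetical and only aggregate per-tree predictions; growing the bootstrapped trees is not shown. Ties in the vote go to the alphabetically first label, an arbitrary choice made for this sketch.

```cpp
#include <map>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Classification: each tree casts one vote and the most frequent label wins.
// std::map iterates labels in sorted order and only a strictly larger count
// replaces the current best, so ties go to the alphabetically first label.
std::string majorityVote(const std::vector<std::string>& votes) {
    if (votes.empty()) throw std::invalid_argument("no votes");
    std::map<std::string, int> counts;
    for (const auto& v : votes) ++counts[v];
    auto best = counts.begin();
    for (auto it = counts.begin(); it != counts.end(); ++it)
        if (it->second > best->second) best = it;
    return best->first;
}

// Regression: average the trees' numeric predictions.
double averagePrediction(const std::vector<double>& preds) {
    if (preds.empty()) throw std::invalid_argument("no predictions");
    return std::accumulate(preds.begin(), preds.end(), 0.0) / preds.size();
}
```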

For more details see the comprehensive Random Forest website.

Tuesday, April 15, 2014

C++ CPLEX: Defining a Three-Dimensional Variable Matrix (Define 3 dimensional Matrix)

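When I was just starting out I ran into all sorts of problems, such as this one: how to define a three-dimensional matrix. I searched all over Google without finding a single answer, and the manual is even vaguer, so I will explain it in detail here.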

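First, define your three-dimensional matrix type. Because a three-dimensional matrix is built from two-dimensional ones, you also need to define a two-dimensional matrix type.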
typedef IloArray<IloIntVarArray> IntVarMatrix2; // 2D matrix type
typedef IloArray<IloArray<IloIntVarArray> > IntVarMatrix3; // 3D matrix type

Then declare your three-dimensional variable. Here my variable has the form V[i][j][k], where i = 1..nbnodes, j = 1..nbnodes, k = 1..nblines (indexed from 0 in the code):

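IntVarMatrix3 V(env, nbnodes); // length of the first dimension
for (IloInt i = 0; i < nbnodes; i++) {
   V[i] = IntVarMatrix2(env, nbnodes); // length of the second dimension
   for (IloInt j = 0; j < nbnodes; j++) {
      // Length of the third dimension: each V[i][j][k] is an integer variable from 0 to "infinity".
      // I first used IloInfinity here, but after compiling, the solver reported a large negative upper bound,
      // so I switched to RAND_MAX. IloInfinity is a floating-point value, so passing it as an integer bound
      // probably overflows; RAND_MAX, however, is only 32767 with Visual C++, which is a real, small bound.
      // Concert's integer limit IloIntMax is likely the intended "no upper bound" for integer variables.
      V[i][j] = IloIntVarArray(env, nblines, 0, IloIntMax);
   }
}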
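You can check the allocation pattern without a CPLEX installation by writing the same three-level nesting with std::vector. This is only a stand-in for IloArray<IloArray<IloIntVarArray> >: each cell holds a plain int instead of an IloIntVar, and makeMatrix3 is a hypothetical helper. The point it shows is that each dimension must be sized before the next one can be indexed.

```cpp
#include <cstddef>
#include <vector>

// std::vector stand-in for the Concert types above (not CPLEX itself).
using Matrix2 = std::vector<std::vector<int>>;  // ~ IntVarMatrix2
using Matrix3 = std::vector<Matrix2>;           // ~ IntVarMatrix3

// Allocate V[i][j][k] one dimension at a time, mirroring the Concert loop:
// nbnodes x nbnodes x nblines, every cell set to `init`.
Matrix3 makeMatrix3(std::size_t nbnodes, std::size_t nblines, int init) {
    Matrix3 V(nbnodes);                          // length of the first dimension
    for (std::size_t i = 0; i < nbnodes; ++i) {
        V[i] = Matrix2(nbnodes);                 // length of the second dimension
        for (std::size_t j = 0; j < nbnodes; ++j)
            V[i][j] = std::vector<int>(nblines, init);  // length of the third dimension
    }
    return V;
}
```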

Thursday, April 10, 2014

How to Define Arrays and Matrices in C++ CPLEX

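Anyone new to C++ and CPLEX is bound to be confused about defining arrays and matrices, so below I explain how to do it with a few examples.
I'll just use an example to illustrate: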

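typedef IloArray<IloNumArray>    NumMatrix;     // First define the types at the very top. IloNumArray is a one-dimensional CPLEX array; wrapping it in IloArray<...> gives a two-dimensional matrix, and wrapping it again gives three or more dimensions. NumMatrix is the matrix type for parameters (data).
typedef IloArray<IloNumVarArray> NumVarMatrix;  // The two-dimensional matrix type that stores variables.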

... (intermediate code omitted)

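   IloEnv env;
   try {
      IloInt i, j;
      IloModel model(env);

      IloInt nbDemand = 4;
      IloInt nbSupply = 3;
      IloNumArray supply(env, nbSupply, 1000.0, 850.0, 1250.0);  // For a one-dimensional array, CPLEX's IloNumArray is a ready-made array type, so it can be used directly
      IloNumArray demand(env, nbDemand, 900.0, 1200.0, 600.0, 400.);

      NumVarMatrix x(env, nbSupply); // x is an object of the variable-matrix type defined above, with nbSupply rows
      NumVarMatrix y(env, nbSupply);

      for(i = 0; i < nbSupply; i++) {
         x[i] = IloNumVarArray(env, nbDemand, 0.0, IloInfinity, ILOFLOAT);
         y[i] = IloNumVarArray(env, nbDemand, 0.0, IloInfinity, ILOFLOAT);
      } // Define the variable at every position of x and y, i.e. x_ij with i = 1..nbSupply and j = 1..nbDemand (0-based in the code): x_ij >= 0 with no upper bound (IloInfinity), of type ILOFLOAT, i.e. a continuous variable

      // ... (constraints, objective and solve omitted)
   } catch (IloException& e) {
      std::cerr << "Concert exception caught: " << e << std::endl;
   }
   env.end();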
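The same two-dimensional indexing can be sketched with std::vector as a stand-in for NumMatrix, which also makes the data easy to check. Summing the arrays from the post gives total supply 1000 + 850 + 1250 = 3100 and total demand 900 + 1200 + 600 + 400 = 3100, so supply and demand balance. The shipment plan x in the test is hypothetical. Reading row sums as shipments out of each supply point and column sums as deliveries to each demand point is my assumption about the omitted constraints.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// std::vector stand-ins for the Concert types (not CPLEX itself):
// NumArray ~ IloNumArray, NumMatrix ~ IloArray<IloNumArray>, indexed as x[i][j].
using NumArray = std::vector<double>;
using NumMatrix = std::vector<NumArray>;

// Total of a one-dimensional array, e.g. all supply or all demand.
double total(const NumArray& a) { return std::accumulate(a.begin(), a.end(), 0.0); }

// Row sums: amount shipped out of each supply point i (sum over j of x[i][j]).
NumArray rowSums(const NumMatrix& x) {
    NumArray r;
    for (const NumArray& row : x) r.push_back(total(row));
    return r;
}

// Column sums: amount received by each demand point j (sum over i of x[i][j]).
NumArray colSums(const NumMatrix& x) {
    NumArray c(x.empty() ? 0 : x[0].size(), 0.0);
    for (const NumArray& row : x)
        for (std::size_t j = 0; j < row.size() && j < c.size(); ++j) c[j] += row[j];
    return c;
}
```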

Wednesday, April 9, 2014

Getting My First CPLEX Program Running

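Strictly speaking it is not my own program, since it is one of the examples installed with ILOG. For beginners, the biggest headache is figuring out how to get to grips quickly with such a complex program. Here I mainly explain the following two points:
1. Building and running the examples that come with CPLEX
2. Creating a C++ project and linking it with CPLEX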

Building and Running CPLEX Examples

The C and C++ CPLEX examples have all been gathered in one project for each type of static format (mta and mda). The instructions below use the mta format for the Visual Studio 2008 environment, but similar instructions apply when using the project file for another format or with Visual Studio 2010. The related file for the mda format is <CPLEXDIR>\examples\x86_windows_vs2008\stat_mda\examples.sln
(Here CPLEXDIR means your CPLEX installation path; mine, for example, is "C:\Program Files (x86)\IBM\ILOG\CPLEX_Studio125\cplex". Just append the \examples\... part to it.)
Be aware that the order of the instructions below is important.
  1. Start Microsoft Visual Studio 2008.

  2. From the File menu, choose Open Project/Solution.
    The Open Project dialog box appears.
    • Select the folder <CPLEXDIR>\examples\x86_windows_vs2008\stat_mta.
    • Select the examples.sln file and click Open.

  3. To build only one example (for instance, blend):
    • Select the blend project in the Solution Explorer window.
    • From the Build menu, choose Build blend.
      Wait for the completion of the building process.
  4. To build all of the examples:
    • From the Build menu, choose Build Solution
      Wait for the completion of the building process.

  5. To run an example (for instance, blend):
    • Open a command prompt window by running the Visual Studio 2008 Command Prompt.
      In the window Visual Studio 2008 Command prompt:
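      • Type set path=%path%;<CPLEXDIR>\bin\x86_win32 so that cplex123.dll is on the path. (This step is badly written: it took me dozens of attempts before it finally worked. If your CPLEX is installed in the default directory as mine is, i.e. under C:\Program Files (x86)\..., you may run into the same problem. The command that worked for me was: set path=C:\Program Files (x86)\IBM\ILOG\CPLEX_Studio125\cplex\bin\x86_win32; Here CPLEX_Studio125 means I installed version 12.5; if your version differs, the folder name will differ too. I also installed the 32-bit version, which is why it is under Program Files (x86); a 64-bit installation will probably be under Program Files. Note that this form replaces the path instead of appending to it as %path%; does, so other folders are removed from the path in that command prompt window.)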
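      • Type <CPLEXDIR>\examples\x86_windows_vs2008\stat_mta\blend. (This step is also a trap. If, like me, you installed under the default C:\Program Files (x86)\, typing this command will very likely produce the error "'C:\Program' is not recognized as an internal or external command, operable program or batch file." The command line treats the space after Program as the end of the command name, so you need to put your path in double quotes, for example "C:\Program Files (x86)\IBM\ILOG\CPLEX_Studio125\cplex\examples\x86_windows_vs2010\stat_mta\blend". Here x86_windows_vs2010 refers to your Visual Studio version; mine is 2010.)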
      • The result is then displayed. Setting the path environment variable is only necessary if this folder is not already on the path. The default installer action is to modify the path to include this folder.
Once you have dealt with the issues above, you can run any of the bundled examples and see the results.
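Why does the unquoted path fail? A simplified model of command-line parsing makes it visible: words are separated by spaces unless the spaces sit inside double quotes. splitCommandLine is a hypothetical sketch that ignores cmd.exe's other quoting and escaping rules. It still shows that the unquoted path breaks into "C:\Program", "Files", and so on, while the quoted path stays whole.

```cpp
#include <string>
#include <vector>

// Simplified model of splitting a command line into words: spaces and tabs
// separate words unless they appear inside double quotes. The quote
// characters themselves are dropped.
std::vector<std::string> splitCommandLine(const std::string& line) {
    std::vector<std::string> words;
    std::string current;
    bool inQuotes = false, haveWord = false;
    for (char c : line) {
        if (c == '"') {
            inQuotes = !inQuotes;
            haveWord = true;
        } else if ((c == ' ' || c == '\t') && !inQuotes) {
            if (haveWord) { words.push_back(current); current.clear(); haveWord = false; }
        } else {
            current += c;
            haveWord = true;
        }
    }
    if (haveWord) words.push_back(current);
    return words;
}
```

The first word is what the shell tries to run. Without quotes it is C:\Program, which matches the error message above.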


Building Your Own Project which Links with CPLEX

Note:
The information below applies to the Visual C++ 2008 multi-threaded STL library. If you use another version of the library, set the Runtime Library option to match the library version. If you use Visual Studio 2010, the instructions below should apply, except that x86_windows_vs2008 should be replaced with x86_windows_vs2010 whenever a path name is specified.
Let's assume that you want to build a target named test.exe and have:
  • a source file named test.cpp which uses Concert Technology or test.c which uses the C API of the CPLEX Callable Library;
  • a folder where this file is located and which, for the sake of simplicity, we'll refer to as <MYAPPDIR>.
One way to achieve that is to create a project named test.vcproj as described here. Be aware that the order of instructions is important. Note that project files for VS2010 have the extension vcxproj.
  1. Start Microsoft Visual Studio 2008. 

  2. The first step is to build the test.sln solution.
    From the File menu, select New->, and then Project....
    The New Project dialog box appears.
    • In the Project types pane, select Visual C++ and Win32.
    • In the Templates pane, select the Win32 Console Application icon.
    • Fill in the project name (test).
    • If necessary, correct the location of the project (to <MYAPPDIR>)
    • Click OK
  3. When the Win32 Application Wizard appears...
    • Click on Application Settings.
    • Select Console Application as Application type.
    • Make sure that Empty project is checked in Additional Options.
    • Click Finish.
    This creates a solution, test, with a single project, test. You can see the contents of the solution by selecting Solution Explorer in the View menu.
  4. Now you must add your source file to the project. From the Project menu, choose Add Existing Item...
    • Move to the folder <MYAPPDIR> and select test.cpp or test.c.
    • Click Open.

  5. Next, you have to set some options so that the project knows where to find the CPLEX and Concert include files and the CPLEX and Concert libraries.
    From the Project menu, choose Properties.
    The test Property Pages dialog box appears.
    In the Configuration drop-down list, select Release.
    Select C/C++ in the Configuration Properties tree.
    • Select General:
      • In the Additional Include Directories field, add the directories:
        • <CPLEXDIR>\include.
        • <CONCERTDIR>\include.
      • For Debug Information Format, choose Disabled (/Zd).
      • Choose No for Detect 64-bit Portability Issues. Note that these settings are not available in the Visual Studio 2010 IDE and can be omitted.

    • Select Preprocessor:
      • Add IL_STD to the Preprocessor Definitions field. This defines the macro IL_STD which is needed to use the STL.

    • Select Code Generation:
      • Set Runtime Library to Multi-threaded (/MT).
    Select Linker in the Configuration Properties tree.
    • Select General and then select Additional Library Directories. Add the directories:
      • <CPLEXDIR>\lib\x86_windows_vs2008\stat_mta
      • <CONCERTDIR>\lib\x86_windows_vs2008\stat_mta
    • Select Input and then select Additional Dependencies. Add the files:
      • cplex123.lib (the digits follow the CPLEX version, e.g. cplex125.lib for CPLEX 12.5)
      • ilocplex.lib
      • concert.lib
      The latter two are only necessary if you are using Concert Technology.
    Click OK to close the test Property Pages dialog box.

  6. Next, you have to set the default project configuration. From the Build menu, select Configuration Manager...
    • Select Release in the Active Solution Configuration drop-down list.
    • Click Close.
  7. Finally, to build the project, from the Build menu, select Build Solution
After completion of the compiling and linking process, the target is created. The full path of the test.exe is <MYAPPDIR>\test\Release\test.exe.

Remark:

From the Concert point of view, the only difference between the Win32 Release and Win32 Debug targets is:
  • the NDEBUG macro is defined for the Win32 Release target.
  • the NDEBUG macro is not defined for the Win32 Debug target.
This is why we have suggested using Release in the test.sln example, even though it is not the default proposed by Visual C++. Refer to the Visual C++ Reference Manual for full information on Release and Debug configurations.
The interaction of the NDEBUG macro and the Concert inline member functions is documented in the Concepts section of the CPLEX C++ API Reference Manual.
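A quick way to confirm which configuration a program was built in is to test the NDEBUG macro directly. This small C++ sketch (builtWithNDEBUG and configurationName are hypothetical names) reports "Release" when NDEBUG is defined and "Debug" otherwise, following the convention described above.

```cpp
// Reports whether this translation unit was compiled with NDEBUG defined.
constexpr bool builtWithNDEBUG() {
#ifdef NDEBUG
    return true;   // Release-style build: assert() and other NDEBUG-guarded checks are compiled out
#else
    return false;  // Debug-style build: NDEBUG-guarded checks stay active
#endif
}

// Human-readable label for the configuration, per the Release/Debug convention above.
const char* configurationName() { return builtWithNDEBUG() ? "Release" : "Debug"; }
```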

Tuesday, April 8, 2014

First Post

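I have recently started doing research, and only now do I realize how hard a Ph.D. really is. But since I have set out on this path, I should stick with it; how exciting it will be when I finally earn the lifelong title of "Dr."!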

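In this blog I will mainly record my learning notes, especially in operations research. Research in this field in China may not be very strong at the moment, and there are not many materials to learn from. This platform lets me improve myself and also share what I learn and experience with everyone, which I think is well worth doing.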