Classic Decision Trees in SAS: CHAID

Posted by star2017

Source: http://blog.sina.com.cn/s/blog_8db50cf70101hu2l.html

I. Building a CHAID decision tree with the SAS/EM interface

This part is excerpted from: http://www.sasresource.com/artical72.html

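The CHAID (Chi-Square Automatic Interaction Detector) algorithm uses the chi-square test to decide whether two categories of a predictor should be merged; the predictor that produces the largest difference between classes becomes the splitting variable of the node. The p-value computed at each node determines whether the tree keeps growing, so no separate pruning step is needed as in C4.5 or CART. CHAID differs from CART and C4.5 in two respects. First, CHAID handles only categorical variables: a continuous variable must first be binned into intervals and converted to a categorical one. Second, the approach to pruning differs: CART and C4.5 first overfit the training data and then prune the tree, whereas CHAID stops growing branches before overfitting occurs.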
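The split-selection idea described above can be sketched in a few lines of base R. This is only a rough illustration, not the algorithm as implemented in the CHAID package or in SAS/EM: it omits category merging and the Bonferroni adjustment, and the toy data, the column names, and the helper `split_p_values` are all invented for this example.

```r
# Toy data, invented purely for illustration (not the BreastCancer data).
toy <- data.frame(
  outcome = factor(rep(c("yes", "no"), each = 40)),
  size    = c(rep(2, 30), rep(8, 10), rep(2, 10), rep(8, 30)),  # continuous
  group   = factor(rep(c("x", "y"), times = 40))                # categorical
)

# CHAID works on categorical predictors, so bin the continuous one first.
toy$size_bin <- cut(toy$size, breaks = c(0, 5, 10), labels = c("low", "high"))

# Hypothetical helper: chi-square p-value of each predictor against the target.
split_p_values <- function(data, target, predictors) {
  sapply(predictors, function(v) {
    chisq.test(table(data[[v]], data[[target]]), correct = FALSE)$p.value
  })
}

alpha    <- 0.05
p_values <- split_p_values(toy, "outcome", c("size_bin", "group"))

# The predictor with the smallest p-value is the split candidate;
# the node is split only if that p-value does not exceed alpha.
best     <- names(p_values)[which.min(p_values)]
do_split <- p_values[[best]] <= alpha

print(p_values)
cat("best predictor:", best, "| split:", do_split, "\n")
```

Because growth stops as soon as no predictor passes the `alpha` threshold, the stopping rule plays the role that pruning plays in CART and C4.5.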

II. Implementation with R and SAS code

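The sample data comes from an R package (BreastCancer in mlbench). The main goal is to check whether R and SAS produce the same decision tree. The results do differ, probably because the run parameters are not the same; this still needs further investigation.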

1) Build CHAID tree using R

# Train data:
library(partykit)
library("CHAID")
data("BreastCancer", package = "mlbench")

# Build model:
ctrl <- chaid_control(alpha2 = 0.05, alpha3 = -1, alpha4 = 0.05,
    minsplit=2, minbucket = 5, minprob = 0.01, stump = FALSE, maxheight = 6)

b_chaid <- chaid(Class ~ Cl.thickness + Cell.size + Cell.shape + Marg.adhesion +
    Epith.c.size + Bare.nuclei + Bl.cromatin + Normal.nucleoli + Mitoses,
    data = BreastCancer, na.action = na.pass, control = ctrl)

plot(b_chaid)

2) Build CHAID tree using SAS/EM

[Figure: CHAID tree produced by SAS/EM]

SAS Code: 

proc iml;
submit /R;
#setInternet2(TRUE)
#install.packages("CHAID", repos="http://R-Forge.R-project.org")

# Train data:
library(partykit)
library("CHAID")
data("BreastCancer", package = "mlbench")

# Build model:
b_chaid <- chaid(Class ~ Cl.thickness + Cell.size + Cell.shape + Marg.adhesion +
    Epith.c.size + Bare.nuclei + Bl.cromatin + Normal.nucleoli + Mitoses,
    data = BreastCancer)
png("D:/sbjgay/Chaid_r_plot.png")
plot(b_chaid)
dev.off()
endsubmit;

call ImportDataSetFromR("work.BreastCancer", "BreastCancer");
run;quit;
filename rulecode "c:\temp\em_chaid_rules.sas";

*------------------------------------------------------------*;
* Tree: Run ARBOR procedure;
*------------------------------------------------------------*;
proc arbor data=work.BreastCancer
  Leafsize=1
  Splitsize=2
  Mincatsize = 5
  Maxbranch=10
  Maxdepth=6
  Criterion=PROBCHISQ
  alpha = 0.05
  Padjust= CHAIDAFTER
  DEPTH
  MAXRULES=5
  MAXSURRS=0
  Missing=USEINSEARCH
  Exhaustive=0
  event='malignant'
  ;
  input Cl_thickness Cell_size Cell_shape Marg_adhesion
  Epith_c_size Bare_nuclei Bl_cromatin Normal_nucleoli Mitoses / level=nominal;
  target Class / level=NOMINAL Criterion=PROBCHISQ;
  Performance DISK NodeSize=20000;
  Assess NoValidata measure=MISC;
  SUBTREE LARGEST;
  MAKEMACRO NLEAVES=nleaves;
  save
     MODEL=Tree_EMTREE
     SEQUENCE=Tree_OUTSEQ
     IMPORTANCE=Tree_OUTIMPORT
     NODESTAT=Tree_OUTNODES
     SUMMARY=Tree_OUTSUMMARY
     STATSBYNODE=Tree_OUTSTATS
     Topology=Tree_OUTTOPOLOGY
     Path = Tree_OUTPATH
     Rules=Tree_OUTRules
  ;
  code file=rulecode;
run;
quit;

3) Build CHAID tree using TreeDisc.sas in SAS 9.3

NOTE: Treedisc.sas does not work in 9.4.
Treedisc.sas and xmacro.sas can be downloaded from http://www.public.iastate.edu/~kkoehler/stat557/sas/.

%inc 'c:\temp\chaid\xmacro.sas';
%inc 'c:\temp\chaid\treedisc.sas';
data set2;
  set breastcancer;
run;
%treedisc(data=set2, depvar=class, freq=, ordinal=,
nominal=Cl_thickness Cell_size Cell_shape Marg_adhesion Epith_c_size Bare_nuclei Bl_cromatin Normal_nucleoli Mitoses,
alpha=0.05,
outtree=trd,
options=noformat,
trace=long);

%treedisc(intree=trd, draw=graphics);

NOTE: The trees built by the three methods above differ from one another. The cause has not been determined.

http://www.public.iastate.edu/~kkoehler/stat557/tree14p.pdf 

http://www.public.iastate.edu/~kkoehler/stat557/tree24p.pdf

http://www.public.iastate.edu/~kkoehler/stat557/tree2.4page.pdf

http://www.public.iastate.edu/~kkoehler/stat557/sas/treedisc.sas

Reference:

http://www.sascommunity.org/seugi/SEUGI1996/STURE.pdf

http://www.lexjansen.com/pnwsug/1998/PNWSUG98004.pdf

Original article by xsmile. If you repost it, please credit the source: http://www.17bigdata.com/%e7%bb%8f%e5%85%b8%e5%86%b3%e7%ad%96%e6%a0%91%e4%b9%8bsas%e5%ae%9e%e7%8e%b0-chaid/
