Big Data with R by 송태민 & SGLee


※ Published Data:

 

* Reference book: 이상구, 이재화, 김경원, [BigBook Series 005] Linear Algebra (선형대수학), BigBook, 2014.

* Reference book: 최용석, [BigBook Series 008] Understanding Statistics with R (R과 함께하는 통계학의 이해), BigBook, 2014.

 

Mathematics for BigData

Lesson 1 Introduction

■ Reference video   https://youtu.be/EURJnLppzKc

■ Reference materials

SKKU Math for Big Data, Lecture 1, Introduction, https://youtu.be/EURJnLppzKc  

Math for Big Data, Lecture 2, LU Decomposition, https://youtu.be/bzhTnoN3atk

Math for Big Data, Lecture 3, Schur Decomposition,  https://youtu.be/F2kZON0oS_w

Math for Big Data, Lecture 4, Power Method,  https://youtu.be/n4KD4aq_jxw

Math for Big Data, Lecture 5, QR Decomposition, https://youtu.be/gQ7gxTx5f9k

Math for Big Data, Lecture 6, Google's PageRank algorithm, https://youtu.be/tp6B7s43jAI

Math for Big Data, Lecture 7, Singular Value Decomposition(SVD), https://youtu.be/AxL4Q83IdAA

Math for Big Data, Lecture 8, Least Square Solutions, https://youtu.be/GwHh5lh5wEs

Math for Big Data, Lecture 9, Polar Decomposition, NMF, https://youtu.be/FqkMP9lBtaE

Math for Big Data, Lecture 10, Finding JCF using Dot Diagram, https://youtu.be/1E3wXN1oZyc

Math for Big Data, Lecture 11, Generalized eigenvectors and Matrix Function, https://youtu.be/lK4_Kp6P_N4

Math for Big Data, Lecture 12, Principal Component Analysis 1 (PCA), https://youtu.be/0IKbslNH7xk

Math for Big Data, Lecture 13, Principal Component Analysis 2 (PCA), https://youtu.be/j8PAt_Al180

 

Math for Big Data, Review 1, Intro. Calculus, Team 4, https://youtu.be/qALN6OAwNUo

Math for Big Data, Review 2, Intro. Linear Algebra, Team 3, https://youtu.be/xrFqBe8Rhs4

Math for Big Data, Review 3, Intro. Statistics, Team 2, https://youtu.be/sOx74EntB0I

Math for Big Data, Review 4, Intro. Engineering Math, Team 1, https://youtu.be/LRHN5swQW4E

 

Math for Big Data, Midterm PBL, S. Sun, https://youtu.be/CSdciSMPm-8

Math for Big Data, Midterm PBL, Naguib, https://youtu.be/k9_Ie8bMAY0

Math for Big Data, Midterm PBL,  KEAhn, https://youtu.be/xFJmI1_uynk

Math for Big Data, Midterm PBL, Choo, https://youtu.be/TlC78z_LErQ

Math for Big Data, Midterm PBL,  Naeem, https://youtu.be/8xo5UOP1tu8

Math for Big Data, Midterm PBL,  Lkhagva, https://youtu.be/pPtO1rNdLs0

Math for Big Data, Midterm PBL,  Sudip, https://youtu.be/5md49_RG74Q

Math for Big Data, Midterm PBL, Jeongwon Pyo, https://youtu.be/u5zDWtmx9P0

Math for Big Data, Midterm PBL,  ESJang, https://youtu.be/cHYvWBuBrFA   

 

 

Math for Big Data, Lecture 14,  Graph and Matrix, https://youtu.be/Z89XvKXIYeg

Math for Big Data, Lecture 15,  Laplacian Matrix and Big Data, https://youtu.be/4VuaOFRGm1g

Math for Big Data, Lecture 16,  Intro. Big Data for Machine Learning 1, https://youtu.be/P24A1fkpX-Y

Math for Big Data, Lecture 17,  Intro. Big Data for Machine Learning 2, https://youtu.be/bY3nfAHc6Qk

Math for Big Data, Lecture 18,  (Team 4) Intro. Data Mining, Ahn & Choo, https://youtu.be/Dq2G8ReeEcY

Math for Big Data, Lecture 19,  (Team 1) Pattern Classification 1, Naguib & Naeem, https://youtu.be/ieOUI6pc18A

Math for Big Data, Lecture 20,  (Team 1) Pattern Classification 2, Naguib & Naeem, https://youtu.be/9kxyu0e-nfQ

Math for Big Data, Lecture 21,  (Team 2) Statistical Learning, https://youtu.be/5dQO2Z3PgPU

Math for Big Data, Lecture 22,  (Team 3) Cluster Analysis, https://youtu.be/LPyFO8jFHD8

 

Math for Big Data, Lecture 23,  (Team 3) Project Draft 1, https://youtu.be/TZJrU7S1Q0o  

Math for Big Data, Lecture 24,  (Team 3) Project Presentation, Spectral Cluster Analysis by Shaowei-Lkhagva, https://youtu.be/476HgeBM8AE

Math for Big Data, Lecture 25,  (AV) Project Presentation, Restricted Boltzmann Machine Training of Perceptron for Clustering by Naguib-Naeem, https://youtu.be/QLKIgUCVLIY

Math for Big Data, Lecture 25,  (Team 1) Project Presentation, Restricted Boltzmann Machine Training of Perceptron for Clustering by Naguib-Naeem, https://youtu.be/vZ613MEWin4

 

Math for Big Data, Lecture 26,  (Team 2) Project Presentation, Hand Gesture Recognition with Convolutional Neural Network by Pyo-Sudip-Jang, https://youtu.be/FK-ANqohVlo

Math for Big Data, Lecture 27,  (Team 4)

 

Math for Big Data, Lecture 28,  Final PBL Presentation by Sudip, https://youtu.be/cOwWZcVb1AU   

 

 

************************* After Note ********

[Paper] 'R을 활용한 '대화형 통계학 입문 실습실' 개발과 활용' (Development and Use of an 'Interactive Introductory Statistics Laboratory' Using R),

        'Interactive Statistics Laboratory using R and Sage',

 J. Korea Soc. Math. Ed. Ser. E: Communications of Mathematical Education, Vol. 29, No. 4, Nov. 2015, pp. 573-588.

[Lab] Basic Statistics Laboratory Using R

   Beginner:  http://matrix.skku.ac.kr/2015-R-Statistics/R-Sage-Statistics-Lab-1.htm

   Introductory:  http://matrix.skku.ac.kr/2015-R-Statistics/R-Sage-Statistics-Lab-2.htm

   In the Chrome browser, just click the addresses above. You do not need to memorize any statistical formulas or programming syntax, and no code typing is required. The language becomes familiar as you practice.

  (In these labs, change the language from Sage to R before running.)

R download : http://healthstat.snu.ac.kr/CRAN/ 

Installation

Download R 3.2.4 for Windows (62 megabytes, 32/64 bit)

If you want to double-check that the package you have downloaded exactly matches the package distributed by R, you can compare the md5sum of the .exe to the true fingerprint. You will need a version of md5sum for Windows: both graphical and command line versions are available.
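As an illustration of the fingerprint check described above, the following is a minimal sketch that uses R's own tools::md5sum() rather than a separate md5sum program for Windows. The temporary file stands in for the downloaded installer, and the expected fingerprint is a hypothetical placeholder (the well-known MD5 of the five bytes "hello"), not the real fingerprint of any R release.

```r
# Sketch: compare a file's MD5 checksum with a published fingerprint.
# A temporary file stands in for the downloaded installer (assumption).
installer <- tempfile(fileext = ".exe")
writeBin(charToRaw("hello"), installer)

# tools::md5sum() returns a named character vector of hexadecimal digests.
actual <- unname(tools::md5sum(installer))

# Hypothetical published fingerprint: the MD5 digest of the bytes "hello".
expected <- "5d41402abc4b2a76b9719d911017c592"

# Compare case-insensitively, since fingerprints may be printed in upper case.
checksum_ok <- identical(tolower(actual), tolower(expected))
print(checksum_ok)
```

For a real download, installer would be the path of the .exe and expected would be copied from the fingerprint published next to the download link.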

Frequently asked questions

Please see the R FAQ for general information about R and the R Windows FAQ for Windows-specific information.

http://matrix.skku.ac.kr/E-Math/R-Practice-all.txt 

The file above contains the results of running the exercises below (by SGLee).

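## MERS, June 1-2
## NOTE: pop (regional counts) and gadm (map object) are assumed to have been created earlier in the script
pop_s = pop[order(pop$Code),]
## Bin the overall counts into intervals and colour the map by bin
inter=c(0, 100, 200, 500, 1000, 3000, 5000, 9000)
pop_c=cut(pop_s$전체,breaks=inter)
gadm$pop=as.factor(pop_c)
col=rainbow(length(levels(gadm$pop)))
p5=spplot(gadm, 'pop', col.regions=col, main='Anxiety sentiment (total)')
## Same steps for the June 1-2 column
inter=c(0, 100, 200, 500, 1000, 3000, 4200, 8000)
pop_c=cut(pop_s$오차,breaks=inter)
gadm$pop=as.factor(pop_c)
col=rainbow(length(levels(gadm$pop)))
p6=spplot(gadm, 'pop', col.regions=col, main='Anxiety sentiment (June 1-2)')
## Print several objects on one page (more=FALSE on the last plot closes the page)
print(p6,position=c(0, 0.5, 0.5, 1), more=TRUE)
print(p5,position=c(0.5, 0.5, 1, 1), more=FALSE)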
## Build 지역별메르스감정_지도.txt (cross-tabulation: region x stage (day_group))
install.packages('Rcmdr')
library(Rcmdr)
install.packages('catspec')
library(catspec)
setwd("c:/R소셜_2부1장")
data_spss=read.table("메르스_지역_불안감정만.txt",header=TRUE)
## Without a helper function (overall frequency counts)
## x[1:16] are the region columns; x[17] is the stage variable
x=c('Seoul','Daejeon','Daegu','Gwangju','Busan','Ulsan','Gyeonggi','Incheon','Gangwon','Chungbuk','Chungnam','Gyeongnam',
  'Gyeongbuk','Jeonnam','Jeonbuk','Jeju','day_group')
## NOTE: as in the source, the loop covers only the first 12 of the 16 regions; use 1:16 to tabulate all of them
for(i in 1:12) {
  t1=ftable(data_spss[c(x[17],x[i])])
  t2=ctab(t1,type='n')
  print(t2)
}
## Categorical frequency analysis (MERS-related buzz)
install.packages('Rcmdr')
library(Rcmdr)
install.packages('catspec')
library(catspec)
setwd("c:/R소셜_2부1장")
data_spss=read.table("메르스_감성분석_20150811_e.txt",header=TRUE)
t1=ftable(data_spss[c('attitude')])
ctab(t1,type='n')   # counts
ctab(t1,type='r')   # row percentages
## Multiple response (multiple-response analysis by country/region)
setwd("c:/R소셜_2부1장")
data_spss=read.table("메르스_감성분석_20150811_nation.txt",header=TRUE)
x=c('attitude','Channel','Account','Asia','Middle','Africa','Europe','America')
## x[4:8] are the region indicator columns
for(i in 4:8) {
  t1=ftable(data_spss[c(x[i])])
  t2=ctab(t1,type='n')
  t3=ctab(t1,type='r')
  print(t2)
  print(t3)
}
## Association prediction for MERS prevention, coping, and treatment factors (association analysis between keywords and the dependent variable)
## Rules are restricted to "안심" (reassured) or "불안" (anxious) on the right-hand side
## File: 예방_연관결과.txt (prevention)
rm(list=ls())
setwd("c:/R소셜_2부1장")
asso=read.table("예방_연관결과.txt",header=TRUE)
install.packages("arules")
library(arules)
## apriori() accepts data that can be coerced to transactions, such as a binary matrix
trans=as.matrix(asso)
rules1=apriori(trans,parameter=list(supp=0.001,conf=0.01), appearance=list(rhs=c("안심", "불안"), default="lhs"),control=list(verbose=FALSE))
inspect(sort(rules1))
summary(rules1)
rules.sorted=sort(rules1, by="confidence")
inspect(rules.sorted)
rules.sorted=sort(rules1, by="lift")
inspect(rules.sorted)
## File: 대처_연관결과.txt (coping)
rm(list=ls())
setwd("c:/R소셜_2부1장")
asso=read.table("대처_연관결과.txt",header=TRUE)
install.packages("arules")
library(arules)
trans=as.matrix(asso)
rules1=apriori(trans,parameter=list(supp=0.001,conf=0.645), appearance=list(rhs=c("안심", "불안"), default="lhs"),control=list(verbose=FALSE))
inspect(sort(rules1))
summary(rules1)
rules.sorted=sort(rules1, by="confidence")
inspect(rules.sorted)
rules.sorted=sort(rules1, by="lift")
inspect(rules.sorted)
## File: 증상_연관결과.txt (symptoms)
rm(list=ls())
setwd("c:/R소셜_2부1장")
asso=read.table("증상_연관결과.txt",header=TRUE)
install.packages("arules")
library(arules)
trans=as.matrix(asso)
rules1=apriori(trans,parameter=list(supp=0.001,conf=0.4), appearance=list(rhs=c("안심", "불안"), default="lhs"),control=list(verbose=FALSE))
inspect(sort(rules1))
summary(rules1)
rules.sorted=sort(rules1, by="confidence")
inspect(rules.sorted)
rules.sorted=sort(rules1, by="lift")
inspect(rules.sorted)
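The rules above are ranked by support, confidence, and lift. As a reminder of what those three measures mean, here is a minimal base-R sketch that computes them by hand for a single rule. The 0/1 matrix and the item names Mask and Anxiety are hypothetical and are not taken from the course data; apriori() in arules reports the same quantities for every rule it finds.

```r
# Hypothetical 0/1 transaction matrix: rows are documents, columns are items.
trans <- matrix(c(1, 1,
                  1, 1,
                  1, 0,
                  0, 1,
                  0, 0,
                  1, 1,
                  0, 1,
                  1, 0),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Mask", "Anxiety")))

n <- nrow(trans)

# Support: share of documents containing the item(s).
supp_lhs  <- sum(trans[, "Mask"] == 1) / n                             # P(Mask)
supp_rhs  <- sum(trans[, "Anxiety"] == 1) / n                          # P(Anxiety)
supp_rule <- sum(trans[, "Mask"] == 1 & trans[, "Anxiety"] == 1) / n   # P(Mask and Anxiety)

# Confidence of {Mask} => {Anxiety}: P(Anxiety | Mask).
confidence <- supp_rule / supp_lhs

# Lift: confidence relative to the baseline rate of the right-hand side.
# Values above 1 suggest a positive association, values below 1 a negative one.
lift <- confidence / supp_rhs

print(c(support = supp_rule, confidence = confidence, lift = lift))
```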
## Multinomial logistic regression (the file is split and a binary logistic regression is run on each part: factors affecting sentiment toward MERS)
## Positive (1) / negative (0) binary logistic regression
rm(list=ls())
setwd("c:/R소셜_2부1장")
data_spss=read.table("메르스_로지스틱_예방_긍부정.txt",header=TRUE)
## Fit once and reuse the model object
fit=glm(attitude~., family=binomial,data=data_spss)
summary(fit)
exp(coef(fit))      # odds ratios
exp(confint(fit))   # 95% confidence intervals for the odds ratios
## Neutral (1) / negative (0) binary logistic regression
## NOTE: the source reads the same file as above here; the file name is kept unchanged
data_spss=read.table("메르스_로지스틱_예방_긍부정.txt",header=TRUE)
fit=glm(attitude~., family=binomial,data=data_spss)
summary(fit)
exp(coef(fit))
exp(confint(fit))
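To show why exp(coef(...)) is read as an odds ratio, the sketch below fits a binary logistic regression with one 0/1 predictor on hypothetical data and compares the result with the cross-product ratio of the corresponding 2x2 table. The variable names attitude and Mask follow the script, but the counts are invented for illustration only.

```r
# Hypothetical 2x2 data:
#   Mask = 1: 30 positive (attitude = 1), 10 negative (attitude = 0)
#   Mask = 0: 20 positive,                40 negative
d <- data.frame(
  attitude = c(rep(1, 30), rep(0, 10), rep(1, 20), rep(0, 40)),
  Mask     = c(rep(1, 40), rep(0, 60))
)

fit <- glm(attitude ~ Mask, family = binomial, data = d)

# exp() of the slope is the odds ratio for Mask = 1 versus Mask = 0.
odds_ratio <- unname(exp(coef(fit))["Mask"])

# Cross-product ratio computed directly from the counts: (30 * 40) / (10 * 20).
cross_product <- (30 * 40) / (10 * 20)

print(c(odds_ratio = odds_ratio, cross_product = cross_product))
```

With a single binary predictor the two numbers agree up to the convergence tolerance of glm(); with several predictors, each exponentiated coefficient is an odds ratio adjusted for the other variables in the model.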
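# Logistic regression model evaluation
## Bayesian classification evaluation (fitted here with linear discriminant analysis, lda)
install.packages('MASS')
library(MASS)
bayes_data = read.table('c:/R소셜_2부1장/메르스_예방_모형평가.txt',header=TRUE)
## First half of the rows for training, second half for testing
train_data=bayes_data[1:109234,]
test_data=bayes_data[109235:218468,]
## True labels of the test rows, to compare with the test predictions
group_data=test_data$attitude
train_data.lda=lda(attitude~Healthcare+Outing+Handcleaner+Immunity+Mask,data=train_data)
train_data.lda
ldapred=predict(train_data.lda, test_data)$class
classification=table(ldapred, group_data)   # rows: predicted, columns: actual
classification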
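## Functions that compute classification-model evaluation metrics
## p1..p4 are the four cells of the 2x2 classification table; the counts passed below are those used in the source
## Accuracy: (p1+p4)/total
perm_a=function(p1, p2, p3, p4) {pr_a=(p1+p4)/sum(p1, p2, p3, p4)
     return(pr_a)}
perm_a(85373, 19532, 3589, 740)
## Error rate: (p2+p3)/total
perm_e=function(p1, p2, p3, p4) {pr_e=(p2+p3)/sum(p1, p2, p3, p4)
     return(pr_e)}
perm_e(85373, 19532, 3589, 740)
## Sensitivity: p1/(p1+p2), reading p1 and p2 as the cells of the first actual class
perm_s=function(p1, p2, p3, p4) {pr_s=p1/(p1+p2)
     return(pr_s)}
perm_s(85373, 19532, 3589, 740)
## Specificity: p4/(p3+p4), reading p3 and p4 as the cells of the second actual class
perm_sp=function(p1, p2, p3, p4) {pr_sp=p4/(p3+p4)
     return(pr_sp)}
perm_sp(85373, 19532, 3589, 740)
## Precision: p1/(p1+p3), under the same reading of the cells
perm_p=function(p1, p2, p3, p4) {pr_p=p1/(p1+p3)
     return(pr_p)}
perm_p(85373, 19532, 3589, 740)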
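The functions above take the four cell counts typed in by hand. As a sketch of the same five metrics computed directly from a confusion table, the example below uses two short hypothetical label vectors (not the course data) and names each cell explicitly, which avoids mixing up the order of the arguments. It assumes class 1 is the positive class and that rows of the table are the actual labels.

```r
# Hypothetical actual and predicted labels; class 1 is treated as "positive".
actual    <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0), levels = c(1, 0))
predicted <- factor(c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0), levels = c(1, 0))

# Rows: actual class, columns: predicted class.
cm <- table(actual, predicted)

tp <- as.numeric(cm["1", "1"])   # actual 1, predicted 1
fn <- as.numeric(cm["1", "0"])   # actual 1, predicted 0
fp <- as.numeric(cm["0", "1"])   # actual 0, predicted 1
tn <- as.numeric(cm["0", "0"])   # actual 0, predicted 0

accuracy    <- (tp + tn) / sum(cm)
error_rate  <- (fp + fn) / sum(cm)
sensitivity <- tp / (tp + fn)    # share of actual positives that were found
specificity <- tn / (tn + fp)    # share of actual negatives that were found
precision   <- tp / (tp + fp)    # share of predicted positives that are correct

print(round(c(accuracy = accuracy, error_rate = error_rate,
              sensitivity = sensitivity, specificity = specificity,
              precision = precision), 3))
```

With p1 = tp, p2 = fn, p3 = fp, and p4 = tn, these expressions match the formulas in perm_a, perm_e, perm_s, perm_sp, and perm_p.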
## Data mining
## Decision tree
install.packages('party')
library(party)
install.packages('caret')
library(caret)
setwd("c:/R소셜_2부1장")
tdata=read.table("메르스_로지스틱_예방_긍부정.txt",header=TRUE)
## Random split into training (ind==1) and test (ind==2) rows, about 50% each
ind=sample(2, nrow(tdata), replace=TRUE,prob=c(0.5,0.5))
tr_data=tdata[ind==1,]
te_data=tdata[ind==2,]
## Conditional inference tree fitted on the training rows
i_ctree=ctree(attitude~.,data=tr_data)
print(i_ctree)
plot(i_ctree)
## Model performance evaluation (classification)
## Test rows
ipredict=predict(i_ctree, te_data)
table(ipredict,te_data$attitude)
## Training rows
ipredict=predict(i_ctree, tr_data)
table(ipredict,tr_data$attitude)