Dealing with multidimensional array


quantlab 2015. 11. 10. 05:56

The visual input humans receive is essentially two-dimensional, yet we construct three-dimensional mental images from it with ease.

When we draw a table of numbers, however, we don't draw it in three dimensions; we draw a two-dimensional table.

Yet data are often most naturally stored in arrays of three or more dimensions.

a2.array is a 4-dimensional array in R. The functions below show its characteristics.

dim(a2.array)

dimnames(a2.array)

[1] 3 5 3 3

NULL

a2.array is a 4-dimensional array of size (3, 5, 3, 3), and its dimensions are not named.
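You can try these functions without the post's data. Below is a minimal sketch on a small, hypothetical array x filled with made-up values:

- dim() gives the size of each dimension.
- dimnames() stays NULL until names are assigned.
- The number of cells is prod(dim(x)). This is the same quantity the code further down passes to rnorm(prod(dim(a2.array)), ...).

```r
# A small hypothetical 4-dimensional array standing in for a2.array
x <- array(seq_len(2 * 3 * 2 * 2) / 10, dim = c(2, 3, 2, 2))

dim(x)                     # size of each dimension: 2 3 2 2
dimnames(x)                # NULL: no names yet
length(x) == prod(dim(x))  # TRUE: the number of cells is the product of the sizes
```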

Let's name the dimensions. Both versions below give the same result.

#01

dimnames(a2.array) = list(c(0, 0.3, 0.6), c(100, 200, 300, 500, 1000), c(5, 10, 20), c("norm","unif","skew"))

names(dimnames(a2.array)) = c("a2","n.subj","n.item","thDist")  # must come after dimnames() is set; it fails while dimnames(a2.array) is NULL

#02

dimnames(a2.array) = list(a2=c(0, 0.3, 0.6), n.subj=c(100, 200, 300, 500, 1000),

n.item=c(5, 10, 20), thDist=c("norm","unif","skew"))

dimnames(a2.array)

$a2

[1] "0" "0.3" "0.6"

$n.subj

[1] "100" "200" "300" "500" "1000"

$n.item

[1] "5" "10" "20"

$thDist

[1] "norm" "unif" "skew"

Now we can see what each dimension stands for:

- a2 is the parameter name, and 0, 0.3 and 0.6 are the values of that parameter.
- n.subj is the number of subjects.
- n.item is the number of items.
- thDist is the distribution of theta (the latent variable).
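Here is a minimal sketch of what the names buy us, using a hypothetical 3 x 2 x 2 array x. Each slice can be selected by its label instead of its position.

Note that numeric labels such as 0.3 are stored as the character string "0.3", so they must be quoted when indexing.

```r
# Hypothetical small array with labels like those of a2.array
x <- array(seq_len(3 * 2 * 2), dim = c(3, 2, 2))

# Assign the labels first; names(dimnames(x)) <- ... fails while dimnames(x) is NULL
dimnames(x) <- list(a2 = c(0, 0.3, 0.6),
                    n.subj = c(100, 200),
                    thDist = c("norm", "unif"))

names(dimnames(x))        # "a2" "n.subj" "thDist"
dimnames(x)$a2            # "0" "0.3" "0.6" (numbers are stored as character labels)
x["0.3", "200", "unif"]   # one cell, selected by labels
x["0.6", , ]              # the n.subj x thDist slice for a2 = 0.6
```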

Suppose the data for other parameters a1, a3 and a4 are stored in arrays of the same shape as a2.array. Then we can combine all of them into one array.

# install.packages("abind")  # only needed once

library(abind)

# a1, a3 and a4 are simulated here by adding small noise to a2.array

a1.array <- a2.array+rnorm(prod(dim(a2.array)), 0, 0.01)

a3.array <- a2.array+rnorm(prod(dim(a2.array)), 0, 0.01)

a4.array <- a2.array+rnorm(prod(dim(a2.array)), 0, 0.01)

a.array <- abind(a1.array, a2.array, a3.array, a4.array, along=0)  # along=0 adds a new first dimension

The code above generates a1.array, a3.array and a4.array by adding small noise to a2.array. It then combines a1.array, a2.array, a3.array and a4.array into a.array.
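If installing abind is not an option, the same stacking can be done in base R. The helper stack_first() below is hypothetical: it is not part of abind or base R. It is a sketch that assumes all arrays have the same dim.

It works in two steps:

1. It appends the arrays as a new last dimension.
2. It moves that dimension to the front with aperm().

This gives the same layout as abind(..., along = 0), without dimnames.

```r
# Hypothetical helper (not part of abind or base R): stack equal-shaped arrays
# along a new FIRST dimension, like abind(..., along = 0), using only base R.
stack_first <- function(...) {
  arrs <- list(...)
  d <- dim(arrs[[1]])
  stopifnot(all(vapply(arrs, function(a) identical(dim(a), d), logical(1))))
  # R stores arrays column-major, so concatenating the values and adding a
  # trailing dimension of size length(arrs) stacks them along a new LAST dimension
  stacked <- array(unlist(arrs), dim = c(d, length(arrs)))
  # move that new last dimension to the front
  aperm(stacked, c(length(d) + 1, seq_along(d)))
}

set.seed(1)
a2 <- array(runif(3 * 5 * 3 * 3), dim = c(3, 5, 3, 3))
a1 <- a2 + rnorm(length(a2), 0, 0.01)  # small noise, as in the post
a3 <- a2 + rnorm(length(a2), 0, 0.01)
a.arr <- stack_first(a1, a2, a3)
dim(a.arr)  # 3 3 5 3 3
```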

As the following code shows, they can also be combined into other layouts.

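a.array_ <- abind(a1.array, a2.array, a3.array, a4.array)  # default: along the last dimension, giving 3 x 5 x 3 x 12

a.array_ <- abind(a1.array, a2.array, a3.array, a4.array, along=5)  # a new 5th dimension, giving 3 x 5 x 3 x 3 x 4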

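The value of along= decides how the arrays are combined:

- abind( , along=0) or abind( , along=# of dim+1) creates a new dimension while combining the data, as the first or the last dimension respectively.
- A value from 1 to # of dim concatenates the data along that existing dimension, so the number of dimensions stays the same.
- Without along=, abind binds along the last dimension.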
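Here is a base-R sketch of the two other layouts, using two equal-shaped arrays of hypothetical random data. abind itself is not used.

R stores arrays in column-major order, so the last index varies slowest. Concatenating the values with c() therefore puts all of the first array before all of the second. Only the dim attribute then decides whether we get a new last dimension or a longer last dimension.

```r
# Two hypothetical equal-shaped arrays (random data)
set.seed(2)
d <- c(3, 5, 3, 3)
a1 <- array(rnorm(prod(d)), dim = d)
a2 <- array(rnorm(prod(d)), dim = d)

# Same layout as abind(a1, a2, along = 5): a new 5th dimension of size 2
new_dim <- array(c(a1, a2), dim = c(d, 2))

# Same layout as abind(a1, a2) (default: the last dimension): the 4th dimension grows to 6
longer_last <- array(c(a1, a2), dim = c(d[1:3], d[4] * 2))

dim(new_dim)      # 3 5 3 3 2
dim(longer_last)  # 3 5 3 6
```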

Now let's name the dimensions of the new array a.array.

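dimnames(a.array)=append(list(param=c("a1","a2","a3","a4")), dimnames(a2.array))  # label the new first dimension, keep the old labels

names(dimnames(a.array))[2] = "true"  # the old a2 dimension now holds the true value of every parameter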

dimnames(a.array)

$param

[1] "a1" "a2" "a3" "a4"

$true

[1] "0" "0.3" "0.6"

$n.subj

[1] "100" "200" "300" "500" "1000"

$n.item

[1] "5" "10" "20"

$thDist

[1] "norm" "unif" "skew"
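Here is a minimal sketch of the same two steps on a hypothetical small array:

- append() (like c()) puts the labels of the new first dimension in front of the old dimnames.
- names(dimnames(x))[2] <- renames a single dimension. This works only once dimnames exist.

```r
# Hypothetical small array: a new first dimension in front of two old ones
x <- array(0, dim = c(2, 3, 2))
old <- list(a2 = c("0", "0.3", "0.6"), thDist = c("norm", "unif"))

# append() (like c()) puts the labels of the new dimension before the old dimnames
dimnames(x) <- append(list(param = c("a1", "a2")), old)

# rename one dimension; this works because dimnames(x) is no longer NULL
names(dimnames(x))[2] <- "true"
names(dimnames(x))  # "param" "true" "thDist"
```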

apply reduces the number of dimensions of an array. The dimensions given as MARGIN (the second argument) are kept, and the function is applied to all the data across the other dimensions.

For example, A <- apply(a.array, c(3,4), mean) keeps only the 3rd and 4th dimensions of a.array and summarizes all the other dimensions with mean. So A[i,j] == mean(a.array[,,i,j,]).
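One way to convince ourselves of the identity A[i,j] == mean(a.array[,,i,j,]) is to check it cell by cell. The sketch below does this on a hypothetical random array with the same shape as a.array (4 x 3 x 5 x 3 x 3). It uses all.equal() instead of == to allow for floating-point rounding.

```r
# Hypothetical random array with the same shape as a.array (4 x 3 x 5 x 3 x 3)
set.seed(3)
x <- array(runif(4 * 3 * 5 * 3 * 3), dim = c(4, 3, 5, 3, 3))

# keep dimensions 3 and 4, average over dimensions 1, 2 and 5
A <- apply(x, c(3, 4), mean)
dim(A)  # 5 3

# check A[i, j] == mean(x[, , i, j, ]) for every cell
for (i in seq_len(dim(x)[3])) {
  for (j in seq_len(dim(x)[4])) {
    stopifnot(isTRUE(all.equal(A[i, j], mean(x[, , i, j, ]))))
  }
}
```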

Likewise, if we run B <- apply(a.array, c(2,3,4), mean), then B[i,j,k] == mean(a.array[,i,j,k,]).

> B
, , n.item = 5

     n.subj
true        100       200        300       500       1000
0    0.08273783 0.0533895 0.05028688 0.0259428 0.01569786
0.3  0.38306369 0.4103878 0.37982391 0.3926833 0.42481400
0.6  0.80714926 0.8722222 0.88539986 0.8297243 0.76597885

, , n.item = 10

     n.subj
true        100        200        300        500       1000
0    0.06600484 0.05343658 0.05182823 0.03045274 0.03563958
0.3  0.43104220 0.38177964 0.40131711 0.41703602 0.43510981
0.6  0.86818108 0.91474137 0.92008311 0.82428488 0.80589661

, , n.item = 20

     n.subj
true        100        200        300        500       1000
0    0.05199994 0.04118865 0.03587191 0.03133996 0.02381522
0.3  0.41252357 0.41644084 0.42025730 0.42722335 0.43742146
0.6  0.85920550 0.89604753 0.82556891 0.88870384 0.90301342
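Because the dimensions are named, apply() also accepts MARGIN as a character vector of dimension names. The sketch below uses hypothetical random data shaped and labelled like a.array, and does two things:

- It checks that the numeric and named forms of MARGIN agree.
- It computes the same means with rowMeans(..., dims = 3), after moving the kept dimensions to the front with aperm(). This is a swapped-in alternative to apply(), and often faster on large arrays.

```r
# Hypothetical random array shaped and labelled like a.array
set.seed(4)
dn <- list(param = paste0("a", 1:4), true = c("0", "0.3", "0.6"),
           n.subj = c("100", "200", "300", "500", "1000"),
           n.item = c("5", "10", "20"), thDist = c("norm", "unif", "skew"))
x <- array(runif(4 * 3 * 5 * 3 * 3), dim = c(4, 3, 5, 3, 3), dimnames = dn)

# MARGIN by position or by dimension name selects the same dimensions
B1 <- apply(x, c(2, 3, 4), mean)
B2 <- apply(x, c("true", "n.subj", "n.item"), mean)
identical(B1, B2)  # TRUE

# Alternative without apply(): put the kept dimensions first, then
# rowMeans(..., dims = 3) averages over all remaining dimensions
B3 <- rowMeans(aperm(x, c(2, 3, 4, 1, 5)), dims = 3)
all.equal(B1, B3)  # TRUE (up to floating-point rounding)
```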

Look at B. Its last dimension is n.item, so each printed slice is a true × n.subj table. If the last dimension were true (the true parameter value), we would instead see one n.subj × n.item table for each true value.

Let's try C <- aperm(B, c(2,3,1)). The dimensions of C map to B as follows:

- The first dimension of C is the 2nd dimension of B.
- The second dimension of C is the 3rd dimension of B.
- The last dimension of C is the 1st dimension of B.

> aperm(B, c(2,3,1))
, , true = 0

      n.item
n.subj          5         10         20
100    0.08273783 0.06600484 0.05199994
200    0.05338950 0.05343658 0.04118865
300    0.05028688 0.05182823 0.03587191
500    0.02594280 0.03045274 0.03133996
1000   0.01569786 0.03563958 0.02381522

, , true = 0.3

      n.item
n.subj         5        10        20
100    0.3830637 0.4310422 0.4125236
200    0.4103878 0.3817796 0.4164408
300    0.3798239 0.4013171 0.4202573
500    0.3926833 0.4170360 0.4272233
1000   0.4248140 0.4351098 0.4374215

, , true = 0.6

      n.item
n.subj         5        10        20
100    0.8071493 0.8681811 0.8592055
200    0.8722222 0.9147414 0.8960475
300    0.8853999 0.9200831 0.8255689
500    0.8297243 0.8242849 0.8887038
1000   0.7659789 0.8058966 0.9030134
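To double-check the index rule, here is a sketch on a hypothetical 3 x 5 x 3 array B with the same dimension names. With perm = c(2,3,1), the element C[i,j,k] equals B[k,i,j]. aperm() also accepts the dimension names themselves as perm, which can be easier to read.

```r
# Hypothetical 3 x 5 x 3 array labelled like B
set.seed(5)
B <- array(runif(3 * 5 * 3), dim = c(3, 5, 3),
           dimnames = list(true = c("0", "0.3", "0.6"),
                           n.subj = c("100", "200", "300", "500", "1000"),
                           n.item = c("5", "10", "20")))

C <- aperm(B, c(2, 3, 1))
dim(C)              # 5 3 3
names(dimnames(C))  # "n.subj" "n.item" "true"

# C[i, j, k] is B[k, i, j]
C["200", "10", "0.3"] == B["0.3", "200", "10"]  # TRUE

# perm can also be given by dimension names
identical(C, aperm(B, c("n.subj", "n.item", "true")))  # TRUE
```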

Finally, the following code shows one way to display a multidimensional array as a two-dimensional table.

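D <- apply(a.array, c(1,2,3,4), mean)  # average over thDist; D is 4 x 3 x 5 x 3

a.mat <- array(D, c(4*3, 5*3))  # rows: param x true, columns: n.subj x n.item; the earlier dimension varies fastest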

dimnames(a.mat) =list(paste(rep(dimnames(a.array)[[1]], length(dimnames(a.array)[[2]])),

rep(dimnames(a.array)[[2]], each=length(dimnames(a.array)[[1]])), sep=","),

paste(rep(dimnames(a.array)[[3]], length(dimnames(a.array)[[4]])),

rep(dimnames(a.array)[[4]], each=length(dimnames(a.array)[[3]])), sep=","))

round(a.mat, 2)

       100,5 200,5 300,5 500,5 1000,5 100,10 200,10 300,10 500,10 1000,10 100,20 200,20 300,20 500,20 1000,20
a1,0    0.08  0.06  0.05  0.03   0.02   0.06   0.05   0.05   0.03    0.04   0.06   0.04   0.03   0.03    0.03
a2,0    0.08  0.05  0.05  0.03   0.02   0.07   0.05   0.05   0.03    0.04   0.05   0.04   0.04   0.03    0.03
a3,0    0.08  0.05  0.05  0.03   0.01   0.07   0.06   0.06   0.03    0.03   0.05   0.04   0.04   0.03    0.02
a4,0    0.09  0.05  0.05  0.02   0.02   0.07   0.06   0.04   0.03    0.04   0.05   0.04   0.04   0.04    0.02
a1,0.3  0.38  0.41  0.39  0.40   0.42   0.43   0.39   0.39   0.42    0.44   0.41   0.41   0.42   0.44    0.44
a2,0.3  0.38  0.41  0.38  0.39   0.42   0.43   0.38   0.40   0.42    0.43   0.41   0.42   0.42   0.43    0.44
a3,0.3  0.38  0.41  0.38  0.39   0.43   0.43   0.39   0.41   0.41    0.44   0.41   0.42   0.42   0.42    0.44
a4,0.3  0.39  0.41  0.38  0.39   0.42   0.43   0.37   0.41   0.42    0.43   0.41   0.42   0.42   0.42    0.44
a1,0.6  0.81  0.87  0.89  0.84   0.76   0.87   0.92   0.92   0.82    0.80   0.86   0.89   0.83   0.89    0.90
a2,0.6  0.81  0.88  0.88  0.83   0.76   0.87   0.91   0.92   0.83    0.81   0.86   0.89   0.82   0.89    0.90
a3,0.6  0.81  0.88  0.88  0.82   0.78   0.86   0.92   0.92   0.83    0.81   0.86   0.90   0.83   0.89    0.90
a4,0.6  0.81  0.87  0.89  0.83   0.76   0.87   0.91   0.92   0.83    0.80   0.86   0.90   0.83   0.89    0.91

Clearly, the parameters are overestimated when the true value is 0.6: the estimates range from about 0.76 to 0.92.

Nice try! But I recently found a simpler and more elegant way to do the same thing.

round(ftable(D, row.vars=c(1,2)), 2)

This prints the same values as the table above, already labeled with the dimension names. That's it!
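ftable() lists the last row (and column) variable fastest. With row.vars = c(1,2), the true value therefore varies fastest down the rows. In a.mat it is param that varies fastest.

The sketch below assumes a hypothetical D with the same shape and dimension names as in the post. Listing the dimensions in reverse order reproduces the a.mat layout. The final line compares the underlying numbers with array(D, c(4*3, 5*3)).

```r
# Hypothetical array shaped and labelled like D (4 x 3 x 5 x 3)
set.seed(7)
D <- array(runif(4 * 3 * 5 * 3), dim = c(4, 3, 5, 3),
           dimnames = list(param = paste0("a", 1:4), true = c("0", "0.3", "0.6"),
                           n.subj = c("100", "200", "300", "500", "1000"),
                           n.item = c("5", "10", "20")))

# ftable() varies the LAST listed row (and column) variable fastest, so listing
# the dimensions in reverse order makes param vary fastest down the rows and
# n.subj fastest across the columns, as in a.mat
ft <- ftable(D, row.vars = c(2, 1), col.vars = c(4, 3))
round(ft, 2)

# the cells line up with the array(D, c(4*3, 5*3)) reshaping used for a.mat
a.mat <- array(D, c(4 * 3, 5 * 3))
all.equal(as.vector(unclass(ft)), as.vector(a.mat))  # TRUE
```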

The data for a2.array used in this post:

a2.array <-

structure(c(0.114753736322614, 0.404087193560994, 0.812096065745919,

0.0691984103863665, 0.405310893019174, 0.887125188459893, 0.0616316477144657,

0.400274830585187, 0.86573537527353, 0.0603680379008627, 0.386513673755535,

0.741502946723747, 0.0295979729035622, 0.457382575094417, 0.740471025766707,

0.0659152486151725, 0.400383191455386, 0.831028639699018, 0.0560260653624721,

0.415518543509192, 0.928912730023655, 0.0595133598446601, 0.359289298476868,

0.851226432860258, 0.0405820157212527, 0.353693426571657, 0.833999280575229,

0.039118793437426, 0.440663023182114, 0.843089046305312, 0.0563157171667022,

0.421887852396819, 0.869925651995617, 0.0403740015356417, 0.444552426604557,

0.870136770858467, 0.0380504927694767, 0.358206058016891, 0.808101367403867,

0.0368475236617062, 0.437789515635082, 0.945457878490628, 0.0269195096537808,

0.422488059002855, 0.92916922032534, 0.078110818712903, 0.347195535685585,

0.740977813433034, 0.0426813776722355, 0.41078278931815, 0.893510335698474,

0.0390875939397656, 0.371735282156537, 0.893725886387991, 0.0167767696533033,

0.378973561083092, 0.867967833505367, 0.0146914941377656, 0.40936037912822,

0.832047787089179, 0.0747963902872325, 0.415833644622462, 0.934406485422699,

0.0582793273811563, 0.373951708112157, 0.921612738627239, 0.0455339433829314,

0.43137832583476, 0.957449748028585, 0.0322428286600292, 0.427185650508067,

0.80122740841786, 0.0333727433694025, 0.400755311879955, 0.751540551134801,

0.0452442261509687, 0.394959390317536, 0.796730004455713, 0.0393108127618852,

0.385220612117265, 0.881125882039564, 0.0336101175243408, 0.479848872042021,

0.847085013443161, 0.0320240534598604, 0.422622668582744, 0.785941537265972,

0.0261579815735083, 0.462555877705602, 0.927203160046384, 0.0590267735862295,

0.396325068598997, 0.867464546825978, 0.0406494772414111, 0.416998177454051,

0.846799279640695, 0.0525267550872886, 0.356201656368973, 0.886815234420338,

0.000824621125123532, 0.417518909751403, 0.870373379648068, 0.00165529453572468,

0.40418725858196, 0.709589952014542, 0.0634819659430928, 0.46089311125249,

0.845849738428759, 0.0462025973295874, 0.360605934504689, 0.885726887928779,

0.0396524904640301, 0.411965993742202, 0.937337889984183, 0.0153772559320576,

0.466679740293062, 0.841256465056881, 0.0365513337649941, 0.460781965792933,

0.823386907838593, 0.0561571010647808, 0.41948179936679, 0.900843382614314,

0.0380357726357701, 0.433724912819174, 0.925045479963012, 0.0405945809191325,

0.413102142332862, 0.810692037706058, 0.0249186677011433, 0.424571690059523,

0.929073237156253, 0.0231447618263831, 0.430904467370669, 0.84410545549712

), .Dim = c(3L, 5L, 3L, 3L))
