
While reading Aboy et al. (2006), I needed a function for computing the Lempel-Ziv complexity of a sequence.

I searched the internet but couldn't find one, so I wrote one myself.

Below are the source code and an example; the example plots the normalized complexity of seven test sequences.


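# FUNCTION lempel.ziv(____.VEC, ____.VEC)
# s        : a sequence vector (NAs are dropped)
# alphabet : a vector of the alphabet's symbols
# Parses s into phrases, counting each phrase that is not yet in the
# vocabulary of earlier phrases, and returns the running count normalized
# by n/log_k(n) (k = number of alphabet symbols), one value per position.
# ref) Aboy et al. (2006), Interpretation of the Lempel-Ziv Complexity Measure
#      in the Context of Biomedical Signal Analysis,
#      IEEE Trans Biomed Eng. 2006 Nov;53(11):2282-8.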

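lempel.ziv=function(s, alphabet) {
  # drop missing values, then check that every symbol belongs to the alphabet
  s=s[!is.na(s)]
  n=length(s)
  if (!all(s %in% alphabet)) { stop("Alphabet error!") }

  # the first symbol is always a new phrase
  voc=s[1]; cmpl=1
  r=1; i=1   # r: last position already parsed; i: offset into the current phrase
  while (r+i<=n) {
    Q=""
    repeat {
      # extend the candidate phrase Q by one symbol
      # (symbols are pasted without a separator, so use single-character symbols)
      Q=paste(Q, s[r+i], sep="")
      if (Q %in% voc) {
        # Q is already in the vocabulary: the count stays the same here
        cmpl[r+i]=cmpl[r+i-1]; i=i+1 }
      if (!(Q %in% voc) || !(r+i<=n)) { break }
    } # repeat
    if (r+i > n) break   # the sequence ended inside a known phrase

    # Q is a new phrase: add it to the vocabulary and increment the count
    voc=c(voc, Q); cmpl[r+i]=cmpl[r+i-1]+1
    r=r+i; i=1
  }

  # normalize by n/log_k(n), k = alphabet size (the first value is 0 since log_k(1) = 0)
  cmpl=cmpl/(1:n/log(1:n, length(alphabet)))
  return(cmpl)
}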
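The normalization step at the end of lempel.ziv can be written as follows, where c(n) is the phrase count after the first n symbols and k = length(alphabet). This is the normalization used by Aboy et al. (2006): b(n) is the asymptotic value of c(n) for a random sequence, so values of C(n) near 1 indicate a near-random sequence.

```latex
C(n) = \frac{c(n)}{b(n)}, \qquad b(n) = \frac{n}{\log_k n}
```

For n = 1 we get log_k(1) = 0, so b(1) is infinite in R and the first element of the returned vector is 0.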



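# FUNCTION lempel.ziv2(____.CHR, ____.CHR)
# Wrapper for lempel.ziv
# str          : a vector of strings, one sequence per string (one symbol per character)
# str.alphabet : a vector of alphabet strings, either of length 1 (shared by all
#                strings) or of the same length as str (one alphabet per string)
# returns the final normalized complexity of each string
lempel.ziv2=function(str, str.alphabet) {
  s2=strsplit(str, "")
  alphabet=strsplit(str.alphabet, "")

  # a single alphabet is reused for every string; otherwise step through them
  if (length(alphabet)==1) {
    inc.alphabet=0
  } else if (length(alphabet)!=length(s2)) {
    stop("Number of Strings and alphabets aren't the same.")
  } else {
    inc.alphabet=1
  }
  index.alphabet=1

  lzs=c()
  for (s in s2) {
    # keep only the last value, i.e. the complexity of the whole string
    lzs=c(lzs, lempel.ziv(s, alphabet[[index.alphabet]])[length(s)])
    index.alphabet=index.alphabet+inc.alphabet
  }
  lzs
}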
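lempel.ziv above counts a phrase whenever it is missing from the vocabulary of previously parsed phrases, which is essentially an LZ78-style dictionary parsing. As I understand it, the measure discussed by Aboy et al. (2006) is the original Lempel-Ziv (1976) complexity, where a phrase ends only when it can no longer be copied from anywhere earlier in the sequence, so the two counts can differ. The sketch below is a swapped-in alternative, not the code above: it follows the widely used Kaspar-Schuster (1987) scan for the LZ76 count. The names lz76.count and lz76.norm are hypothetical, and it assumes a vector of symbols without NAs.

```r
# Sketch of the LZ76 phrase count using the Kaspar-Schuster (1987) scan.
# Assumptions: s is a vector of symbols (numeric or character) without NAs;
# lz76.count and lz76.norm are hypothetical names, not part of any package.
lz76.count <- function(s) {
  n <- length(s)
  if (n < 2) return(n)        # zero or one symbol: zero or one phrase
  i <- 0                      # candidate start (offset) inside the parsed prefix
  k <- 1                      # current match length + 1
  l <- 1                      # number of symbols already parsed
  k.max <- 1                  # longest candidate phrase for the current position
  cmpl <- 1                   # the first symbol is always a phrase
  repeat {
    if (s[i + k] == s[l + k]) {
      # the next symbol can still be copied from position i + k: extend
      k <- k + 1
      if (l + k > n) {        # reached the end while copying: count the last phrase
        cmpl <- cmpl + 1
        break
      }
    } else {
      k.max <- max(k.max, k)  # remember the longest candidate phrase seen so far
      i <- i + 1              # try the next start position
      if (i == l) {           # every start position tried: a new phrase ends here
        cmpl <- cmpl + 1
        l <- l + k.max
        if (l + 1 > n) break  # nothing left to parse
        i <- 0; k <- 1; k.max <- 1
      } else {
        k <- 1
      }
    }
  }
  cmpl
}

# Normalized by b(n) = n / log_k(n), k = alphabet size
lz76.norm <- function(s, alphabet.size) {
  n <- length(s)
  lz76.count(s) / (n / log(n, alphabet.size))
}

lz76.count(c(0,0,0,1,1,0,1,0,0,1,0,0,0,1,0,1))  # 6
```

For the textbook sequence 0001101001000101 the scan finds six phrases, 0|001|10|100|1000|101. A constant sequence gives 2 and 0101... gives 3, regardless of length.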

  

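# examples for lempel.ziv and lempel.ziv2
# seven test sequences of length 1000 over the alphabet {0,1,2,3}
par(mfcol=c(1,1))
a=list(); lz=list()
a[[1]]<-floor(runif(1000,min=0,max=4))   # uniform over 0..3
a[[2]]<-rep(0,1000)                       # constant
a[[3]]<-floor(rnorm(1000,2,0.5))          # mostly 1s and 2s
a[[3]][a[[3]]<0 | a[[3]]>3]=1            # map out-of-range values to 1
a[[4]]<-rep(c(0,1,2,3),250)               # period 4
a[[5]]<-rep(c(0,1),500)                   # period 2
a[[6]]<-floor(runif(1000,min=0,max=2))    # uniform over 0..1
temp=a[[1]]; temp[rep(c(T,F),500)]=1; a[[7]]=temp   # a[[1]] with every odd position set to 1
# Caution: indexing with a logical vector longer than the vector does not
# raise an error; TRUE positions beyond the end return NA, e.g.
# lg=c(T,F,T,T,F,F,T,T,T); v=1:3; v[lg]   # 1 3 NA NA NA NA
leg = c("unif","rep0","norm/2/0.5","rep0123","rep01",
        "unif(0,1)","unif+T")

for (i in 1:length(a)) {
  lz[[i]]<-lempel.ziv(a[[i]],c(0,1,2,3))
}

# plot all curves on a common y-range so that none is clipped
plot(lz[[1]], type="l", col=1, ylim=range(unlist(lz)),
     xlab="n", ylab="normalized LZ complexity")
for (i in 2:length(lz)) {
  lines(lz[[i]], col=i)
}
legend(x="topright",legend=leg, col=1:length(leg), lty=rep("solid",length(leg)))

# lempel.ziv2: final normalized complexity of each string, one shared alphabet
lempel.ziv2(c("0000000000", "0123012301"), "0123")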
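The caution comment in the example concerns R's logical indexing. In the example itself the index rep(c(T,F),500) has the same length (1000) as a[[1]], so the problem does not arise there. The snippet below shows both mismatch cases and a small guard; the name select.same.length is hypothetical.

```r
# Logical indexing in R when the index length differs from the vector length.
v  <- 1:3
lg <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Longer index: TRUE positions past length(v) silently yield NA
v[lg]               # 1 3 NA NA NA NA

# Shorter index: recycled to length(v), i.e. TRUE FALSE TRUE
v[c(TRUE, FALSE)]   # 1 3

# Hypothetical guard that refuses mismatched lengths instead
select.same.length <- function(x, keep) {
  stopifnot(is.logical(keep), length(keep) == length(x))
  x[keep]
}
select.same.length(v, c(TRUE, FALSE, TRUE))   # 1 3
```

Failing loudly on a length mismatch is usually preferable to NAs that only surface later in a computation.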






Reference

Aboy et al. (2006), Interpretation of the Lempel-Ziv Complexity Measure in the Context of Biomedical Signal Analysis, IEEE Trans Biomed Eng. 2006 Nov;53(11):2282-8.
