Sunday, July 5, 2015

Implications

This blog is a continuation of the FCA blog. I actually mused over this in Summer 2004, but wanted to recap for myself and share some source code.

The article I read back then was Endliche Hüllensysteme und ihre Implikationenbasen (Finite hull systems and their implication bases). I summarize this article here, since I didn't find an English version online.

In short, a context H = (2^A, 2^A × 2^A, ⊢) is constructed, where the rows are subsets of the attribute set A, i.e. elements of the power set, the columns are implications between such subsets, and the incidence is the respects relation (⊢). This context is simplified by applying the Armstrong rules to form a non-redundant, complete, reduced context named B. Then an algorithm is presented that takes the intents of some context, recursively constructs the respected implications using B, and afterwards uses the 3rd Armstrong rule to find an easy set of implications, i.e. not necessarily minimal in count, but easier than the Duquenne-Guigues basis.

Definitions

  • U → V: an implication is a pair (U, V) with U ⊆ M and V ⊆ M. U is the premise and V the conclusion.
  • W ⊢ U → V: a set W respects (U, V) if U ⊄ W or V ⊆ W (see the small sketch below). One also says: (U, V) holds for W.
  • If V ⊆ U, the implication is called trivial.
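
As a minimal illustration (my own sketch, not code from [1]), the respects relation translates directly into Python, with plain sets standing for U, V and W:

def respects(W, U, V):
    """W ⊢ U → V: W respects the implication U → V iff U ⊄ W or V ⊆ W."""
    return not U <= W or V <= W

# {1, 2} respects {1} → {2}, but {1, 3} does not
assert respects({1, 2}, {1}, {2})
assert not respects({1, 3}, {1}, {2})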

An intelligent system repeatedly gets sets of elements (attributes) as input (objects) and associates these elements. Implications are the result of many such sets.

  • A concept lattice C respects (U, V) iff U' ⊆ V'.
  • (U, V) holds in C iff V ⊆ U''.

Implications can be composed to give new implications. A minimal set of implications that produces all other implications is called an implication basis.

An implication basis of minimal size is given by the Duquenne-Guigues basis (stem base), which consists of the implications P → P'' for all pseudo-closed sets P:

Let X ↦ X'' be a closure operation; then P is pseudo-closed if

  • P ≠ P''
  • Q'' ⊆ P for every pseudo-closed Q ⊂ P

There cannot be an implication basis with fewer implications than the Duquenne-Guigues basis, but one can choose implications that are easier (with low |U||V|, writing U → V with V ≠ ∅ and U ∩ V = ∅).
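
To make "implications produce other implications" concrete, here is a small sketch of my own (not from [1]): the closure of an attribute set under a list of implications. An implication U → V follows from a list of implications exactly when V is contained in the closure of U under that list.

def close(X, imps):
    """Smallest superset of X that respects every (U, V) in imps (naive fixpoint)."""
    X = set(X)
    changed = True
    while changed:
        changed = False
        for U, V in imps:
            if U <= X and not V <= X:
                X |= V
                changed = True
    return X

# with {a} → {b} and {b} → {c}, the closure of {a} is {a, b, c},
# so {a} → {c} follows from the two implications
imps = [({'a'}, {'b'}), ({'b'}, {'c'})]
assert close({'a'}, imps) == {'a', 'b', 'c'}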

Construction of H

H = (2^A, 2^A × 2^A, ⊢) is a context, and the corresponding concepts C(H) = {(O, A)} pair sets of subsets (= objects of H) with the sets of implications (= attributes of H) they respect.

The basic implications (⇒) respected by this context C(H) are given by the Armstrong axioms:

  • (A1): ∅ ⇒ {U → U}
  • (A2): {U → V} ⇒ {UW → V}
  • (A3): {U → V, VW → Z} ⇒ {UW → Z}

Some derived ones are ( ≡  is  ⇒  in both directions):

  • (A4): {U → V} ≡ {U → UV}
  • (A5): {U → V, U → W} ≡ {U → VW}
  • (A6): {U → V} ≡ {W → V | U ⊆ W}
  • (A7): {U → V, UVW → Z} ≡ {U → V, UW → Z}
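
The soundness of such rules with respect to the respects relation can be checked by brute force. A small sketch of mine, verifying A3 over all subsets of a 3-element attribute set:

from itertools import combinations

def subsets(m):
    return [set(c) for r in range(m + 1) for c in combinations(range(m), r)]

def respects(X, U, V):
    return not U <= X or V <= X

# A3: every X respecting U → V and VW → Z also respects UW → Z
M = subsets(3)
for U in M:
    for V in M:
        for W in M:
            for Z in M:
                for X in M:
                    if respects(X, U, V) and respects(X, V | W, Z):
                        assert respects(X, U | W, Z)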

Note

Notation: UV means U ∪ V and Ui means U ∪ {i}.

H can be constructed inductively. The order of the attributes is of no importance when constructing the power set.

Note

In [1], {⋆} names the new attribute of step n and {⋆⋆} that of step n + 1. Here I name them i and j, and ⋆ becomes a wild card meaning whatever element was added in a previous step.

In the following, i means a new attribute. Because of A4: {X → Y} ≡ {X → XY}, only implications with U ⊆ V are included; therefore Ui → V does not occur.

  • H_0 = (1), because ∅ ⊢ ∅ → ∅.
  • L_0 = (0), because ∅ ⊬ ∅ → {i}.

H_{n+1} =

    ⊢     U → V    U → Vi    Ui → Vi
    X     H_n      L_n       1
    Xi    H_n      H_n       H_n

  • The row (2, ⋆) holds H_n in every column because, with Xi = X ∪ {i}, containment in Xi reduces to containment in X, just as at location (1, 1).
  • The L_n in (1, 2) is discussed below.
  • The 1 in (1, 3) is there because Ui ⊄ X makes the implication hold.

L_{n+1} is part of H_{n+2}, which adds another element j.

L_{n+1} =

    ⊢     U → Vj   U → Vij   Ui → Vij
    X     L_n      L_n       1
    Xi    L_n      L_n       L_n

  • The columns of L_{n+1} are those of H_{n+1}, augmented by j on the conclusion side.
  • The rows are the same as in H_{n+1}.
  • Like i, j is not in X, so L_n can be distributed to all cells except (1, 3), where Ui ⊄ X makes the implication hold (a brute-force cross-check follows below).
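
The recursion can be cross-checked against a brute-force construction of H_n (a sketch of mine, not the algorithm of [1]): rows are the subsets of {0, …, n−1}, columns are the pairs (U, V) with U ⊆ V, and the incidence is the respects relation.

def respects(W, U, V):
    # W ⊢ U → V  iff  U ⊄ W or V ⊆ W
    return not U <= W or V <= W

def brute_H(n):
    """All of H_n as a 2^n x 3^n 0/1 matrix (column order not necessarily that of [1])."""
    subsets = [frozenset(i for i in range(n) if bits >> i & 1) for bits in range(2 ** n)]
    columns = [(U, V) for U in subsets for V in subsets if U <= V]
    assert len(columns) == 3 ** n
    return [[int(respects(W, U, V)) for U, V in columns] for W in subsets]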

Construction of B

Because of A5 ({U → V, U → W} ≡ {U → VW}) we can reduce H to implications of the shape U → U ∪ {i}, abbreviated U → i (thus Ui → j means U ∪ {i} → U ∪ {i, j}).

In the following, ⋆ names the extra element on the right side from a previous recursion step, whichever that was.

In analogy to H, B is horizontally subdivided into three cases:

  • no new element i, i.e. some other element ⋆ from a previous step is on the right side
  • i on the right side
  • i on both sides: the extra element on the right is any element of a previous step
B_1 = | 0 |        A_1 = | 0  1 |
      | 1 |              | 0  0 |
B_{n+1} =

    ⊢     U → ⋆    U → i    Ui → ⋆
    X     B_n      A_n      1
    Xi    B_n      1        B_n

  • (1, 3): Ui ⊄ X, so it holds.
  • (2, 2): Xi ⊢ U → i always: if U ⊄ X, then U ⊄ Xi and the implication holds trivially; if U ⊆ X, then Ui ⊆ Xi and it holds as well.
  • (1, 1) ⇒ (2, 1), (2, 3): X ⊢ U → ⋆ ≡ Xi ⊢ U → ⋆ ≡ Xi ⊢ Ui → ⋆.
  • (1, 2): see below.

A_{n+1} is part of B_{n+2}, which adds another element j on the right side. The new element of step n (i) can be on the left side or not, hence the horizontal partitioning.

A_{n+1} =

    ⊢     U → j    Ui → j
    X     A_n      1
    Xi    A_n      A_n

The implications (columns) in B are reduced. The rows are reduced apart from the last line, which is always full.
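
A quick cross-check of the size of B (again a sketch of mine): its columns are the pairs (U, i) with i ∉ U, so B_n has n·2^(n−1) columns, matching the Bwidth used in the code further below.

def brute_B_columns(n):
    """Columns of B_n as pairs (U, i) with i not in U (ordering not that of [1])."""
    subsets = [frozenset(j for j in range(n) if bits >> j & 1) for bits in range(2 ** n)]
    cols = [(U, i) for U in subsets for i in range(n) if i not in U]
    assert len(cols) == n * 2 ** (n - 1)
    return cols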

Application to Find Implications for a Context

H, L, B, A, evaluated at the rows X = g', are used to find the implication basis and the intents.

We recursively construct the composed context K^X, defined by X(K(g)) for each object g of K, where X stands for one of the contexts above.

Every attribute of K(g) = g' gets a position. This position says whether the attribute is there or not there (the case after the "if" in the following formulas). So every g is coded by a binary number. This is λ: 2^M ↦ {0, 1}^n in [1].

The first two lines for (g[n])^{H_n} show the corresponding implication coding. This is κ: (U, V) ↦ {0, 1, 2}^n, with 2 = i ∈ U, 1 = i ∈ V and U ∩ V = ∅, in [1]. So every binary attribute coding becomes a ternary implication coding.

For n = 1

(g[1])^{H_1} = 1, 0, 1   if g_1 = 0
               1, 1, 1   if g_1 = 1

(g[1])^{L_1} = 0, 0, 1   if g_1 = 0
               0, 0, 0   if g_1 = 1

For n > 1

The three blocks correspond to the column groups 0: U → V, 1: U → Vi, 2: Ui → Vi; 1^{3^{n-1}} denotes a block of 3^{n-1} ones.

(g[n])^{H_n} = ( (g[n-1])^{H_{n-1}},  (g[n-1])^{L_{n-1}},  1^{3^{n-1}} )              if g_n = 0
               ( (g[n-1])^{H_{n-1}},  (g[n-1])^{H_{n-1}},  (g[n-1])^{H_{n-1}} )        if g_n = 1

(g[n])^{L_n} = ( (g[n-1])^{L_{n-1}},  (g[n-1])^{L_{n-1}},  1^{3^{n-1}} )              if g_n = 0
               ( (g[n-1])^{L_{n-1}},  (g[n-1])^{L_{n-1}},  (g[n-1])^{L_{n-1}} )        if g_n = 1

The recursive construction with the coding 0 < 1 (= x ∈ V) < 2 (= x ∈ U) shows that the column position, interpreted as a ternary number, codes the implication. If the incidence at the column position is 1, then g respects this implication: (g')^{H_n} is the indicator function of the respected implications.

The coding also places a larger V before a smaller one when going from right to left. This way one can filter out implications whose left side has already been encountered.

The resulting U ∪ V is closed. If there is no implication for a given left side, that left side is a closed intent, too.
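
The two recursions can also be written down directly. The sketch below is mine; it returns rows as 0/1 strings, appending the block for the newest attribute on the right, so the string order is illustrative and not necessarily the bit order used in pyfca.

def H_L_rows(g):
    """(g)^{H_n} and (g)^{L_n} as 0/1 strings for g = (g_1, ..., g_n)."""
    gn, rest = g[-1], g[:-1]
    if not rest:
        return ('101', '001') if gn == 0 else ('111', '000')
    H, L = H_L_rows(rest)
    ones = '1' * len(H)
    if gn == 0:
        return H + L + ones, L + L + ones
    return H + H + H, L + L + L

# g = {attribute 1} in a 2-attribute context, i.e. g = (1, 0):
assert H_L_rows((1, 0)) == ('111000111', '000000111')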

Note

Python Implementation

Python's arbitrary-precision integers are used for coding. An attribute's position is the bit position in the integer. Every object is thus an integer. A context is a list of integers.

Here, higher bits are more to the left; in [1] they are more to the right; the horizontal order is inverted.
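
UV_H below uses two helpers that are not shown in this post, Hwidth and istr. A minimal sketch of what they presumably do (check pyfca for the actual definitions): Hwidth(n) is the number of columns of H_n, and istr(p, base, width) writes the column index p as a base-`base` string of the given width (most significant digit first here; pyfca may use the opposite orientation, see the remark above).

Hwidth = lambda n: 3**n  # number of columns of H_n (assumed)

def istr(n, base, width):
    """n as a base-`base` string of length `width`, most significant digit first (assumed helper)."""
    digits = []
    for _ in range(width):
        n, d = divmod(n, base)
        digits.append(str(d))
    return ''.join(reversed(digits))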

def UV_H(Hg,gw):
    """
    Return implications UV respected by g.
    gw = g width
    Hg = H(g), g is the binary coding of the attribute set
    UV = all non-trivial implications U->V with U∪V closed; in ternary coding (1=V, 2=U)
    K = all closed sets
    """
    lefts = set()
    K = set()
    UV = set()
    p = Hwidth(gw)
    pp = 2**p
    while p:
        pp = pp>>1
        p = p-1
        if Hg&pp:
            y = istr(p,3,gw)
            yy = y.replace('1','0')
            is02 = y.find('1') == -1 #y∈{0,2}^n, i.e. a trivial implication U->U
            if is02:
                if y not in lefts:
                    #no larger respected implication with this left side was seen: U is closed
                    K.add(y)
            elif yy not in lefts:
                lefts.add(yy)
                UV.add(y)
    return (UV,K)

Now we do the analogous procedure with the B context. In B the implications are of the shape U → U∪{x}, so we need to specify U and the single bit position of x. This implication coding can be derived from the column position recursively (see below).

For n = 1 :

(g[1])^{B_1} = 0      if g_1 = 0
               1      if g_1 = 1

(g[1])^{A_1} = 0, 1   if g_1 = 0
               0, 0   if g_1 = 1

For n > 1 :

The blocks of (g[n])^{B_n} correspond to the column groups U → ⋆, U → i, Ui → ⋆, and those of (g[n])^{A_n} to U → j, Ui → j; 1^k denotes a block of k ones.

(g[n])^{B_n} = ( (g[n-1])^{B_{n-1}},  (g[n-1])^{A_{n-1}},  1^{(n-1)·2^{n-2}} )       if g_n = 0
               ( (g[n-1])^{B_{n-1}},  1^{2^{n-1}},  (g[n-1])^{B_{n-1}} )             if g_n = 1

(g[n])^{A_n} = ( (g[n-1])^{A_{n-1}},  1^{2^{n-1}} )                                  if g_n = 0
               ( (g[n-1])^{A_{n-1}},  (g[n-1])^{A_{n-1}} )                           if g_n = 1
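
As with H and L, this recursion can be written down directly (again my own sketch with the same illustrative string conventions):

def B_A_rows(g):
    """(g)^{B_n} and (g)^{A_n} as 0/1 strings for g = (g_1, ..., g_n)."""
    gn, rest = g[-1], g[:-1]
    if not rest:
        return ('0', '01') if gn == 0 else ('1', '00')
    B, A = B_A_rows(rest)
    if gn == 0:
        return B + A + '1' * len(B), A + '1' * len(A)
    return B + '1' * len(A) + B, A + A

# lengths match Bwidth(n) = n*2**(n-1) and Awidth(n) = 2**n, e.g. for g = (1, 0):
assert B_A_rows((1, 0)) == ('1001', '0011')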

The following Python returns the implication coding for a column t .

Awidth = lambda n: 2**n        # number of columns of A_n
Bwidth = lambda n: n*2**(n-1)  # number of columns of B_n
def A012(t,i):
    """
    Premise part of the coding (0=not in U, 2=in U) for column t of A.
    i = |M|-1 to 0
    """
    if i<0:
        return ""
    nA = Awidth(i)
    if t < nA:
        return "0"+A012(t,i-1)
    else:
        return "2"+A012(t-nA,i-1)
def B012(t,i):
    """
    Constructs ternary implication coding (0=not there, 2=U, 1=V)
    t is B column position
    i = |M|-1 to 0
    """
    if not i:
        return "1"
    nA = Awidth(i)
    nB = Bwidth(i)
    nBB = nB + nA
    if t < nB:
        return "0"+B012(t,i-1)
    elif t < nBB:
        return "1"+A012(t-nB,i-1)
    else:
        return "2"+B012(t-nBB,i-1)
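
For example, with two attributes (|M| = 2, so the recursion starts at i = 1; the leftmost digit belongs to attribute 1, the rightmost to attribute 0), the four columns of B decode as follows:

for t in range(Bwidth(2)):   # Bwidth(2) == 4
    print(t, B012(t, 1))
# 0 01   i.e. ∅ → {0}
# 1 10   i.e. ∅ → {1}
# 2 12   i.e. {0} → {0, 1}
# 3 21   i.e. {1} → {0, 1}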

The positions of the 1's in the B row for a given g yield the implications. Here is the Python source code:

def UV_B(Bg,gw):
    """
    returns the implications UV based on B
    Bg = B(g), g∈2^M
    gw = |M|, M is the set of all attributes
    """
    UV = []
    p = Bwidth(gw)
    pp = 2**p
    while p:
        pp = pp>>1
        p = p-1
        if Bg&pp:
            uv = B012(p,gw-1)
            UV.append(uv)
    return UV

To get the implications that hold in a context K , i.e. for intent 1 and intent 2 and ..., we &-combine (multiply) the B rows for intent 1 and intent 2 and ...
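
A minimal sketch of that combination (B_of_context is just an illustrative name; the rows are integers as in UV_B above):

from functools import reduce

def B_of_context(B_rows):
    # bit-wise AND of the B rows of all intents: a column stays 1
    # only if every intent respects the corresponding implication
    return reduce(lambda x, y: x & y, B_rows)

# the implications of the context are then, e.g., UV_B(B_of_context(rows), gw)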

Note

See pyfca for the full and possibly updated source code.

Constructing a Basis

The implications that result via B can be pruned by

  • first finding the minimal antichain
  • then applying the 3rd Armstrong rule

The Minimal Antichain

The 1-position (j) in the ternary coding is called the significant component. L_j is the set of all implications with the same significant component j.

The implications U → j constructed in this way via K^B are ordered lexicographically based on 0 ≤ 1 ≤ 2.

They are also ordered by the product order, which is a partial order and depicts the containment hierarchy of the U side. We want to find the smallest U of every chain in L_j. This min L_j is an antichain, and so is L = ∪ min L_j.

UV_B from above, changed to return this minimal antichain:

from collections import defaultdict

def v_Us_B(Bg,gw):
    """
    returns the implications {v:Us} based on B
    v is the significant component
    Bg = B(g), g∈2^M
    gw = |M|, M is the set of all attributes
    """
    v_Us = defaultdict(list)
    p = Bwidth(gw)
    pp = 2**p
    while p:
        pp = pp>>1
        p = p-1
        if Bg&pp:
            uv = B012(p,gw-1)
            #find minima regarding the product order: {v:[Umin1,Umin2,...]}
            v = uv.find('1')#v = significant component
            u = uv[:v]+'0'+uv[v+1:]
            u = int(u.replace('2','1'),2)
            Umin_s = v_Us[gw-v-1]#key is the bit position from the right
            #supersets of u seen so far are not minimal: drop them, then keep u
            it = [i for i,U in enumerate(Umin_s) if U&u==u]
            for i in reversed(it):
                del Umin_s[i]
            Umin_s.append(u)
    return v_Us

The Basis

In the construction of the minimal antichain the 2nd Armstrong rule has been used, but there is still redundancy due to the 3rd Armstrong rule {U → V, VW → Z}  ⇒ {UW → Z} .

A component-wise operation ○ is created to represent the 3rd Armstrong rule for the 0,1,2-coding:

  • U, W ⇒ UW: 0○0 = 0 and 2○0 = 0○2 = 2○2 = 2. This is a binary or operation.
  • V, VW ⇒ UW with no V in UW: 1○2 = 2○1 = 0.
  • Z ⇒ Z, V ∩ Z = ∅: 1○0 = 0○1 = 1.
  • L_i ○ L_j = {x○y | x ∈ L_i, y ∈ L_j, i ≠ j}

Note that, with X → x and Y → y, it is required that x ∈ Y xor y ∈ X to make this the 3rd Armstrong rule.

This is implemented in Python as

def circ(u1, v1, u2, v2):
    #wrapped in a standalone function for illustration (name is not from pyfca)
    #u1,u2 code the attribute sets via the bits of an integer
    #v1,v2 are bit indices (the significant components): u1->v1, u2->v2
    vv1 = 2**v1
    vv2 = 2**v2
    if vv2&u1 and not vv1&u2:
        #v2 occurs in u1: (u1 without v2) combined with u2 implies v1
        return (v1,[(u1|u2)&~vv2])
    elif vv1&u2 and not vv2&u1:
        #v1 occurs in u2: (u2 without v1) combined with u1 implies v2
        return (v2,[(u1|u2)&~vv1])

The properties of ○ are:

  • x○y = y○x
  • x ≠ x○y ≠ y
  • If x ≤_n y, then x○y ≤_n y, where ≤_n is the product order.
  • If x○y ∈ L_i, then x ∈ L_i or y ∈ L_i (note: L_i, not min L_i).
  • If there exists x○y = z ∈ L, then there also exist x̃, ỹ ∈ L with x̃○ỹ = z.

The objective is to find a Y ⊆ L from which the repeated application of ○ produces L: Ȳ = L.

Ȳ is defined as

  • Y^2 = Y○Y
  • Y^{n+1} = Y^n ∪ {z ∈ L | z = x○y, x ∈ Y^n, y ∈ Y ∪ Y^n}. This means that of Y^n ○ (Y^n ∪ Y) only elements in L are added at each step. But the algorithm below can cope with Y^n getting bigger than L.
  • ○Y = ∪_{n ≥ 2} Y^n
  • Ȳ = Y ∪ ○Y

One starts with Y_0 = L \ (L○L), i.e. all those elements of L that are not results of an application of the 3rd Armstrong rule. As long as Ȳ_n ≠ L, we add a z ∈ L \ Ȳ_n to Y_n. Y_n ∪ {z} cannot generate elements of Y_0, but it can generate elements of Z_n = Y_n \ Y_0. Therefore we do Y_{n+1} = Y_0 ∪ (Z_n \ ○(Y_n ∪ {z})) ∪ {z}, i.e. we also remove elements, but only those that can be generated together with z; hence Ȳ_n ⊂ Ȳ_{n+1}.

Note

The formula Y_{n+1} = Y_0 ∪ (Z_n \ ○(Y_n ∪ {z})) ∪ {z} in [1] lets you think that only the current z can generate itself. But actually other elements of Z_n could generate themselves. Z_n \ ○(Y_n ∪ {z}) could remove elements other than z that are needed for the generation of elements in Ȳ_n, thus making Ȳ_n ⊄ Ȳ_{n+1}, although Ȳ_n ⊂ Ȳ_{n+1} is postulated in [1]. There is a comment, though, that states one should use a disjoint union. This is interpreted as: don't remove elements of Z_n where the current z is not involved in the generation.

Here is the Python code for that:

def basis_minomega(self):
    #self is the set L of implications; '-', '+', '*', '~' and '**' are
    #operator overloads of the implication-set class (see pyfca), read here
    #per the formulas above: '*' applies ○ pairwise, '~Y' is ○Y,
    #'+'/'-' are union/difference, 'Y**z' generates with z involved
    L = self
    Y = L - (L*L)            #Y_0 = L \ (L○L)
    while True:
        Ybar = Y + ~Y        #Ȳ = Y ∪ ○Y
        take = L - Ybar      #elements of L not yet generated
        if not len(take):
            return Y
        else:
            z = next(take.flatten())
            Yzgen = Y**z     #generate with z involved
            Y = (Y - Yzgen) + z

Note

This algorithm can generate a basis that is smaller regarding ω(U → V) = |U||V|, but not necessarily smaller in count, compared to the Duquenne-Guigues basis.

[1] Endliche Hüllensysteme und ihre Implikationenbasen by Roman König.

Sunday, June 28, 2015

Formal Concept Analysis

My Motivation

I had my first contact with Formal Concept Analysis (FCA) in Summer 2004. At the time I did some experiments in Python, which I want to share here. Later on I became too busy again with work and other projects. Writing about it is also a refresher for myself.

My first motivation that made me google and find FCA was DB related.

As an example let's take (Givenname,Familyname). You can do either (G,iG)1-n(iG,F) and you will have many family names F occurring more than once (n>1 times) or you can do (G,iF)m-1(iF,F) and you will have many given names G occurring more than once (m>1 times).

To avoid this you would do

(G,iG)m-n(iF,F)  ⇒  (G,iG)1-m(iG,iF)n-1(iF,F)

This way a given name and a family name would be there once and you would reference each of them.

One can repeat the process, if there are more columns. This is called DB normalization.

  • Basically, whenever a combination of values turns up more than once, one would have to make an extra table. This extreme normally is not done.
  • Actually the foreign key is still repeated instead. Something like run-length encoding would mean referencing tables instead of rows in tables. It is possible but not relational database like.
  • Normally you end up having a lot of repeats in a table. This flattening of data and thereby ignoring the context is the reason for probability.

My previous blog was What is Information. There I identify the omnipresent variable, understood as a set to exclusively choose from, as a fundamental building block of information processing.

In FCA the basic variable is binary: distinguishable elements of a set, called attributes, are either there or not there together, and such a subset of the set of attributes is an object.

The next step of an information processing system, in order to reduce the memory usage, is to find, which of the attributes are exclusive in a context, and thus form a new variable.

A DB is designed by a human. The analysis leading to tables/variables is done in our mind. What is the algorithm we perform in our mind? How could this 'intelligence' be done artificially? It seemed quite basic to information processing. People quite dynamically create variables, relate them, and categorize them based on certain properties or usages of relevance to the task at hand.

For me two questions are important.

  • How does our mind come up with variables and how do we organize them?
  • What is the most (memory) efficient way to do it?

Years before I had come across an article and subsequently the Web site of Gerard Wolff, which considers compression the basis of computing.

During my university time I also had some contact with artificial neural networks. They have some commonalities with FCA, like the Hebb rule, which associates stimuli and thus forms concepts in a parallel fashion.

Definitions

The basic starting point is the context: not the diffuse context of everyday language, but a mathematically well-confined one.

K = (O, A, I)
  • K is called context.
  • O is a set of objects.
  • A is a set of attributes.
  • I ⊂ O × A is the incidence.

2 = {0, 1} and 2^A denotes the power set. For subsets O_s ⊂ O (O_s ∈ 2^O) and A_s ⊂ A (A_s ∈ 2^A) one defines these mappings:

  • from a subset of O (O_s) to a subset of A (A_s):
2^O → 2^A: O_s^K = O_s' = {a ∈ A | ∀o ∈ O_s: (o, a) ∈ I}
  • and vice versa
2^A → 2^O: A_s^K = A_s' = {o ∈ O | ∀a ∈ A_s: (o, a) ∈ I}

K is represented as a table where rows are objects and columns are attributes.

Note that K(o) = o^K = o' := {o}^K maps O → 2^A and that A_s^K ≠ K(A_s) = {K(a) | a ∈ A_s}.

The context is called clarified if there are no duplicate rows or columns: o ↦ K(o) and a ↦ K(a) are injective.

K(K(o)) = o'' and K(K(a)) = a'' are closure operations.
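
As a small illustration of the two derivation operators and the closure (a sketch of mine, with a context represented as a dict from objects to their attribute sets, which is not pyfca's representation):

def derive_objects(K, Os):
    """Os' : attributes common to all objects in Os."""
    all_attributes = set().union(*K.values())
    return all_attributes if not Os else set.intersection(*(set(K[o]) for o in Os))

def derive_attributes(K, As):
    """As' : objects having all attributes in As."""
    return {o for o, atts in K.items() if set(As) <= set(atts)}

def closure(K, As):
    """As'' : the closure of an attribute set."""
    return derive_objects(K, derive_attributes(K, As))

# example context: objects o1, o2 with their attributes
K = {'o1': {'a', 'b'}, 'o2': {'b', 'c'}}
assert derive_objects(K, {'o1', 'o2'}) == {'b'}
assert closure(K, {'a'}) == {'a', 'b'}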

C = {c | c = (O_s'', O_s') = (A_s', A_s''), O_s ∈ 2^O, A_s ∈ 2^A} is the concept lattice; every concept node in the lattice consists of a set of objects and a set of attributes. Nodes further up have more objects and nodes further down have more attributes. If the set of objects (attributes) of the one above is a superset (subset) of the objects (attributes) of the one below, then these nodes are linked; they are comparable and form a chain. For neighboring linked nodes, the one above is called successor, the one below predecessor. Two links meet when moving downward and they join upward.

meet = infimum = ∧:

∧C_s = (∩O_s'', ∪O_s')

join = supremum = ∨:

∨C_s = (∪O_s'', ∩O_s')

The subscript s is iterated over after the mapping has been done. The ∪ is a logical or (union) of the extents. The ∩ is a logical and (intersection) of the intents.

The partial ordering is by convention based on the extent (objects) and not the intent (attributes) of concepts. One can, though, regard the attributes as objects and the objects as attributes: The two concept lattices are called dual to each other.

Attributes or objects that don't produce new nodes in the lattice, but are only an addition to already existing nodes, can be removed. The intent of such an object (the extent of such an attribute) is equal to the intersection of other intents (extents). Removing them produces a reduced context.

Two concepts that can be compared (a < b ), will meet in a and join in b . If not comparable they will join or meet somewhere else, say c . c is comparable to a and b . Possibly they will join at the top or meet at the bottom.

Discussion

Here some thoughts of interest to me, skimming the surface, asking for further explorations.

The open sets in a topology

  • The empty set and X itself are open.
  • Any union of open sets is open.
  • The intersection of any finite number of open sets is open.

can be compared to concepts. The set X can be identified with the set of attributes.

The concept lattice is the result of many (open set) inputs.

A chain can be seen as an extensive quantity, whose elementary constituents are a selection of attributes. In a one-dimensional crossing of a chain the attributes can be ordered by the time they meet the chain. 'How far down' from a starting point would be a metric for these attributes.

If going down the chain we have a regular pattern of meeting 2 new attributes, then we deal with a 2-dimensional space. The shortest distance determines, which attributes belong to which dimension. This thread leads to geometry.

Elementary attributes are the intents of the nodes just below the top. Linking them to the top is wasted resources. Instead, in a lattice, they can be identified by their links downward, by their topology.

Due to the topology induced ordering it is possible to address the attributes via counting, i.e. via combinations of concepts. That is the start of analytical descriptions and of languages.

In the context every attribute was linked to every object with a switch: there xor not there (a bit). N attributes that are exclusive can consume less than N bits of memory by using combinations: log N bits. Concepts a_i are (locally) exclusive if they meet with a concept b in the pattern a_i ∧ b = c_i. So the a_i form a variable and the c_i are a function of the a_i, i.e. they can be constructed instead of stored.

A variable is a meta-concept. A variable doesn't need to be linked to every individual value (= concept) if a linear ordering is available to allow intervals. When each of a set of attributes (concepts) meets with each of another set, a 2-dimensional space is spanned (cartesian product). Here analytic descriptions of shapes make it possible to reduce memory usage.

These analytical elements are also concepts, concepts with parameters: they are called functions in programming. When these functions with formal parameters meet actual parameters, they generate concepts. To address space-time tradeoff the generated concepts must stay around for some time.

The actual parameters must fit to the formal parameters in the requirements imposed by the functions. These requirements are another important kind of concepts: types.

The links in a concept lattice don't have any order. When mapped to a linear memory space a linear order is automatically overlaid, though. If a linear order is inherent to concepts due to their topology, then it's natural to keep this order in memory. Else mechanisms must be installed for the linking that don't use the linear order of the memory space.

If (almost) all combinations of concepts are there, a natural representation in linear binary memory is to map the concepts to bits. Every concept is then an arbitrary-precision integer.

If the purpose of the combinations is to address exclusive concepts a register machine's addressing capabilities are best put to work.

Actual links, adjustable, like via synapses in the brain, are more kindred to concept lattices.

Register machines with more or less hardware-implemented functions (instruction set) can be regarded as an extraction of the analytical features of our brain, first evolved via biological evolution, then via the science culture. Many algorithms developed address special applications and are surely very efficient there.

I regard FCA as a connection to the starting points of this development and therefore worth considering in a venture to artificially reproduce the analytic capabilities of our brain.

Experiments in Python

Since my first contact with FCA, new algorithms have been devised to efficiently make a concept lattice, i.e. construct all concepts. In-Close, In-Close2, CbO, FCbO seem to be among the currently fastest ones (from Comparing Performance of Algorithms for Generating Concept Lattices (Kuznetsov)).

My experiments were with older algorithms: NextConcept, Neighbors and AddIntent.

While writing this blog, I've decided to make a github repository pyfca to collect some algorithms in python.

So far it includes a python implementation of the AddIntent algorithm from AddIntent: A New Incremental Algorithm for Constructing Concept Lattices and a lattice drawing algorithm I had taken from Galicia. So it can be used to create a concept lattice (fca_lattice) and to draw it (lattice_diagram) either using tkinter() or svg().inkscape().

My intention is to add more algorithms in times of leisure. A collaborative effort could be more comprehensive. Therefore: Contributions are very welcome.