skip navigation.

ABL : Alignment-Based Learning


Input   
   
Output


  




Introduction

Alignment-Based Learning (ABL), introduced in [1], is a symbolic grammar inference framework that has successfully been applied for several unsupervised machine learning tasks in Natural Language Processing (NLP). Given sequences of symbols only, a system that implements ABL induces structure by aligning and comparing the input sequences. As a result, the input sequences are augmented with the induced structure.

The demo on this page allows you to apply ABL on unstructured text. For a stand-alone implementation for non-commercial educational purposes and research, see the software release announcement or go to this page .

Alignment learning

The first phase in ABL, alignment learning, builds a search space of possible structures, called hypotheses, by comparing the sequences and clustering sub-sequences that share similar context. E.g., when we have the following input sentences:
    she is smart
    she is tired
alignment learning would result in the following hypothesized structures:
    (0 she is (1 smart )1 )0
    (0 she is (1 tired )1 )0
However, the set of hypotheses that are generated during alignment learning contain hypotheses that are unlikely to be correct constituents. We also assume the underlying grammar that is induced to be context-free, which implies that overlapping constituents are unacceptable. For example, consider the following plain text sentences:
    she runs from CBD to Bondi
    he drives from CBD to Bondi
    he walks terribly slow
After alignment learning, the hypothesized structure would contain overlapping constituents in sentence 2 and look as follows:
    (0 (1 she runs )1 from CBD to Bondi )0
    (0 (1 he (2 drives )1 from CBD to Bondi )2 )0
    (0 he (2 walks terribly slow )2 )0

Selection learning

In the second phase, selection learning, the most probable combination of hypothesized structures are selected in an Expectation-Maximalization search and overlapping constituents are eliminated according to a heuristic. The final induced structure thus becomes:
    (0 (1 she runs )1 from CBD to Bondi )0
    (0 (1 he drives )1 from CBD to Bondi )0
    (0 he (2 walks terribly slow )2 )0

If desired, the induced structure could be transformed to a context-free grammar (CFG):
    0 -> 1 from CBD to Bondi
    0 -> he 2
    1 -> she runs
    1 -> he drives
    2 -> walks terribly slow


If you have any questions or comments, please do not hesitate to contact me.

References

[1] Menno van Zaanen (2001) Bootstrapping Structure into Language: Alignment-Based Learning. PhD Thesis, School of Computing, University of Leeds, UK. [ PDF ]
[2] Jeroen Geertzen and Menno van Zaanen (2004). Grammatical Inference using Suffix Trees. In: Proceedings of the 7th International Colloquium on Grammatical Inference (ICGI) Athens, Greece, October 11-13. [ PDF ]