Tuesday, July 3, 2007

Corrections to ACL Anthology URLs

Thanks to an alert reader, I found out that several of the paper links in my previous postings on ACL and EMNLP-CoNLL papers were incorrect. The problem was that some of the BibTeX entries on the ACL DVD distributed in Prague have wrong ACL Anthology links, and I derived these postings semi-automatically from those BibTeX entries. I've edited the most recent posting to use the correct links.

Monday, July 2, 2007

Reposting interesting ACL and CoNLL-EMNLP papers

I've now added a short comment to each paper. This list is created semi-automatically from BibDesk with a custom HTML export template and some minor post-editing. The red titles are my special picks.

  1. Frustratingly Easy Domain Adaptation
    H. Daume III
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  256--263  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1033
    Assumes both source and target labeled data. Instance features are replicated as "feature from source" and "feature from target". Results are surprisingly good for such a simple method. Why? It is easy to create a counterexample in which this does not work, so it would be important to characterize precisely when it works.
  2. A Bayesian Model for Discovering Typological Implications
    H. Daume III and L. Campbell
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  65--72  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1009
    Induces relationships between typological features of languages from very sparse descriptive data. Finds relationships discussed in the comparative literature as well as some others that deserve investigation.
  3. Sparse Information Extraction: Unsupervised Language Models to the Rescue
    D. Downey, S. Schoenmackers, and O. Etzioni
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  696--703  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1088
    The main problem with previous work on unsupervised extraction based on finding many instances of a putative entity or relationship is that it has low recall. To address this, this paper creates HMM models from the contexts of common extractions and uses them to measure the plausibility of rare candidate extractions. Simple idea with good results.
  4. A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing
    J. Gao, G. Andrew, M. Johnson, and K. Toutanova
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  824--831  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1104
    Bottom line: L_1 regularization of logistic regression does not hurt generalization, and makes the models much smaller. Nice to have a careful study that documents the benefits and limitations of L_1 regularization in a range of common text classification tasks.
  5. Unsupervised Coreference Resolution in a Nonparametric Bayesian Model
    A. Haghighi and D. Klein
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  848--855  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1107
    A very nice result beautifully presented. The "magic" of Dirichlet processes yields an unsupervised generative model of coreference that competes with supervised methods and can naturally incorporate a discourse model.
  6. K-best Spanning Tree Parsing
    K. Hall
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  392--399  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1050
    Digging into old directed spanning tree literature continues to bear fruit for dependency parsing, this time a k-best algorithm that can be used for reranking with global features. I have my reservations about reranking, but this is a good addition to the dependency parsing toolbox.
  7. Forest Rescoring: Faster Decoding with Integrated Language Models
    L. Huang and D. Chiang
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  144--151  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1019
    I like this approach much better than reranking: evaluate global features as soon as possible and add their scores to the local feature scores in a dynamic programming parser or decoder, to efficiently produce an approximate set of k-best partial hypotheses. No such method for spanning tree dependency parsers, though... Liang gave a very clear talk.
  8. Exploiting Wikipedia as External Knowledge for Named Entity Recognition
    J. Kazama and K. Torisawa
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  698--707  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1073
    The basic idea is simple and effective. For each entity described in Wikipedia, find a defining sentence, extract from it heuristically a noun that is likely to be the entity's category, and add that as a "label" feature to other features in a CRF extractor. The details are a bit complicated, but the accuracy improvements make it very worthwhile.
  9. Structured Prediction Models via the Matrix-Tree Theorem
    T. Koo, A. Globerson, X. Carreras, and M. Collins
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  141--150  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1015
    Several groups discovered concurrently that Tutte's matrix-tree theorem would yield an efficient computation of the normalization for log-linear models of non-projective dependencies. There were three papers on different aspects of this in Prague, one at IWPT which I didn't see, and two at EMNLP. I selected this one because it shows how to cast several learning methods (log-linear and max-margin) into a common framework with very good results. The talk by Terry Koo was clear and convincing.
  10. Mildly Context-Sensitive Dependency Languages
    M. Kuhlmann and M. Möhl
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  160--167  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1021
    The complexity of parsing dependency grammars is related to formal measures of their degree of nonprojectivity, following an approach first introduced for mildly context-sensitive grammars. Dependency grammar is a formal island no longer.
  11. The Infinite PCFG Using Hierarchical Dirichlet Processes
    P. Liang, S. Petrov, M. Jordan, and D. Klein
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  688--697  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1072
    Another idea that was in the air got two papers in Prague: hierarchical Dirichlet processes for unsupervised PCFG induction. I liked this paper better, as Percy Liang gave a beautifully clear exposition of a method that was pretty opaque in most previous presentations of related work. It must have helped that Percy and Dan Klein had given a tutorial on Bayesian nonparametric models a few days before.
  12. Characterizing the Errors of Data-Driven Dependency Parsing Models
    R. McDonald and J. Nivre
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  122--131  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1013
    It was rather intriguing at last year's CoNLL evaluation of multilingual dependency parsing that the two top parsers (our MSTParser and Nivre's MaltParser) had overall scores that were statistically indistinguishable, even though they are very different in design. This paper explains the results: MaltParser's greedy deterministic method can use more context and works best on shorter sentences, but greed hurts it on longer sentences. MSTParser uses just local features, so it suffers on shorter sentences, but optimal search makes it do better on longer sentences. How can we combine these benefits? I know, I know, parser combination in the Sagae and Lavie mold can do it, but I'd prefer something more integrated.
  13. Structured Models for Fine-to-Coarse Sentiment Analysis
    R. McDonald, K. Hannan, T. Neylon, M. Wells, and J. Reynar
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  432--439  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1055
    Combines document-level and sentence-level sentiment classification into a simple, easy to train structured model. Outperforms previous methods significantly at the sentence level, and does competitively at the document level.
  14. Learning Structured Models for Phone Recognition
    S. Petrov, A. Pauls, and D. Klein
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  897--905  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1094
    Learn the structure of phone models from the data, rather than postulating a fixed structure in advance. Exploit it to represent context dependency concisely. Great paper, excellently presented. I tried to convince some speech colleagues that this could be done over ten years ago, but they were skeptical. It was probably too early, and this paper does it with way better methods than I had then.
  15. Guided Learning for Bidirectional Sequence Classification
    L. Shen, G. Satta, and A. Joshi
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  760--767  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1096
    Learn a linear sequence model for a problem where exhaustive search is not possible by starting from high-confidence labels and learning which actions to apply to extend the high-confidence regions to a full labeling of the sequence. Best Penn Treebank POS tagging results ever, and the method applies easily to other tagging and parsing problems. Who needs reranking now?
  16. Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach
    J. Suzuki, A. Fujino, and H. Isozaki
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  791--800  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1083
    I don't understand this paper fully yet -- the notation and presentation are pretty dense -- but the idea of jointly learning a CRF from labeled data and HMMs for the same state space from unlabeled data is an intriguing approach to semi-supervised CRF training. One of my post-conference homeworks is to figure out how this relates (or not) to ASO. Lots of other possible connections, such as Pal and McCallum's multiconditional models.
  17. Randomised Language Modelling for Statistical Machine Translation
    D. Talbot and M. Osborne
    Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics  512--519  (2007)
    http://www.aclweb.org/anthology/P/P07/P07-1065
    I can't quite evaluate this, since I've not been working on large language models recently, but there's something deliciously perverse about using randomized hashing to throw away n-gram data from a big model that doesn't really matter in practice.
  18. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form
    L. Zettlemoyer and M. Collins
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)  678--687  (2007)
    http://www.aclweb.org/anthology/D/D07/D07-1071
    Given a training set of sentence-meaning pairs, a CCG-based lexicon induction process discovers potential word meanings and category assignments, and a ranking of alternatives, so that the given training set is correctly analyzed and interpreted. I love this connection between online learning, categorial grammars, and logical semantics, and I think there's a rich vein to explore here.
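
The feature replication described in the comment on paper 1 is simple enough to sketch in a few lines. The following is only an illustration under my own assumptions, not the paper's code: the function name `augment`, the string prefixes, and the sparse dictionary representation are all hypothetical. As I understand the published formulation, each feature also gets a shared copy alongside its domain-specific copy, so that is what the sketch does.

```python
from typing import Dict


def augment(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Replicate each feature into a shared copy and a domain-specific copy.

    Features are sparse: a key that is absent has value zero, so the copy
    for the other domain is simply never created.
    """
    if domain not in ("source", "target"):
        raise ValueError(f"unknown domain: {domain!r}")
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # copy shared by both domains
        augmented[f"{domain}:{name}"] = value  # copy seen only in this domain
    return augmented


# Hypothetical toy instances, one from each domain.
source_example = augment({"word=monitor": 1.0, "suffix=or": 1.0}, "source")
target_example = augment({"word=monitor": 1.0}, "target")

# Only the shared copies overlap across domains.
shared_keys = set(source_example) & set(target_example)
```

Any standard linear learner can then be trained on the union of the augmented source and target instances. Weights on the `general:` copies carry what transfers across domains, while the `source:` and `target:` copies absorb what does not.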

Sunday, July 1, 2007

Interesting papers at ACL and EMNLP-CoNLL

I just got back from ACL and EMNLP-CoNLL in Prague. There were many interesting papers, more than I could attend because of session conflicts. Here are some that I found especially worthwhile. Those highlighted in red really stood out. I'll add comments on some of these papers later, but I don't have time now.

Monday, June 18, 2007

Saturday, June 16, 2007

Frustratingly Hard Domain Adaptation for Parsing

@inproceedings{Dredze07Frustratingly,
author = {Mark Dredze and John Blitzer and Partha Pratim Talukdar and Kuzman Ganchev and Joao Graca and Fernando Pereira},
title = {Frustratingly Hard Domain Adaptation for Parsing},
booktitle = "Conference on Natural Language Learning",
address = "Prague, Czech Republic",
year = "2007"
}

We will be presenting our struggles with domain adaptation for parsing at CoNLL the week after next in Prague.

Biographies, Bollywood, Boom-boxes, and Blenders: Domain Adaptation for Sentiment Classification

@inproceedings{Blitzer07Biographies,
author = {John Blitzer and Mark Dredze and Fernando Pereira},
title = {Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification},
booktitle = "Association for Computational Linguistics",
address = "Prague, Czech Republic",
year = "2007"
}

We will be presenting this at ACL the week after next.

Occam's Hammer

John Langford blogged about Occam's Hammer, which I've started reading. I agree with John that this is an interesting new way of proving tight generalization bounds, which is on my mind because of some papers we submitted for publication recently.