Tagging

  • Tagging by crowd sourcing

    • Huge market for low-skilled workers, e.g. Amazon Mechanical Turk

    • Annotators' skills are measured and used to weight their contributions to the results
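One common way to fold annotator skill into the results is a weighted majority vote: each annotator's label counts with that annotator's skill score, and the label with the highest total wins. The sketch below assumes skill is already available as a number per annotator (how it is measured, e.g. on gold-standard items, is not covered here). The function name `weighted_vote` and the default weight of 0.5 for unknown annotators are hypothetical choices for illustration, not the method of any particular platform.

```python
from collections import defaultdict


def weighted_vote(labels: dict[str, str], skill: dict[str, float]) -> str:
    """Return the label with the highest summed annotator weight.

    labels: annotator id -> label that annotator chose for one item
    skill:  annotator id -> weight (hypothetical skill score)
    """
    scores: dict[str, float] = defaultdict(float)
    for annotator, label in labels.items():
        # Hypothetical default: annotators without a skill score get 0.5.
        scores[label] += skill.get(annotator, 0.5)
    # Ties go to the label seen first, since max() keeps the first maximum.
    return max(scores, key=scores.get)


votes = {"a1": "PERSON", "a2": "ORG", "a3": "ORG"}
print(weighted_vote(votes, {"a1": 0.95, "a2": 0.4, "a3": 0.4}))  # PERSON
print(weighted_vote(votes, {}))  # ORG (plain majority when all weights are equal)
```

With equal weights this reduces to an ordinary majority vote; with skill weights, a single reliable annotator can outvote several unreliable ones.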

Inter-annotator agreement:

How well do two or more annotators agree on their annotation decisions?
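A widely used measure for two annotators is Cohen's kappa, which corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of items with identical labels and p_e is the chance agreement computed from each annotator's label frequencies. The slide does not name a specific measure, so treat this as one possible choice (Fleiss' kappa or Krippendorff's alpha are alternatives for more than two annotators). The function below is a minimal sketch; returning 1.0 when p_e = 1 is our own convention for the degenerate case.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items in order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items where both chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and we choose to report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


ann1 = ["N", "N", "V", "N", "ADJ", "V"]
ann2 = ["N", "V", "V", "N", "ADJ", "N"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.455
```

Here the two annotators agree on 4 of 6 tokens (p_o ≈ 0.667), but because both use "N" often, chance agreement is already about 0.389, so kappa is only about 0.455.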

Tagging guidelines

  • Annotation guidelines describe how tags are to be used

    • E.g. should “the book” be tagged, or only “book”? Is “Mr Smith” or only “Smith” the person's name?

    • To capture all possible cases, these documents can be huge, e.g. here

Exercise

  • Head over to the Doccano test instance

  • Try out the tagging infrastructure

  • What metadata could be useful for tagging?

  • What are interesting POS for the Tucholsky corpus?