Tagging¶
Tagging by crowdsourcing
Huge market for low-skilled workers, e.g. Amazon Mechanical Turk
Skills of annotators are measured and weighted into results
Inter-annotator agreement:
How well do two or more annotators agree on annotation decisions?
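When annotator skill is weighted into the aggregated result, a common baseline is a weighted majority vote. The sketch below is an illustration, not Mechanical Turk's actual aggregation; the skill weights are hypothetical values, e.g. accuracy on gold-standard control items.

```python
from collections import defaultdict

def weighted_vote(labels, weights):
    """Aggregate one item's crowd labels, weighting each annotator by skill."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    # the label with the highest total weight wins
    return max(scores, key=scores.get)

# four annotators label the same token; weights are hypothetical skill scores
print(weighted_vote(["ADJ", "NOUN", "NOUN", "ADJ"], [0.9, 0.6, 0.5, 0.8]))
```

Here the two weaker annotators are outvoted: "ADJ" wins with a total weight of 1.7 against 1.1 for "NOUN", even though the raw vote is tied.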
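A standard measure of inter-annotator agreement for two annotators is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. A minimal sketch, using made-up POS labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # observed agreement: fraction of items both annotators labeled the same
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement: from each annotator's own label distribution
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

ann1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN"]
ann2 = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN"]
print(cohens_kappa(ann1, ann2))
```

A kappa of 1 means perfect agreement, 0 means agreement no better than chance; values above roughly 0.8 are commonly read as reliable annotation.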
Tagging guidelines¶
Annotation guidelines describe how tags are to be used
E.g. should “the book” be tagged or only “book”? Is “Mr Smith” or “Smith” a person’s name?
To capture all possible cases, these documents can be huge, e.g. here
Exercise¶
Head over to the Doccano test instance
Try out the tagging infrastructure
What metadata could be useful for tagging?
What are interesting POS for the Tucholsky corpus?