Indian Language Dialect Data Annotation Challenge - 2026
Official Guidelines
1. Purpose of the Competition
This competition aims to create high-quality annotated datasets for Indian languages and their dialects to support research and development in NLP, Generative AI, and Responsible AI systems.
2. Eligibility
The competition is open to:
- Engineering students
- Graduates and postgraduates
- Researchers and academicians
- Software developers and test engineers
- Language enthusiasts
3. Important Dates
- Registration Opens: 15th February 2026
- Registration Closes: 15th March 2026
- Competition Period: 15th March 2026 to 15th May 2026
4. Preferred Annotation Languages
- Kannada
- Marathi
- Telugu
- Tamil
- Bengali
5. Annotation Tasks
- Vocabulary Annotation (Lexical-Level Annotation)
- Sentence Annotation (Syntactic/Semantic-Level Annotation)
- Paragraph Annotation (Discourse-Level Annotation)
- Question Answering (NLP Capability)
- Text Summarization (NLP Capability)
- Text Paraphrase Generation (NLP Capability)
- Sentiment Classification (NLP Capability)
- Named Entity Recognition – NER (NLP Capability)
6. Annotation Guidelines
- Annotations must be linguistically accurate and contextually meaningful
- Dialect usage should reflect authentic regional language
- Correct script usage is mandatory
- Machine-generated or copied content is not allowed
- Respectful and neutral language must be maintained
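As an illustration of the script-usage rule, a contributor could run a quick self-check before submitting. The sketch below is not part of the official tooling: the names SCRIPT_RANGES and off_script_chars are hypothetical, and it assumes each preferred language is written in its standard Unicode block (Marathi in Devanagari). It flags only letters outside the expected block and ignores digits, punctuation, and vowel signs.

```python
# Unicode block ranges for the scripts of the five preferred languages.
# Assumption: Marathi is written in Devanagari; the others use their own blocks.
SCRIPT_RANGES = {
    "Kannada": (0x0C80, 0x0CFF),
    "Marathi": (0x0900, 0x097F),  # Devanagari block
    "Telugu": (0x0C00, 0x0C7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Bengali": (0x0980, 0x09FF),
}


def off_script_chars(text, language):
    """Return the letters in `text` that fall outside the expected script block."""
    low, high = SCRIPT_RANGES[language]
    # str.isalpha() is true only for letters, so digits, punctuation,
    # whitespace, and combining vowel signs are skipped rather than flagged.
    return [ch for ch in text if ch.isalpha() and not (low <= ord(ch) <= high)]


if __name__ == "__main__":
    # Latin letters mixed into a Kannada annotation are reported.
    print(off_script_chars("ಕನ್ನಡ abc", "Kannada"))
```

An empty result means no off-script letters were found. It does not confirm that the annotation is linguistically accurate or dialect-authentic, which still requires human review.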
7. Prohibited Content
- Hate speech and abusive or explicit content
- Political, religious, or ideological material
- Personal or sensitive information
- Copyrighted or plagiarized text
8. Data Usage & Ownership
All submitted annotations will become the intellectual property of NEXTGENVECTORA DATA INNOVATIONS (OPC) Private Limited and will be used strictly for research, educational, and AI model development purposes.
9. Evaluation Criteria
- Accuracy and correctness
- Dialect authenticity
- Consistency of annotations
- Quality and volume of contributions
10. Benefits of Participation
- Hands-on experience in NLP data annotation
- Exposure to real-world AI dataset creation
- Understanding of dialect-based language modeling
- Industry-relevant project experience
11. Certification & Recognition
- A 2-month internship letter recognizing each participant's contribution to the data annotation project
- Participation Certificate for all eligible contributors who complete the required annotations
- Merit Certificates for participants demonstrating high accuracy and consistency
- First Prize - ₹25,000 awarded to the top overall contributor
- Second Prize - ₹15,000 awarded to the second-highest performing contributor
- Third Prize - ₹10,000 awarded to the third-highest performing contributor
- Opportunities for future collaboration and internships with NEXTGENVECTORA DATA INNOVATIONS (OPC) Private Limited
12. Ethical Commitment
This initiative promotes linguistic diversity, responsible AI development, and ethical data collection practices.
13. Disclaimer
NEXTGENVECTORA DATA INNOVATIONS (OPC) Private Limited reserves the right to modify these guidelines at any time. Any violation of these guidelines may result in suspension or termination of access.
14. Sponsorships & Collaborations
NEXTGENVECTORA DATA INNOVATIONS (OPC) Private Limited is happy to collaborate with, and welcomes sponsorships from, colleges, universities, research institutions, and industries interested in advancing Indian language technologies, dialect preservation, and responsible AI development.
Highlights for Sponsors:
- Visibility and branding across the official competition platform and communication channels
- Recognition in certificates, reports, and official acknowledgments
- Opportunities for academic–industry collaboration
- Early access to high-quality annotated datasets (as per data usage policies)
- Engagement with skilled student contributors and language experts
- The opportunity to support a national initiative promoting linguistic diversity and ethical AI