Dakota Lemmatizer and Dictionary Expansion Project

Project Details by Fiscal Year
2024 Fiscal Year Funding Amount
$102,000
Fund Source
Arts & Cultural Heritage Fund
Recipient
Dakhota Iapi Okhodakichiye
Recipient Type
Non-Profit Business/Entity
Status
In Progress
Start Date
January 2024
End Date
June 2025
Activity Type
Grants/Contracts
Counties Affected
Yellow Medicine
Project Overview

The Dakota Lemmatizer & Dictionary Expansion Project has three objectives.
Objective 1: Digitize a corpus of approximately 3000 scanned pages of newspapers written in Dakota and Lakota. Complete digitization will convert each newspaper article from a scanned image of a physical paper into a full-text, searchable document.
Objective 2: Use dictionary data and corpus data to create a dictionary lemmatizer, making inflected verb forms searchable in the Dakota online dictionary and dictionary app. This work will begin with a comparative analysis of Dakota and Lakota dictionaries, conducted by the Siouan Language Tool Developer.
Objective 3: Use data collected in the digitization process to expand the Dakota dictionary. This work will begin with the compilation of sentences, words, and other grammatical data flagged during the earlier digitization and translation phases of the project.

Direct expenses
$102,000
Proposed Measurable Outcome(s)

Objective 1 Outcomes:
1) Purchase software license for Abbyy Reader OCR and complete training manual
2) Scan and organize approximately 3000 pages of dictionary text
3) Complete digitization, re-transcription and correction of texts at an approximate rate of 250 pages per month
4) Complete translation of texts into modern Dakota orthography (from Missionary Orthography) at an approximate rate of 250 pages per month
5) Prioritize and mark sentences that contain new words or can be used as example sentences in the dictionary
6) De-prioritize data not intended for inclusion in dictionary expansion

Objective 2 Outcomes:
1) Begin comparative analysis between Dakota and Lakota dictionaries for grammatical references
2) Develop adapted Dakota Lemmatizer
3) Create conversion tools from Missionary Orthography to modern Dakota orthography
4) Create functional base version of Lemmatizer
5) Fine-tune Lemmatizer features
6) Complete final stage of inflection tables
7) Final implementation of Lemmatizer build into dictionary app
8) Coordinate Lemmatizer release event
9) Host event and complete updates to the dictionary app and online dictionary database

Objective 3 Outcomes:
1) Flag prioritized sentences from the digitization phase for inclusion in the dictionary database
2) Review translations and integrate prioritized examples into the dictionary database
3) Train Dakota Language Specialist/Assistants in audio recording software and data collection for recording events
4) Coordinate and host audio recording event 1
5) Produce audio recordings of prioritized example sentences collected in the digitization phase
6) Coordinate and host audio recording event 2
7) Produce audio recordings of prioritized example sentences collected in the digitization phase
8) Clean and review audio recordings, integrate into the data set
9) Conduct supplemental review for any corrections needed to audio recordings
10) Release dictionary data set for internal testing
11) Finalize audio examples
12) Release database updates to the dictionary app and online dictionary

Recipient Board Members
Joe Bendickson, Ethan Neerdaels, Neil McKay, Wilhelm Meya, Dr. Clifford Canku, Katie Blue-Bendickson, John Reynolds, Raine Cloud
Project Manager
First Name
Ethan
Last Name
Neerdaels
Organization Name
Dakhota Iapi Okhodakichiye
Street Address
5747 Travers Lane
City
Granite Falls
State
MN
Zip Code
56241
Email
admin@dakhota.org
Administered By
Location

161 Saint Anthony Ave
Suite 919
St. Paul, MN 55103