Corpus Center for Advanced Studies & Applications (CCASA)
A Joint Venture of Emerson University Multan, Pakistan & Shanghai International Studies University, China
ABOUT
The Corpus Center for Advanced Studies & Applications (CCASA) is a collaboration between the Department of English at Emerson University Multan, Pakistan, and the Institute of Corpus Studies and Applications (ICSA) at Shanghai International Studies University (SISU), China.
In essence, the project comprises multiple research initiatives housed within the framework of this envisioned Corpus Center. The memorandum of understanding (MOU) between these universities encompasses all members at both centers, fostering an expansion of resource bases and mutual benefits.
Contact info
Greetings from the Corpus Center Team!
About The Corpus Center
The collaboration spans diverse domains, including research, development, education, training, and the long-term propagation of knowledge, emphasizing the university’s pivotal role in knowledge transfer and information sharing. The areas of mutual interest for collaboration are broad, spanning contemporary networking techniques, reciprocity enhancement, and the dissemination of investigative capacity, new methodologies, and technologies.
The project aligns with the “Languages Documentation Program,” aiming to maximize the reuse and secondary use of audiovisual, annotated language and textual data. The Corpus Center will establish quality standards and curation criteria for various reuse scenarios, such as ‘Language Documentation,’ ‘Learner Corpora,’ ‘Interpreted Corpora,’ ‘Sign Language,’ ‘Language Community,’ ‘Ethnography,’ and ‘Oral History.’ This initiative seeks to enhance linguistic research capacity in Language Data Science and Applications, Humanities/Social Sciences, Linguistics and Literature, and English Language Teaching/Learning.
In a competitive academic landscape, Emerson University Multan recognizes the need to strengthen its research programs, regional competitiveness, and cost efficiency, in line with the growing emphasis on Corpus Studies. In response, the university has initiated the establishment of corpus centers to widen the scope of advanced research. Because only a few universities in Pakistan currently host corpus centers, there is considerable potential to attract students and researchers to programs related to corpus studies, positioning Emerson University Multan as a frontrunner in meeting the increasing demand for advanced research in this field.
Objectives:
The Corpus Center aims to:
- Excel in research through diverse research activities.
- Create opportunities for funded research projects in collaboration with municipal, national, and international organizations.
- Serve as an authentic source for storing spoken and written language data for communicative and research purposes.
- Develop orthographies for unwritten languages.
- Preserve endangered languages and revitalize those near extinction.
- Propose innovative language teaching methods and create appropriate materials.
- Establish frameworks for teaching/learning grammar and vocabulary.
- Pave the way for computational linguistics in Pakistan.
- Develop spoken language data corpora.
- Innovate search methods and data utilization for research.
- Develop advanced software and web applications for natural language and speech processing, fostering the systematic growth of corpora and related fields.
Salient Features:
The corpus center is designed to support big data and corpus analysis tools in research, teaching, and learning. Noteworthy features include:
- Collaboration across corpus linguistics, natural language processing, translation studies, and other interdisciplinary domains.
- A focus on the intersection of psycholinguistics, cognitive linguistics, literary stylistics, and statistics within corpus linguistics at EUM.
- Access to diverse corpora, a dedicated computer suite with specialized resources, and an eye-tracking laboratory.
- Contribution to shaping Digital Humanities developments, including developing the web app for corpus linguistic studies of literary texts.
- Organization of conferences, summer schools, workshops, seminars, and symposiums on corpus studies.
- Sharing of corpus methods and training of researchers in tools such as Praat, ELAN, WordSmith Tools, and Wmatrix.
FUTURE RESOURCE PROVISION
The successful execution of this pioneering project necessitates a well-equipped laboratory and the utilization of diverse corpus tools. The following components outline the specific resource requirements:
Command Line Tools and Scripting:
- For beginners, gaining familiarity with basic command-line literacy and a scripting language like Python is highly recommended.
- Resources for initial learning include Chris Potts’s Programming for Linguists class materials and introductory workshops from Stanford Library’s Center for Interdisciplinary Digital Research (CIDR), Software Carpentry, and Data Carpentry.
- CIDR additionally offers one-on-one consulting.
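As an illustration of the kind of first script a beginner might write, the following is a minimal sketch of a word-frequency count using only the Python standard library. The function names `tokenize` and `word_frequencies` and the sample sentence are assumptions made for this example; they do not come from any of the course materials named above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    # Invented sample sentence, used only to demonstrate the functions.
    sample = "The corpus is small. The corpus is only a sample."
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

Saved to a file, a script like this can be run from the command line with the Python interpreter, which exercises both skills mentioned above.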
Natural Language Processing (NLP):
- Initiating natural language processing involves options such as the Natural Language Toolkit (NLTK), the Text Mining with R book, and the tidytext R package.
- Further exploration of text processing, including sentiment analysis, can be facilitated with tools like spaCy, Stanford CoreNLP, and NLP-related packages in R.
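To make the idea of sentiment analysis concrete, here is a minimal lexicon-based sketch in plain Python. It is not how spaCy, Stanford CoreNLP, or the R packages work internally; the word lists `POSITIVE` and `NEGATIVE` and the function `sentiment_score` are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical toy lexicon for illustration; real lexicons are far larger.
POSITIVE = {"good", "great", "excellent", "clear"}
NEGATIVE = {"bad", "poor", "confusing", "slow"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return (pos - neg) / len(tokens)


if __name__ == "__main__":
    # Invented example sentences.
    for sentence in ("The lecture was clear and excellent.",
                     "The recording was poor and confusing."):
        print(f"{sentence} -> {sentiment_score(sentence):+.2f}")
```

A score above zero suggests more positive than negative vocabulary, and a score below zero the reverse. Dedicated toolkits handle negation, context, and domain vocabulary, which this sketch deliberately ignores.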
Speech Processing:
- For delving into speech processing, resources such as Will Styler’s Using Praat for Linguistic Research, Joey Stanley’s Praat scripting tutorial, and Eleanor Chodroff’s A Corpus Phonetics tutorial are recommended.
- Once basic familiarity is acquired, tools such as the Montreal Forced Aligner, SpeechBrain, and Kaldi are widely used for tasks such as forced alignment and speech recognition.
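Forced alignment produces time-stamped segments, and a common next step is to summarize their durations. The sketch below assumes the alignment has already been exported as simple (label, start, end) tuples; the `INTERVALS` data and the function `mean_durations` are invented for illustration and are not the output format of any particular aligner.

```python
from collections import defaultdict

# Hypothetical (label, start_seconds, end_seconds) intervals of the kind
# found on an aligned phone tier. The values are invented.
INTERVALS = [
    ("k", 0.00, 0.08),
    ("ae", 0.08, 0.22),
    ("t", 0.22, 0.30),
    ("ae", 0.30, 0.40),
]


def mean_durations(intervals):
    """Return the mean duration in seconds for each segment label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, start, end in intervals:
        totals[label] += end - start
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


if __name__ == "__main__":
    for label, duration in mean_durations(INTERVALS).items():
        print(f"{label}: {duration:.3f} s")
```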
By bringing these resources together, the Corpus Center at Emerson University Multan aims to foster a robust research environment that merges linguistic expertise with current technological tools to advance knowledge in language, literature, and computational linguistics.
CORPUS TEAM
Team A: Emerson University Multan, Pakistan
Team B: Shanghai International Studies University, China
Aims & Objectives of the CORPUS Center
The Corpus Center at Emerson University Multan strives to achieve multifaceted objectives that contribute to advancing research, scientific discourse, and knowledge exchange within the academic community.
Research Support:
The Corpus Center serves as a cornerstone for facilitating and supporting research activities undertaken by its members. The Academic Director, with his expertise as an Associate Professor in the Department of English, fosters an environment conducive to scholarly inquiry within the faculties of English (Language & Literature) and Computer Sciences.
Scientific Dialogue:
Promoting scientific dialogue is a vital mandate of the Corpus Center. The aim is to create new synergies, both on an international and local scale, fostering collaborative endeavors that transcend disciplinary boundaries.
Knowledge Exchange:
At the heart of the Center’s mission is promoting knowledge exchange. It actively encourages sharing academic findings among its members, creating a platform for intellectual discourse and collaboration. Particular emphasis is placed on developing applied research protocols, using corpus-based linguistic analysis programs, and establishing linguistic and terminology databases.
Academic Community Hub:
The Corpus Center is a vibrant space for sharing insights and expertise among academic staff, graduates, and upcoming graduates. It maintains a dynamic connection with the broader academic community, striving to enhance both research methodologies and teaching practices.
Continuous Improvement:
The Center remains in constant touch with academia, seeking opportunities for continuous improvement in research and teaching methods. The Academic Director plays a pivotal role in ensuring that the Center remains at the forefront of advancements in linguistics and computational sciences.
Under the leadership of the Vice Chancellor and Academic Director, the Corpus Center at Emerson University Multan emerges as a dynamic hub, propelling the frontiers of linguistic research, fostering collaboration, and nurturing a culture of knowledge exchange for the betterment of academia and society at large.
Research Focus Areas
The Center will actively engage in development-oriented academic research across diverse domains, fostering advancements in the following key areas:
Language Data Infrastructure Processing and Applications:
Exploring the intricacies of language data infrastructure processing, the Center delves into applications that enhance the understanding and utilization of linguistic data in various contexts.
Corpus-Based Sociolinguistics (Specialized and Sectorial Variation):
Addressing the nuances of sociolinguistic research, the Center specializes in analyzing specialized and sectorial variations within corpora, contributing to a deeper understanding of language dynamics.
Corpus-Based Acquisitional Linguistics (L2 Variation):
In the realm of acquisitional linguistics, the Center focuses on L2 variation within corpora, shedding light on the intricacies of language acquisition and evolution.
Corpus-Based Discourse and Communication Analysis:
Analyzing discourse and communication through the lens of corpus-based methods, the Center seeks to uncover patterns and insights that inform our understanding of language use in diverse communicative contexts.
Parallel and Comparable Corpus-Based Translation Studies:
The Center engages in translation studies utilizing parallel and comparable corpora, exploring the complexities of translation processes and outcomes across languages.
Translation Memories and Other Databases:
A dedicated area of research involves the exploration of translation memories and other databases, contributing to developing robust linguistic resources for translation professionals.
Corpus-Based Terminology and Phraseology:
In terminology and phraseology, the Center conducts corpus-based studies to unravel the intricacies of language use in specialized fields, enhancing terminological precision.
AI and Machine Translation Development:
Pioneering research in artificial intelligence (AI) and machine translation development, the Center contributes to the evolution of cutting-edge technologies that shape the future of language processing and translation.
In these diverse research fields, the Center at Emerson University Multan remains at the forefront of academic exploration, fostering innovation and contributing valuable insights to the broader academic community and beyond.
BACKGROUND
The application of technology to linguistic analysis signifies a pivotal advancement with direct implications for didactics across various domains. The immediate applications extend to interpreting, translating, language studies, terminology, and applied linguistics studies, including sociolinguistics, contact linguistics, textual linguistics, and pragmatic linguistics.
Corpus linguistic approaches, driven by the computational power available today, offer a transformative lens for scrutinizing vast language datasets. These datasets, known as corpora, serve as dynamic reservoirs enabling the exploration of word meanings within specific contexts. The semiautomated nature of corpus linguistic investigation equips researchers to unveil and comprehend intricate language patterns that might remain elusive through manual analysis alone. We have illustrated the diverse applications of corpus linguistic approaches through a set of concise case studies conducted by researchers in higher education. These case studies traverse various educational settings, academic disciplines, and genres, highlighting the versatility of corpus linguistics in addressing research questions across diverse domains.
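The exploration of word meanings in context described above is typically done with a keyword-in-context (KWIC) concordance. The following is a minimal sketch using only the Python standard library; the function name `kwic`, its `window` parameter, and the example sentences are assumptions made for this illustration rather than part of any specific corpus tool.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left, keyword, right) context tuples for each hit."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    # Invented sentences showing two senses of the same word.
    text = "She sat on the river bank. The bank approved the loan."
    for left, word, right in kwic(text, "bank", window=2):
        print(f"{left:>15} | {word} | {right}")
```

Lining up every occurrence of a word with its neighbors makes it easier to see how the surrounding words differ between senses, which is difficult to notice when reading texts one at a time.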
Corpus linguistics has evolved into a robust field of inquiry within contemporary linguistics, supported by open corpora and sophisticated corpus analysis tools. This background provides context for the case studies presented and shows the broader landscape within which corpus linguistic techniques operate. Through these case studies, we offer a glimpse into how corpus linguistic methodologies, whether employed in isolation or integrated into a broader research framework, prove instrumental for higher education researchers. Investigating language data in context can significantly enhance the depth and breadth of research inquiries within higher education. As corpus linguistics continues to evolve, it remains a valuable ally for researchers seeking nuanced insights into language usage across academic contexts.
LATEST NEWS
First Online International Symposium