Corpus Center for Advanced Studies & Applications (CCASA)

A Joint Venture of Emerson University Multan, Pakistan & Shanghai International Studies University, China

ABOUT

The Corpus Center for Advanced Studies & Applications (CCASA) represents a groundbreaking collaboration between the Department of English at Emerson University, Multan, Pakistan, and the Institute of Corpus Studies and Applications (ICSA) at the Shanghai International Studies University (SISU), China.

In essence, the project comprises multiple research initiatives housed within the framework of this envisioned Corpus Center. The memorandum of understanding (MOU) between these universities encompasses all members at both centers, fostering an expansion of resource bases and mutual benefits.

Greetings from the Corpus Center Team!

FUTURE RESOURCE PROVISION

The successful execution of this pioneering project necessitates a well-equipped laboratory and the utilization of diverse corpus tools. The following components outline the specific resource requirements:

Command Line Tools and Scripting:

For beginners, basic command-line literacy and familiarity with a scripting language such as Python are highly recommended.
Resources for initial learning include Chris Potts’s Programming for Linguists class materials and introductory workshops from Stanford Library’s Center for Interdisciplinary Digital Research (CIDR), Software Carpentry, and Data Carpentry.
CIDR additionally offers one-on-one consulting.

Natural Language Processing (NLP):

Starting points for natural language processing include the Natural Language Toolkit (NLTK), the book Text Mining with R, and the tidytext R package.
Further exploration of text processing, including sentiment analysis, can be facilitated with tools like spaCy, Stanford CoreNLP, and NLP-related packages in R.
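As a first exercise with the scripting skills mentioned above, a word-frequency count is a common starting point. The following is a minimal sketch using only the Python standard library; the function names `tokenize` and `word_frequencies` and the sample sentence are illustrative assumptions, and toolkits such as NLTK or spaCy provide far more robust tokenizers than this regular expression.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens.

    Internal apostrophes are kept, so "don't" stays one token.
    This is a deliberately simple, English-oriented assumption.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


# Illustrative sample text (not taken from any real corpus).
sample = "The corpus is large. The corpus is annotated, and the tools are open."
freqs = word_frequencies(sample)

# Print the three most frequent tokens with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

The same counting step scales to whole files by reading their contents into `text`; for languages other than English the token pattern would need to be adapted.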

Speech Processing:

For an introduction to speech processing, resources such as Will Styler’s Using Praat for Linguistic Research, Joey Stanley’s Praat scripting tutorial, and Eleanor Chodroff’s Corpus Phonetics Tutorial are recommended.
Once basic familiarity is acquired, tools like Montreal Forced Aligner, SpeechBrain, and Kaldi are standard for tasks like forced alignment.

By bringing these resources together, the Corpus Center at Emerson University Multan aims to foster a robust research environment, merging linguistic expertise with cutting-edge technological tools to advance knowledge in language, literature, and computational linguistics.

CORPUS TEAM

Team A: Emerson University Multan, Pakistan

Patron

Vice Chancellor, Emerson University Multan, Pakistan

Dr. Adnan Tahir

Director

Associate Professor / Chairman, Department of English, Emerson University Multan, Pakistan

Faculty & Supporting Faculty

Department of English & Department of Computer Science

Team B: Shanghai International Studies University, China

Prof. HU Kaibao

Professor, Dean, Assistant President, Institute of Corpus Studies and Applications, Shanghai International Studies University, China

Dr. Muhammad Afzaal

Associate Professor, Institute of Corpus Studies and Applications, Shanghai International Studies University, China

Prof. Geng Qiang

Professor, Vice Dean, Institute of Corpus Studies and Applications, Shanghai International Studies University, China

Aims & Objectives of the CORPUS Center

The Corpus Center at Emerson University Multan strives to achieve multifaceted objectives that contribute to advancing research, scientific discourse, and knowledge exchange within the academic community.

Research Support:

The Corpus Center serves as a cornerstone for facilitating and supporting research activities undertaken by its members. The Academic Director, with his expertise as an Associate Professor in the Department of English, fosters an environment conducive to scholarly inquiry within the faculties of English (Language & Literature) and Computer Sciences.

Scientific Dialogue:

Promoting scientific dialogue is a vital mandate of the Corpus Center. The aim is to create new synergies, both on an international and local scale, fostering collaborative endeavors that transcend disciplinary boundaries.

Knowledge Exchange:

At the heart of the Center’s mission is promoting knowledge exchange. It actively encourages sharing academic findings among its members, creating a platform for intellectual discourse and collaboration. Particular emphasis is placed on developing applied research protocols, using corpus-based linguistic analysis programs, and establishing linguistic and terminology databases.

Academic Community Hub:

The Corpus Center is a vibrant space for sharing insights and expertise among academic staff, graduates, and upcoming graduates. It maintains a dynamic connection with the broader academic community, striving to enhance both research methodologies and teaching practices.

Continuous Improvement:

The Center remains in constant touch with academia, seeking opportunities for continuous improvement in research and teaching methods. The Academic Director plays a pivotal role in ensuring that the Center remains at the forefront of advancements in linguistics and computational sciences.

Under the leadership of the Vice Chancellor and Academic Director, the Corpus Center at Emerson University Multan emerges as a dynamic hub, propelling the frontiers of linguistic research, fostering collaboration, and nurturing a culture of knowledge exchange for the betterment of academia and society at large.

Research Focus Areas

The Center will actively engage in development-oriented academic research across diverse domains, fostering advancements in the following key areas:

Language Data Infrastructure Processing and Applications:

Exploring the intricacies of language data infrastructure processing, the Center delves into applications that enhance the understanding and utilization of linguistic data in various contexts.

Corpus-Based Sociolinguistics (Specialized and Sectorial Variation):

Addressing the nuances of sociolinguistic research, the Center specializes in analyzing specialized and sectorial variations within corpora, contributing to a deeper understanding of language dynamics.

Corpus-Based Acquisitional Linguistics (L2 Variation):

In the realm of acquisitional linguistics, the Center focuses on L2 variation within corpora, shedding light on the intricacies of language acquisition and evolution.

Corpus-Based Discourse and Communication Analysis:

Analyzing discourse and communication through the lens of corpus-based methods, the Center seeks to uncover patterns and insights that inform our understanding of language use in diverse communicative contexts.

Parallel and Comparable Corpus-Based Translation Studies:

The Center engages in translation studies utilizing parallel and comparable corpora, exploring the complexities of translation processes and outcomes across languages.

Translation Memories and Other Databases:

A dedicated area of research involves the exploration of translation memories and other databases, contributing to the development of robust linguistic resources for translation professionals.
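To make the idea of a translation memory concrete, the sketch below stores source and target segment pairs and retrieves the closest stored match for a new segment. It is a simplified illustration rather than a description of any tool used by the Center: the function name `best_tm_match`, the 0.75 threshold, and the English-French sample pairs are assumptions, and similarity is measured with the character-based `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, memory, threshold=0.75):
    """Return (source, target, score) for the most similar stored source
    segment, or None if no stored segment reaches the threshold."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0
        # as the two strings share fewer characters in sequence.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Illustrative memory of previously translated segments (hypothetical data).
memory = {
    "Press the start button.": "Appuyez sur le bouton de démarrage.",
    "Close the window.": "Fermez la fenêtre.",
}

match = best_tm_match("Press the stop button.", memory)
if match is not None:
    source, target, score = match
    print(f"Fuzzy match ({score:.2f}): {source} -> {target}")
```

A fuzzy match like this one is only a suggestion for the translator to revise, since the retrieved target still reflects the stored source rather than the new segment.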

Corpus-Based Terminology and Phraseology:

In terminology and phraseology, the Center conducts corpus-based studies to unravel the intricacies of language use in specialized fields, enhancing terminological precision.

AI and Machine Translation Development:

Pioneering research in artificial intelligence (AI) and machine translation development, the Center contributes to the evolution of cutting-edge technologies that shape the future of language processing and translation.

In these diverse research fields, the Center at Emerson University Multan remains at the forefront of academic exploration, fostering innovation and contributing valuable insights to the broader academic community and beyond.

The application of technology to linguistic analysis signifies a pivotal advancement with direct implications for didactics across various domains. The immediate applications extend to interpreting, translating, language studies, terminology, and applied linguistics studies, including sociolinguistics, contact linguistics, textual linguistics, and pragmatic linguistics.

Corpus linguistic approaches, driven by the computational power available today, offer a transformative lens for scrutinizing vast language datasets. These datasets, known as corpora, serve as dynamic reservoirs enabling the exploration of word meanings within specific contexts. The semiautomated nature of corpus linguistic investigation equips researchers to unveil and comprehend intricate language patterns that might remain elusive through manual analysis alone. We have illustrated the diverse applications of corpus linguistic approaches through a set of concise case studies conducted by researchers in higher education. These case studies traverse various educational settings, academic disciplines, and genres, highlighting the versatility of corpus linguistics in addressing research questions across diverse domains.
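The exploration of word meanings within specific contexts described above is often carried out with a keyword-in-context (KWIC) concordance. The following is a minimal sketch under stated assumptions: the function name `kwic`, the window width, and the sample text are illustrative, and the simple tokenization ignores sentence boundaries, which dedicated corpus tools handle more carefully.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, node, right) tuples for each occurrence of keyword,
    with up to `width` tokens of context on each side."""
    tokens = re.findall(r"\w+", text.lower())
    node = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Illustrative text in which "bank" carries two different senses.
sample = "She sat on the river bank. He opened an account at the bank downtown."

# Align the keyword in a column so the surrounding contexts can be compared.
for left, node, right in kwic(sample, "bank"):
    print(f"{left:>20} [{node}] {right}")
```

Reading the aligned lines side by side shows how neighbouring words ("river" versus "account") separate the senses of the same word form, which is the kind of pattern that is hard to see through manual reading of a large corpus.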

This overview also considers the evolution of corpus linguistics as a robust field of inquiry, underscoring the significance of existing open corpora and sophisticated corpus analysis tools. In doing so, we provide context for the case studies, showing the broader landscape within which corpus linguistic techniques operate. The case studies offer a glimpse into how corpus linguistic methodologies, whether employed in isolation or integrated into a broader research framework, prove instrumental for higher education researchers. Investigating language data and its contextual nuances can significantly enhance the depth and breadth of research inquiries within the higher education landscape. As corpus linguistics continues to evolve, it remains a valuable ally for researchers seeking nuanced insights into language usage across various academic contexts.