Durham University
Programme and Module Handbook

Postgraduate Programme and Module Handbook 2024-2025

Module COMP42415: Text Mining and Language Analytics

Department: Computer Science


Type: Tied
Level: 4
Credits: 15
Availability: Available in 2024/2025
Module Cap: None
Tied to G5K823
Tied to G5K923

Prerequisites

  • None

Corequisites

  • None

Excluded Combination of Modules

  • None

Aims

  • To introduce students to cutting-edge techniques for automated analysis of textual data and their applications

Content

  • Preparation of textual data for machine learning
  • Representation and modelling of textual data
  • Advanced machine learning techniques for natural language analysis
  • Application of natural language analysis techniques within data science e.g. sentiment analysis, social media analysis, text classification and clustering

Learning Outcomes

Subject-specific Knowledge:
  Upon successful completion of the module, students will:
  • Have a critical appreciation of how natural language texts can be effectively represented for machine learning
  • Have an advanced understanding of automated natural language analysis through machine learning
  • Understand how natural language analysis can be applied effectively within data science
Subject-specific Skills:
  Upon successful completion of the module, students will:
  • Be able to prepare natural language texts for machine learning
  • Be able to train and apply machine learning models based on real textual data
Key Skills:
  • Effective written communication
  • Planning, organising and time-management
  • Problem solving and analysis
  • Reflecting and synthesising from experience

Modes of Teaching, Learning and Assessment and how these contribute to the learning outcomes of the module

  • This module will be delivered by the Department of Computer Science
  • Learning outcomes are met through practical workshops, supported by online resources. The workshops consist of a combination of taught input, group work, case studies, discussion and computing labs. Online resources provide preparatory material for the workshops, typically consisting of directed reading and video content.
  • The summative assessment is an individual written and programming assignment based on the development of a program to analyse a real natural language data set. It is designed to test students' skills in problem identification, their theoretical understanding, and their ability to analyse a problem and categorise potential solutions.
  • The summative assessment requires the design, implementation, analysis, testing, and reporting of Python code to solve specific natural language processing problems. The submission may consist of programming source code files (a Jupyter notebook or Python script) and/or a report of at most 1,500 words.
  • Teaching on this module will be delivered in a blended mode, with specific elements delivered online where student numbers make online teaching the most effective method.

Teaching Methods and Learning Hours

Activity                 Number  Frequency                         Duration  Total Hours
Lectures                 8       1 per week (Term 2, weeks 11-18)  1 hour    8
Workshops                8       1 per week (Term 2, weeks 11-18)  2 hours   16
Preparation and Reading                                                      126
Total                                                                        150

Summative Assessment

Component: Assignment    Component Weighting: 100%

Element     Length / duration  Element Weighting  Resit Opportunity
Coursework                     100%

Formative Assessment:

A range of formative assessment methods will be used, including case-study-based exercises, group presentations, group discussions, and simulation exercises. Oral and written feedback will be provided on an individual and/or group basis as appropriate.


Attendance at all activities marked with this symbol will be monitored. Students who fail to attend these activities, or to complete the summative or formative assessment specified above, will be subject to the procedures defined in the University's General Regulation V, and may be required to leave the University.