Durham University
Programme and Module Handbook

Postgraduate Programme and Module Handbook 2020-2021 (archived)

Module COMP42415: Text Mining and Language Analytics

Department: Computer Science


Type: Tied
Level: 4
Credits: 15
Availability: Available in 2020/21
Tied to: G5K823, G5K923


Prerequisites

  • None

Corequisites

  • None

Excluded Combination of Modules

  • None


Aims

  • To introduce students to cutting-edge techniques for automated analysis of textual data and their applications


Content

  • Preparation of textual data for machine learning
  • Representation and modelling of textual data
  • Advanced machine learning techniques for natural language analysis
  • Application of natural language analysis techniques within data science, e.g. sentiment analysis, social media analysis, text classification and clustering

Learning Outcomes

Subject-specific Knowledge:
Upon successful completion of the module, students will:
  • Have a critical appreciation of how natural language texts can be effectively represented for machine learning
  • Have an advanced understanding of automated natural language analysis through machine learning
  • Understand how natural language analysis can be applied effectively within data science
Subject-specific Skills:
Upon successful completion of the module, students will:
  • Be able to prepare natural language texts for machine learning
  • Be able to train and apply machine learning models based on real textual data
Key Skills:
  • Effective written communication
  • Planning, organising and time-management
  • Problem solving and analysis
  • Reflecting and synthesising from experience

Modes of Teaching, Learning and Assessment and how these contribute to the learning outcomes of the module

  • This module will be delivered by the Department of Computer Science
  • Learning outcomes are met through classroom-based workshops, supported by online resources. The workshops consist of a combination of taught input, group work, case studies, discussion and computing labs. Online resources provide preparatory material for the workshops – typically consisting of directed reading and video content.
  • The summative assessment is an individual written assignment based on the development of a program to analyse a real natural language data set. This is designed to test students’ skills in problem identification, their theoretical understanding, and their ability to analyse the situation in order to categorise the potential solutions.

Teaching Methods and Learning Hours

Activity | Number | Frequency | Duration | Total Hours
Lectures | 8 | 2 times per week (Term 2, weeks 16-19) | 1 hour | 8
Workshops | 8 | 2 times per week (Term 2, weeks 16-19) | 2 hours | 16
Surgery | 12 | 3 times per week (Term 2, weeks 16-19) | 1 hour | 12

Summative Assessment

Component: Assignment | Component Weighting: 100%
Element | Length / Duration | Element Weighting | Resit Opportunity
Individual written assignment based on the application of techniques to a specific problem | 1500 words | 100% |

Formative Assessment:

A range of formative assessment methods will be used, including case-study-based exercises, group presentations, group discussions and simulation exercises. Oral and written feedback will be provided on an individual and/or group basis as appropriate.

Attendance at all activities marked with this symbol will be monitored. Students who fail to attend these activities, or to complete the summative or formative assessment specified above, will be subject to the procedures defined in the University's General Regulation V, and may be required to leave the University.