Beta's projects for MSc students 2015-2016

General

Sequential associations

1 student, maximum 5 groups | Research project | Streams: General, Information Security, Multimedia Computing, Financial Computing | Updated 2015-10-30[5]

Description

Events often happen with a cause. A cause may lead to an effect, but not always with certainty. Sometimes it is hard to say which event is the cause and which is the effect. The events may merely be correlated because they are affected by the same underlying cause, or they may just happen together by chance.

When many Hang Seng Index (HSI) constituents rise, the index tends to go up. Getting straight A's in exams may lead to a degree with distinction. When the subtropical ridge of high pressure lies further to the west, more typhoons are expected to enter the South China Sea [1]. A rise in some stock prices may cause commodity prices to rise, and the most talked-about stocks on social media are the ones with the greatest fluctuation. Some of these changes are immediate (e.g., HSI), some delayed (e.g., distinction). The cause of some delayed events may not be obvious (e.g., typhoon), and some seemingly correlated events may not actually be correlated (e.g., commodity, forum).

The project is about studying and applying algorithms on sequences of data to find out how they are associated with each other. The student will choose and acquire sequences of data they are interested in, then study and apply statistical techniques, sequential data mining algorithms, and temporal classification methods to find out how the sequences are associated. Applications include generating academic advice (distinction case), predicting future stock index values (HSI case), predicting the range of the number of typhoons entering the South China Sea (typhoon case), and predicting stock prices given market information (commodity and forum cases).
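As one possible starting point for the statistical side of the project, the sketch below computes a lagged Pearson correlation between two sequences to see whether one tends to follow the other after a delay. The function names (pearson, lagged_correlations, best_lag) and the sample data are illustrative assumptions and are not prescribed by the project. A high lagged correlation only suggests an association; it does not show which sequence is the cause.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (var_x * var_y) ** 0.5

def lagged_correlations(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag.

    Assumes both sequences have the same length and are sampled at the
    same, evenly spaced points in time.
    """
    result = {}
    for lag in range(max_lag + 1):
        n = len(leader) - lag
        result[lag] = pearson(leader[:n], follower[lag:lag + n])
    return result

def best_lag(leader, follower, max_lag):
    """Return the lag with the largest absolute correlation."""
    corr = lagged_correlations(leader, follower, max_lag)
    return max(corr, key=lambda lag: abs(corr[lag]))

# Made-up data: the follower repeats the leader two steps later.
leader = [1, 3, 2, 5, 4, 6, 8, 7]
follower = [0, 0] + leader[:-2]
print(best_lag(leader, follower, max_lag=3))
```

On real data the sequences would first need to be aligned to a common time grid, and an apparent lag should be checked against the chance and common-cause explanations discussed above.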

[1] Why Tropical Cyclone Recurves? PAN Chi-kin; Hong Kong Observatory 2011-09.

Requirements

Deliverables

Bird Song Recognizer

1 student, maximum 3 groups | Development project | Streams: General, Multimedia Computing | Updated 2015-10-30[5]

Description

Build a system that recognizes bird songs.

Note that the student is expected to build their own collection of training and testing data. The HKBWS Bird Call page is a good starting point. There are also CDs of bird calls on the market.

Experimentation using systems such as Audacity, PureData, Octave, Mathematica, or Matlab is expected.

The system is best implemented in an operating-system independent way.

Requirements

Deliverables

Speaker Recognizer

1 student, maximum 2 groups | Development project | Streams: General, Multimedia Computing | Updated 2015-10-30[5]

Description

Build a system that recognizes and labels the speakers in a recorded radio talk show or phone-in show.

Preferably, the system should be language-independent.

A more advanced version of the system should be able to run in stream mode, taking live speech and generating output as the input is analyzed, and possibly correcting earlier outputs when necessary.
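The stream-mode behavior described above (emit output as input arrives, then revise earlier output if needed) could be prototyped along the following lines. This is only a sketch under stated assumptions: the per-frame speaker guesses are assumed to come from some upstream classifier that is not shown, and the class name StreamingLabeler, the window size, and the majority-vote rule are all hypothetical choices, not part of the project specification.

```python
from collections import Counter

class StreamingLabeler:
    """Emit one speaker label per frame and revise recent labels by majority vote."""

    def __init__(self, window=3):
        self.window = window
        self.raw = []     # raw per-frame guesses from an upstream classifier (assumed)
        self.labels = []  # labels emitted so far, possibly revised

    def push(self, guess):
        """Accept one raw guess; return {frame_index: label} for new or changed labels."""
        self.raw.append(guess)
        self.labels.append(guess)
        updates = {len(self.labels) - 1: guess}

        # Look at the most recent `window` raw guesses.
        start = max(0, len(self.raw) - self.window)
        recent = self.raw[start:]
        majority, count = Counter(recent).most_common(1)[0]

        # Once the window is full and one speaker holds a strict majority,
        # overwrite any disagreeing labels inside the window.
        if len(recent) == self.window and count * 2 > self.window:
            for i in range(start, len(self.raw)):
                if self.labels[i] != majority:
                    self.labels[i] = majority
                    updates[i] = majority
        return updates

labeler = StreamingLabeler(window=3)
for guess in ["A", "B", "B"]:
    print(labeler.push(guess))
```

In this run the third call reports an update for frame 0 as well as frame 2, which is the "correcting earlier outputs" case: the label first emitted for frame 0 is replaced once later frames make a different speaker more likely.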

Note that the student is expected to build their own collection of training and testing data.

Experimentation using systems such as Audacity, PureData, Octave, Mathematica, or Matlab is expected.

The system is best implemented in an operating-system independent way.

Requirements

Deliverables

Distributed Financial Data Forecaster

1 student, maximum 2 groups | Research project | Streams: General, Financial Computing | Updated 2015-10-30[5]

Description

Given the historical data of a number of stock or index prices at different points in time, design an algorithm that would predict their values in the future.

Some factors the algorithm can take into consideration include the day of the month, the day of the week, the time of day, various financial indicators, and correlations between data from different time series.
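The calendar factors listed above can be derived directly from each data point's timestamp. The following is a minimal sketch using only the Python standard library; the function names, the feature names, and the choice of the previous price as an extra input are illustrative assumptions, and the sample prices are made up.

```python
from datetime import datetime

def calendar_features(ts):
    """Map a timestamp to the calendar factors mentioned in the description."""
    return {
        "day_of_month": ts.day,                      # 1..31
        "weekday": ts.weekday(),                     # Monday = 0 ... Sunday = 6
        "minute_of_day": ts.hour * 60 + ts.minute,   # 0..1439
    }

def build_rows(ticks):
    """Turn (timestamp, price) pairs into feature rows.

    Each row describes one tick and carries the previous tick's price as an
    input and the current price as the value to predict.
    """
    rows = []
    for (_, prev_price), (ts, price) in zip(ticks, ticks[1:]):
        row = calendar_features(ts)
        row["prev_price"] = prev_price
        row["target"] = price
        rows.append(row)
    return rows

# Made-up sample data for illustration only.
ticks = [
    (datetime(2015, 10, 30, 9, 30), 100.0),
    (datetime(2015, 10, 30, 9, 31), 101.0),
]
print(build_rows(ticks))
```

Rows of this shape could then be handed to whichever learning library the student chooses; financial indicators and cross-series correlations would be added as further columns.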

Note that the student is expected to build their own collection of training and testing data.

The prediction system should be run and tested in a distributed environment, using packages and libraries such as MLlib on PySpark, Hadoop, or MPI.

Also, note that since the non-distributed version of this project has been done in the past, students taking it up should show how their approaches are better than past approaches.

Requirements

Deliverables