Early Stage Researcher (PhD) projects

Individual Research Projects

The CLEOPATRA programme has 15 Early Stage Researchers. Their projects are detailed below:

1. Fact extraction and cross-lingual alignment (LUH)
2. Interactive user access models to cross-lingual information (LUH)
3. Crowd quality and training in hybrid multilingual information processing and analytics (SOTON)
4. Incentives design for hybrid multilingual information processing and analytics (SOTON)
5. Fact validation across multilingual text corpora (UBO)
6. Interactive multilingual question answering (UBO)
7. Relations of textual and visual information (TIB)
8. Contextualisation of images in multilingual sources (TIB)
9. National and transnational media coverage of European parliamentary elections, 2004-2014 (UoL)
10. Nationalism, internationalism and sporting identity: the London and Rio Olympics/Paralympics (UoL)
11. Information propagation with barriers (JSI)
12. Cross-lingual news reporting bias (JSI)
13. Multilingual Wikipedia as ‘first draft of history’ (UvA)
14. NLP for under-resourced languages (FFZG)
15. Cross-lingual sentiment detection (FFZG)

 

ESR1: Tin Kuculo PhD enrolment: LUH
Project Title: Fact extraction and cross-lingual alignment (WP4 – Event-centric cross-lingual Information processing)
Objectives: Extract and interlink mentions of related facts and their multilingual context and establish their semantic and temporal relations in comparable corpora by leveraging hybrid computational methods while utilizing NLP and ML-based technologies.
Expected Results: Hybrid computational methods for extraction and interlinking of related facts and their context across languages.
Application: The position has been assigned.

 

ESR2: Sara Abdollahi PhD enrolment: LUH
Project Title: Interactive user access models to cross-lingual information (WP5 – Hybrid computation, user interaction and question answering)
Objectives: Develop user interaction models that enable users to efficiently and effectively access extracted event-centric multilingual information and its context and analyse language-specific differences.
Expected Results: Models for interactive efficient access to structured multilingual information and its context validated through user studies.
Application: The position has been assigned.

 

ESR3: Gabriel Amaral PhD enrolment: SOTON
Project Title: Crowd quality and training in hybrid multilingual information processing and analytics (WP5 – Hybrid computation, user interaction and question answering)
Objectives: Design a mixed-crowdsourcing workflow in which contributors who are less confident in a given language learn the tools of the trade from more experienced ones.
Expected Results: A mixed-crowdsourcing workflow.
I’m a Computer Scientist and Researcher from Ceará, Brazil, with a background in Natural Language Processing (NLP), Machine Learning (ML) and Logical Agents. My role in CLEOPATRA is to investigate how Crowdsourcing can be a valuable asset for enriching knowledge-based systems in a multilingual and multicultural environment, as well as for collecting resources to help support minority languages and their speakers.

 

ESR4 PhD enrolment: SOTON
Project Title: Incentives design for hybrid multilingual information processing and analytics (WP5 –  Hybrid computation, user interaction and question answering)
Objectives: Explore a set of motivations and incentives, ranging from financial rewards and learning (pairing people to learn languages in tandem) to challenges and reputation, to understand how they would apply to technical, context-rich tasks such as those in multilingual information science.
Expected Results: Advance the state of the art with respect to platforms like Duolingo, exploring relatively technical tasks for a lay audience.
Application details: To apply for positions hosted at SOTON, please follow the instructions on the official website.

 

ESR5: Jason Armitage PhD enrolment: UBO
Project Title: Fact validation across multilingual text corpora (WP4 –  Event-centric cross-lingual Information processing)
Objectives: Develop hybrid methods for cross-lingual fact validation and leverage multilingual distributed sources to provide a more complete set of source candidates in order to validate the facts.
Expected Results: Methods for hybrid cross-lingual fact validation using heterogeneous information sources.
Application: The position has been assigned.

 

ESR6: Endri Kacupaj PhD enrolment: UBO
Project Title: Interactive multilingual question answering (WP5 –  Hybrid computation, user interaction and question answering)
Objectives: Train neural networks to convert natural language queries to a formal query language, which will then be answered using existing knowledge bases. Enable efficient user interaction and feedback to enhance results.
Expected Results: End-to-end interactive Question Answering prototype trained using a neural network which will support a more expressive query language and user interaction, in particular to support event-centric questions.
Application details: The position has been assigned.

 

ESR7: Gullal Singh Cheema PhD enrolment: TIB
Project Title: Relations of textual and visual Information (WP4 –  Event-centric cross-lingual Information processing)
Objectives: Develop and research (deep) learning systems that are able to (1) find the paragraphs and sentences in a text which are relevant to image content, and (2) predict the granularity and semantic level of text-image relations.
Expected Results: A cross-lingual model of semantic relations of textual and visual information with different levels and granularities.
Application details: The position has been assigned.

 

ESR8: Golsa Tahmasebzadeh PhD enrolment: TIB
Project Title: Contextualisation of images in multilingual sources (WP4 –  Event-centric cross-lingual Information processing)
Objectives: Research how surrounding text information can be utilized to infer and refine spatial and temporal information about an image in multilingual Web sources to support cross-lingual alignment.
Expected Results: Methods to detect temporal and spatial information for images by exploiting visual information and their multilingual textual context.
Application details: The position has been assigned.

 

ESR9: Daniela Major PhD enrolment:  UoL
Project Title: National and transnational media coverage of European parliamentary elections, 2004-2014 (WP6 –  Event-centric cross-lingual analytics and cross-cultural studies)
Objectives: Explore information flows between national media, identify translingual concepts and topics emerging during the elections.
Expected Results: Identification of issues remaining bounded by language or national political cultures in election information flows.
My project concerns the way European media covered the European Parliamentary elections from 2004 to 2019. I’ll be looking at newspapers in digital formats from different countries and languages. I will be studying which themes come up the most and how they change throughout the years as well as the usage of political concepts such as Nationalism, Sovereignty and Europeanisation in the media coverage.

The point of this project is to contribute to a broader understanding of the role of national media in the European Public Sphere, as well as to measure the influence of the media in shaping the public’s image of the European Parliament.

Ideally, this project would also provide some pointers as to how the European Institutions should communicate with the electorate, thus bridging the gap between these institutions and public opinion.

 

ESR10: Caio Mello PhD enrolment: UoL
Project Title: Nationalism, internationalism and sporting identity: the London and Rio Olympics (WP6 –  Event-centric cross-lingual analytics and cross-cultural studies)
Objectives: Explore online discussion of the two recent Olympics, which took place on different continents, in different time zones, and in different linguistic contexts.
Expected Results: Identification of differences in coverage between nations of major sporting events, analysing factors such as the type of activity, the location of the event, and the languages of the host nations.
The Olympic Games happen every four years, which means that every four years a city has to be chosen as host. It is easy to imagine the impact of hosting such a big event in your own country. Governments usually have to prepare everything for their guests, aware that the local population expects something that will remain as a legacy after the event ends. But what are people actually expecting? What usually happens after the Olympics? Are people happy or unhappy with the legacy left behind once the games are over?

We can try to answer these questions by reading what was published on the internet before, during and after the games in the countries that hosted them. Of course, there is a great deal of material on this topic in social media and news media, and it would be very difficult to read everything. For that reason, we can use computers to help us read this material and select what is important. Computers can, for example, analyse the most frequent words related to this topic and give us insights into what kind of legacy people usually expect and how they feel when they see their plans materialise some years later.

This kind of research has many possible applications. It can help governments to plan better public policies and provide us with tools to understand the impacts of such big events, which can be used to find solutions.

 

ESR11 PhD enrolment:  JSI
Project Title: Information propagation with barriers (WP6 –  Event-centric cross-lingual analytics and cross-cultural studies)
Objectives: Model the phenomenon of information propagation within the dynamic network of interconnected events. In other words, the objective is to model the characteristics of information spreading once a physical event happens somewhere in the world.
Expected Results: A model that facilitates tracking how the information about events spreads across languages, borders and cultures including the relations between barriers and the information spreading (e.g. delays, blocks, filters).
Application details: This position has been assigned.

 

ESR12 PhD enrolment: JSI
Project Title: Cross-lingual news reporting bias (WP6 – Event-centric cross-lingual analytics and cross-cultural studies)
Objectives: Analyse cross-lingual news reporting bias along several dimensions: topic, language, geography, political orientation, source, sentiment, time, attention and some other contextual features.
Expected Results: Models describing information consumption in different parts of the world and feature analysis with respect to bias.
Application details: This position has been assigned.

 

ESR13: Anna Katrine Jørgensen PhD enrolment: UvA
Project Title: Multilingual Wikipedia as ‘first draft of history’ (WP6 –  Event-centric cross-lingual analytics and cross-cultural studies)
Objectives: Perform cross-cultural comparison of Wikipedia language versions of articles on emerging news events and their temporal evolution.
Expected Results: Identification of language-specific and community-specific differences across Wikipedia language editions with respect to coverage of emerging news events over time.
Application:  The position has been assigned.

 

ESR14: Diego Alves PhD enrolment: FFZG
Project Title: NLP for under-resourced languages (WP4 –  Event-centric cross-lingual Information processing)
Objectives: Extend Language Processing Pipelines (LPPs) for the well-resourced EU languages and gradually add new languages.
Expected Results: A set of stable LPPs covering the core Language Technology tasks for most of the EU official languages.
Do you remember all the boring syntax analysis and other grammar studies you had to do in high school? Well, computers can do that! And the good news is that they can be as good as graduate linguists! However, in order to be efficient, computers need a lot of linguistic data. Moreover, now that several automatic methods for natural language processing are available, it is difficult to know which are the best ones. My work consists of finding the optimal combination of software tools to process different European languages, especially the under-resourced ones.

 

ESR15: Gaurish Thakkar PhD enrolment: FFZG
Project Title: Cross-lingual sentiment detection (WP4 –  Event-centric cross-lingual Information processing)
Objectives: Produce and test a cross-lingual sentiment detection module with support for under-resourced EU languages.
Expected Results: A module for cross-lingual sentiment analysis, integrated in the CKPP.
Application details: The position has been assigned.
