Marco Tagliasacchi's research


Research projects

Year | Acronym | Funding scheme | Title | Role
2012-2015 | GreenEyes | FP7 FET-Open Young-Explorers | Networked energy-aware visual analysis | Coordinator
2011-2014 | REWIND | FP7 FET-Open | REVerse engineering of audio-VIsual coNtent Data | Co-coordinator
2009-2010 | LAURA | Local project | Localization And Ubiquitous monitoRing of pAtients for health care support | Co-coordinator
2014-2016 | SmartH2O | FP7 STREP | An ICT Platform to leverage on Social Computing for the efficient management of Water Consumption | Scientific investigator
2014-2015 | Proactive | Regional project | PROtezione del territorio con infrAstrutture iCT avanzate, cIttadinanza attiVa, e rEti sociali (protecting the territory with advanced ICT infrastructures, active citizenship, and social networks) | Scientific investigator
2011-2014 | CUBRIK | FP7 IP | Human-enhanced time-aware multimedia search | Scientific investigator
2010-2011 | ARACHNE | PRIN national project | Advanced video stReaming ArCHitectures for peer-to-peer | Scientific investigator
2009-2011 | SeCo | ERC Advanced Grant | Search Computing | Scientific investigator
2009-2011 | SCENIC | FP7 FET-Open | Self configuring environment aware intelligent acoustic sensing | Scientific investigator
2006-2009 | VISNET-II | FP6 NoE | Networked audiovisual media technologies | Scientific investigator
2005-2006 | - | PRIN national project | Robust video coding techniques based on Distributed Source Coding | Scientific investigator
2004-2006 | VISNET | FP6 NoE | Networked audiovisual media technologies | Scientific investigator


Research outline

My research activities focus on the processing of multimedia data (e.g. audio, image and video signals), grounded in the principles of information theory and machine learning. On the one hand, information theory enables a compact, non-redundant representation of the data; on the other, machine learning algorithms are adopted to extract content-based semantics from multimedia data.

Image and video processing

Image and video forensics [C.121., C.118., C.116., C.115., C.111., C.110., C.105., C.103., C.101., C.100., C.99., C.96., C.95., C.94., C.89., C.88., C.87., A.21., C.86., C.82., A.18., C.79., C.75., C.74., A.9., A.7., C.71., C.66., C.64., C.59., C.54., C.47., C.42.]

With the rapid proliferation of inexpensive acquisition and storage devices, multimedia objects can be easily created, stored, transmitted, modified and tampered with by anyone. During its lifetime, a digital object may go through several processing stages, including multiple analog-to-digital (A/D) and digital-to-analog (D/A) conversions, coding and decoding, transmission, and editing (aimed at enhancing the quality, creating new content by mixing pre-existing materials, or tampering with the content). We aim at synergistically combining principles of signal processing, machine learning and information theory to answer relevant questions about the past history of such objects. We started from the observation that each of these processing steps necessarily leaves a characteristic footprint, which can potentially be detected to trace back the past history of the available multimedia object in a blind fashion, i.e. without access to the original content. We are focusing on detecting acquisition-, coding- and editing-based footprints in images and videos, with impact on forensic and law enforcement applications.

Visual sensor networks [A.24., A.23., C.114., C.113., C.112., C.107., C.104., C.102., C.98., C.97., C.91., C.90., C.83., C.81., C.80., A.11., C.52.]

We aim at developing new methodologies, practical algorithms and protocols to empower wireless sensor networks (WSNs) with vision capabilities similar to those achievable by power-hungry smart camera systems. The key tenet is that most visual analysis tasks can be carried out based on a succinct representation of the image, comprising both global and local features, while disregarding the underlying pixel-level representation. We are tackling the problem by proposing a novel joint analyze-and-compress paradigm: image features are collected by sensing nodes, processed, and then delivered to the final destination(s) to enable higher-level visual analysis tasks by means of either centralized or distributed detectors and classifiers, somewhat mimicking the processing of visual stimuli in the early visual system. The current activities are rooted in our previous work, in which we showed how to perform video analysis without reconstructing the video sequence beforehand, exploiting compressed sensing.


Information retrieval

Crowdsourcing for multimedia retrieval [C.117., C.109., C.108., C.106., C.85., C.77., C.76.]

We aim at improving the performance of commonly used multimedia content analysis algorithms by properly merging automatic decisions with human-computed results, thus realizing a more integrated collaboration between human judgement and algorithms. In particular, we are currently studying the problem of reducing the uncertainty in ranked result sets with human computation and how to optimally allocate human workers to tasks.

Top-k query processing [A.22., A.20., A.17., A.15., C.84., C.79., C.78., C.72., C.69., C.67., A.13., A.12., B.4., C.51., C.58.]

In many information retrieval tasks, the user is interested only in the top-k results. We have been studying different aspects of top-k query processing that arise when joining heterogeneous services to answer complex queries. In particular, we studied: i) the optimization of the access plan to fetch data from the individual services; ii) how to deal with uncertainty in the scoring function; iii) how to diversify search results, so as to return items that are both relevant and diverse.
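To illustrate the general setting, here is a minimal sketch of Fagin's classic Threshold Algorithm, a standard baseline for top-k processing over multiple sorted sources; it is an illustration of the problem, not the access-plan optimizers or uncertainty-aware methods studied in this work, and all names and the default aggregation function are assumptions.

```python
# Hedged sketch of Fagin's Threshold Algorithm (TA) for top-k queries over
# several sources, each returning items sorted by descending score.
import heapq

def threshold_algorithm(lists, k, agg=sum):
    """Return the top-k (aggregate score, item) pairs, best first.

    lists: one list per source of (item, score) pairs, sorted by
           descending score; every item is assumed to appear in every list.
    agg:   monotone aggregation function over per-source scores.
    """
    index = [dict(l) for l in lists]   # random-access lookup per source
    seen = set()
    top = []                           # min-heap holding the best k so far
    for depth in range(max(len(l) for l in lists)):
        frontier = []                  # scores observed at this depth
        for l in lists:
            if depth >= len(l):
                continue
            item, score = l[depth]
            frontier.append(score)
            if item not in seen:
                seen.add(item)
                # Random access: fetch the item's score from every source.
                total = agg(index[s][item] for s in range(len(lists)))
                heapq.heappush(top, (total, item))
                if len(top) > k:
                    heapq.heappop(top)
        # TA stopping rule: once the k-th best score reaches the threshold,
        # no unseen item can still enter the top-k.
        if len(top) == k and top[0][0] >= agg(frontier):
            break
    return sorted(top, reverse=True)
```

Because the aggregation is monotone, the algorithm can stop after scanning only a prefix of each source, which is what makes top-k processing cheaper than computing all aggregate scores.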

Past Research Activities

Video Processing

Video quality assessment [A.16., A.14., A.10., A.8., C.61., C.57., C.55., C.50., C.48., C.46., C.45.]

Video data represents the largest share of traffic on the Internet. There is therefore a strong demand for automatic mechanisms able to evaluate the playout quality of video sequences at the clients. This is especially important when video streams are transmitted over best-effort networks and the quality of service cannot be guaranteed. We have investigated video quality assessment both in a no-reference scenario (the original video sequence is not available) and in a reduced-reference scenario (a small auxiliary stream accompanies the main video stream). We have developed objective quality metrics that correlate well with the perceptual quality of impaired video, and we have made available to the research community the data collected during an extensive subjective evaluation campaign.
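For reference, the textbook full-reference PSNR metric, against which no-reference and reduced-reference metrics are commonly benchmarked, can be computed as in the minimal sketch below; this is a standard baseline, not one of the metrics developed in this work.

```python
# Minimal full-reference PSNR computation over flat pixel sequences.
import math

def psnr(reference, impaired, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-size frames,
    given as flat sequences of pixel intensities."""
    if len(reference) != len(impaired):
        raise ValueError("frames must have the same size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, impaired)) / len(reference)
    if mse == 0:
        return float("inf")            # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```

PSNR needs the full original frame, which is exactly what no-reference and reduced-reference metrics must do without.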

Distributed video coding [A.5., A.3., A.1., C.44., C.34., C.33., C.28., C.26., C.24., C.23., C.22., C.21., C.17., C.16., C.15., C.14., C.13., C.12., C.10., C.9., C.8., C.7., C.6.]

Distributed video coding is a coding paradigm that enables a flexible distribution of the computational complexity between encoder and decoder by moving part of the motion estimation task to the decoder. The research has focused on several aspects of distributed video coding: improving the coding efficiency of state-of-the-art coding architectures; removing issues that prevented such architectures from being applied in practical scenarios; and studying the rate-distortion performance of distributed video coding in comparison with conventional motion-compensated predictive codecs. The research activities have also addressed how to exploit distributed video coding to enhance robustness to packet losses.

Non-normative tools for video coding [A.6., A.4., B.2., B.1., C.38., C.36., C.35., C.32., C.18., C.11., C.2.]

In order to ensure interoperability, video coding standards define only the syntax of the bitstream and how to perform decoding. Several components are left unspecified by the standards, including motion estimation, rate allocation, rate control and error concealment. The research has focused on non-normative tools for the state-of-the-art H.264/AVC video coding standard, with particular emphasis on error resilience and rate control.

Scalable video coding [A.2., C.5., C.4., C.3.]

When video contents are distributed over heterogeneous networks and devices, it is desirable to adapt the bitstream to the characteristics of the receiving device. Scalable video coding enables bitstream adaptation without the need of transcoding, i.e. partial decoding followed by re-encoding: the bitstream corresponding to the desired frame-rate, spatial resolution and quality can be readily extracted from the original bitstream. The research has focused on wavelet-based scalable video coding techniques, somewhat extending the ideas of JPEG2000 to video signals, and it has led to several contributions to MPEG during the standardization of a scalable video codec.

Audio Processing

The research activities on audio processing are carried out at the Sound and Music Computing lab (Como Campus, Politecnico di Milano), where I coordinate, together with Prof. Sarti, a research group of five people, including Ph.D. students and research assistants.

Self-calibration of acoustic cameras [C.63., C.56., C.53.]

Working with multiple microphone arrays requires knowing the relative position of each array in 3D space. Exploiting concepts from the computer vision literature, we have defined the notion of an acoustic camera and addressed the problem of self-calibrating multiple acoustic cameras while minimizing the amount of data exchanged between cameras. Both far-field and near-field conditions have been addressed.

Acoustic source localization and tracking [C.41., C.40., C.39., C.30., C.29., C.27., C.25., C.20., C.19.]

The information about the type of an acoustic event can be augmented with the location of its source through space-time processing of signals collected with microphone arrays. We have been working on the problem of acoustic source localization and tracking, especially when more than one source is active at the same time.
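A minimal sketch of the underlying idea, estimating the time difference of arrival (TDOA) between two microphones by peak-picking their cross-correlation, might look as follows; this is a drastic simplification of the space-time processing referred to above, and the function name and interface are assumptions.

```python
# Toy TDOA estimation: find the lag that maximizes the cross-correlation
# of two microphone signals, given as flat sequences of samples.
def tdoa(sig_a, sig_b, max_lag):
    """Estimate the delay (in samples) of sig_a with respect to sig_b.

    A positive result means sig_a is a delayed copy of sig_b, i.e. the
    source reached microphone B first.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum over the sample indices where both signals are defined.
        corr = sum(sig_a[i] * sig_b[i - lag]
                   for i in range(max(0, lag), min(n, n + lag)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

Given the array geometry and the speed of sound, such pairwise delays can be converted into a source position, which is where the multi-source case becomes challenging.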

Audio classification [C.31., C.27.]

The goal of this research line is to detect the onset of anomalous events (e.g. gunshots, screams, etc.) in audio streams collected by environmental microphones. The research is proceeding towards modelling the temporal evolution of acoustic features extracted from the audio streams, in order to detect aggressions in public spaces for security applications.
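As a toy illustration of onset detection on an audio stream, a simple short-time-energy detector could be sketched as below; it is far cruder than the feature-based temporal models described above, and all parameter values are assumptions.

```python
# Toy anomalous-event detector: flag frames whose short-time energy jumps
# well above a slowly adapting background estimate.
def detect_onsets(samples, frame_len=256, ratio=4.0, alpha=0.95):
    """Return the start indices of frames flagged as anomalous bursts."""
    onsets, background = [], None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if background is None:
            background = energy        # bootstrap from the first frame
        elif energy > ratio * background:
            onsets.append(start)       # energy burst well above background
        # Slowly track the background level so the detector adapts.
        background = alpha * background + (1 - alpha) * energy
    return onsets
```

A detector of this kind reacts to any loud burst; distinguishing gunshots or screams from benign sounds is precisely what the acoustic-feature modelling above is for.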


Data analysis algorithms in gene annotation databases [B.3., C.49., C.60.]

Gene annotation databases are widely used as public repositories of biological knowledge. Genes and gene products are annotated with terms taken from unstructured controlled vocabularies or semantically structured ontologies (e.g. the Gene Ontology). We are developing a system that integrates the available data sources providing functional annotations of genes and gene products. In this context, we have developed novel algorithms for automatically predicting newly inferred annotations based on the functional similarity between Gene Ontology terms.