IDAL is focused on applying innovative techniques to a variety of data analysis problems. Its expertise comprises most fields of machine learning. The main areas in which IDAL carries out active research are the following:
Machine learning studies the development of models based on data to solve problems of modelling, classification, optimization and forecasting. A classical definition is due to Tom Mitchell:
“The field of Machine Learning seeks to answer the question “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?”. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.”
Machine learning models include neural networks, decision trees, random forests, probabilistic graphical models, fuzzy logic, support vector machines, deep learning, clustering algorithms, etc. These models are used in problems where nonlinear effects appear, which is the case in most real applications. The demand for this kind of model has grown with the "data explosion" associated with terms such as Big Data, the Internet of Everything and social networks.
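As a minimal illustration of why nonlinear models matter, the sketch below uses the classic XOR problem (an assumed toy example, not an IDAL application): the best linear fit cannot separate the classes at all, while even a simple nonlinear model such as 1-nearest-neighbour fits them exactly.

```python
import numpy as np

# XOR: a minimal example of a problem with nonlinear effects.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear least-squares fit (with a bias term).
A = np.hstack([X, np.ones((4, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_out = A @ w  # ~0.5 for every point: a linear model cannot separate XOR

# 1-nearest-neighbour, a simple nonlinear model, fits XOR exactly.
def knn1(query):
    return int(y[np.argmin(np.linalg.norm(X - query, axis=1))])

nonlinear_acc = np.mean([knn1(x) == yi for x, yi in zip(X, y)])
print(linear_out.round(2))  # [0.5 0.5 0.5 0.5]
print(nonlinear_acc)        # 1.0
```

The linear model's output is 0.5 everywhere, i.e. it cannot commit to either class, whereas the nonlinear model reaches perfect accuracy.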
Generally, the goal of dimensionality reduction in a machine learning problem is either 1) to optimize classifiers or regressors by reducing the number of inputs, or 2) to extract knowledge about the minimum number of variables, and their physical meaning, in order to reduce the number of variables that must be acquired to solve a given problem. Two families of methods address these goals: feature selection and feature extraction. On the one hand, feature selection methods fulfil both objectives; they consist in choosing a subset of relevant features for a given problem. Hence the chosen subset, in addition to having lower dimensionality than the original set, retains physical meaning and may be interpreted. On the other hand, feature extraction methods fulfil only the first objective: they reduce the dimensionality of a machine learning problem by working in a transformed space obtained from all the original variables. In other words, feature extraction is basically a mapping between two vector spaces of different dimensionality (generally the second has lower dimensionality than the first) that preserves neighbourhood relationships. Feature extraction methods belong to the field of manifold learning.
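The contrast between the two families can be sketched on synthetic data (the data set and the ranking criterion below are illustrative assumptions): feature selection keeps a subset of the original, interpretable columns, while feature extraction (here PCA via an SVD) maps all variables into a new, lower-dimensional space whose coordinates lose direct physical meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # 5 original variables
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)  # only variable 2 matters

# Feature SELECTION: keep a subset of the original variables,
# here ranked by absolute correlation with the target. The selected
# columns keep their physical meaning.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(5)])
selected = np.argsort(corr)[::-1][:2]           # indices of original variables
X_sel = X[:, selected]

# Feature EXTRACTION: map to a new, lower-dimensional space (PCA).
# The new axes are linear combinations of ALL original variables.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:2].T                           # 2 extracted components

print("selected original variables:", sorted(int(j) for j in selected))
print("shapes:", X_sel.shape, X_pca.shape)
```

The relevant variable (index 2) appears among the selected columns and can be reported to a domain expert; the PCA coordinates cannot be read off in the same way.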
Data mining is one step in the knowledge discovery process, albeit an essential one, because it uncovers hidden patterns for evaluation; the interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base. However, in industry, in the media and in research, the term data mining is often used to refer to the entire knowledge discovery process. In summary, data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
Data visualization is the process of representing data graphically and interacting with these representations in order to gain insight into the data. The main task of graphs in visualization is, thus, the presentation of large amounts of data to the analyst, so that the information conveyed by the graphs sharpens their reasoning and facilitates the recognition of structures, patterns, novelties, anomalies, trends or correlations, for better comprehension and faster time-to-insight.
The objective of visual exploration techniques is to integrate people into the process of data exploration. One of the most important benefits of visualization is that it enables access to huge amounts of data in ways that would otherwise not be possible. The knowledge contained in these data sets would be nearly inaccessible to the casual, or even moderately interested, viewer were it not visualized; a good visualization gives access to that knowledge quickly, efficiently and effectively. The data visualization tool to use depends on the nature of the data set and its underlying structure. The most commonly used tools are those that graph multidimensional data sets: they enable users to compare and contrast values across the different data dimensions and to investigate the relationships between two or more continuous or discrete variables.
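The relationships that a multidimensional visualization such as a scatter-plot matrix makes visible at a glance have a simple quantitative counterpart, the pairwise correlation matrix. The sketch below (on an assumed synthetic data set, not IDAL data) computes it and flags the strongest relationship between dimensions:

```python
import numpy as np

# A small multidimensional data set: 3 variables, one pair strongly related.
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = 0.9 * a + 0.1 * rng.normal(size=300)  # b closely follows a
c = rng.normal(size=300)                  # c is independent of both
data = np.column_stack([a, b, c])

# Pairwise correlation matrix: the quantity a scatter-plot matrix
# lets the analyst see at a glance.
R = np.corrcoef(data, rowvar=False)

# Locate the strongest off-diagonal relationship.
off = np.abs(R - np.eye(3))
i, j = np.unravel_index(np.argmax(off), off.shape)
print(f"strongest relationship: variables {i} and {j}, r = {R[i, j]:.2f}")
</```

In a visualization tool the same pair would stand out as a tight diagonal cloud in the corresponding scatter panel, while the independent variable would show structureless panels.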
Reinforcement Learning (RL) can be located between supervised learning and unsupervised learning. The problems faced by RL are those of learning in sequential decision making with limited feedback. Such decision-making problems appear in a wide variety of fields, including Automatic Control, Artificial Intelligence, Medicine, Economics and Marketing, to name a few. However, RL has typically been employed only in a limited range of applications such as Planning, Control Engineering and Robot Localization. In the last few years, RL has attracted the interest of the Machine Learning and Computational Intelligence communities for different kinds of applications. Especially remarkable is IDAL's contribution to applications in Medicine and Marketing, fields in which the group is an international reference.
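The "sequential decision making with limited feedback" setting can be made concrete with a minimal tabular Q-learning sketch on an assumed toy problem (a 5-state chain where only reaching the goal yields a reward; all parameters below are illustrative choices):

```python
import random

# Tiny deterministic chain MDP: states 0..4, actions left(-1) / right(+1).
# Reward +1 only on reaching state 4 (the goal); episodes start at state 0.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(500):  # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection: the only feedback is the reward
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda i: Q[s][i])
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update (off-policy temporal-difference)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy[:4])  # learned greedy policy in states 0..3: [1, 1, 1, 1]
```

Despite never being told which action is correct, the agent learns from delayed reward alone that moving right in every state reaches the goal.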
Quantum Machine Learning
Current research in Machine Learning (ML) combines the study of variations on well-established methods with cutting-edge breakthroughs based on completely new approaches. Among the latter, emerging paradigms from Physics have taken on special relevance in recent years. Although still in its initial stages, Quantum Machine Learning (QML) shows promising ways to speed up some of the costly ML calculations with similar or even better performance. Two additional advantages derive from the intrinsically probabilistic approach of QML, since quantum states are genuinely probabilistic, and from the possibility of finding the global minimum of a given function by means of adiabatic quantum optimization, thus circumventing the usual problem of local minima. Also remarkable is Quantum Clustering, a method that does not belong to the field of quantum computation; it applies intuition from Quantum Mechanics to clustering and may uncover unexpected features in big data.
Natural Language Processing
NLP is a field of study that focuses on analysing, understanding and deriving meaning from human language. NLP uses machine learning algorithms to analyse text and enables applications like automatic text summarisation, sentiment analysis, topic extraction, named entity recognition, part-of-speech tagging, relationship extraction, stemming, and more. NLP is commonly used for text mining, machine translation, information retrieval and automated question answering.
Because human language is rarely precise, NLP is a hard problem in computer science. Normally NLP relies on machine learning to automatically learn rules by analysing a set of examples (i.e. an annotated corpus) and performing statistical inference. In general, the more data analysed, the more accurate the model will be. Moreover, the increase in computing power has led to new unsupervised techniques for exploiting large amounts of text. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. These models can often be trained as a single end-to-end model and do not require traditional, task-specific feature engineering.
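The idea of learning rules from an annotated corpus by statistical inference can be sketched with a minimal Naive Bayes sentiment classifier. The four-document corpus and the test sentences below are invented toy examples; real systems train on far larger annotated corpora.

```python
from collections import Counter
import math

# A tiny annotated corpus (hypothetical examples): labelled text from
# which the model infers word statistics per class.
corpus = [
    ("great film loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("terrible plot boring film", "neg"),
    ("boring and awful acting", "neg"),
]

# Train a multinomial Naive Bayes classifier with add-one smoothing.
counts = {"pos": Counter(), "neg": Counter()}
docs = Counter()
for text, label in corpus:
    docs[label] += 1
    counts[label].update(text.split())
vocab = {w for c in counts.values() for w in c}

def classify(text):
    def log_score(label):
        total = sum(counts[label].values())
        s = math.log(docs[label] / sum(docs.values()))  # class prior
        for w in text.split():
            # smoothed word likelihood under this class
            s += math.log((counts[label][w] + 1) / (total + len(vocab)))
        return s
    return max(("pos", "neg"), key=log_score)

print(classify("loved the wonderful story"))  # pos
print(classify("awful boring plot"))          # neg
```

This is the "more data, more accuracy" regime in miniature: every additional labelled document sharpens the per-class word statistics that drive the decision.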