Source: Image taken by the author

A theoretical journey through n-grams, tf-idf, one-hot encoding, and word embeddings. And a surprise with pre-trained models.

Introduction

Some theory needs to be presented before diving into the algorithmic methodology. Many current methods are built on fundamentals that are easy to forget because we rarely use them directly.

This section will present classic NLP methods such as N-grams (Broder et al. 1997), tf-idf (Luhn 1957; Jones 1972), One-Hot encoding (D. Harris and S. Harris 2012), and word embeddings (Firth 1957; Gerard Salton 1962; G. Salton…
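To make these building blocks concrete, here is a minimal scikit-learn sketch (not taken from the article's notebook; the toy corpus is invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# A toy corpus, invented for illustration
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# N-grams: here, word-level bigrams counted per document
bigrams = CountVectorizer(ngram_range=(2, 2))
print(bigrams.fit_transform(corpus).toarray())
print(bigrams.get_feature_names_out())

# tf-idf: term frequency weighted by inverse document frequency
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))

# One-hot encoding: binary presence/absence of each token
onehot = CountVectorizer(binary=True)
print(onehot.fit_transform(corpus).toarray())
```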


Photo by NASA on Unsplash

A deep dive into the quantum world and AI

Introduction

I love quantum mechanics; there is something fascinating about how QM explains the world, and about how different that picture is from the reality we can see and live in.

This quotation was a revelation to me when I was a student. Looking at matter isn’t the right way, because all of its elements are just waves. Why does reality take shape when presented to an observer?


Taken by the author — Deep Learning with Python (2017), the François Chollet book, and my cat

A great tour of the world of Deep Learning in less than 10 minutes.

It took me a long time to open this book, more out of fear of discovering that I knew nothing than out of fear of the frustration of already knowing everything. I regularly receive newsletters on "best of" or "most read" books about artificial intelligence, machine learning, or deep learning. Deep Learning with Python is consistently cited as one of the most recommended. The suggested level varies quite often between advanced, intermediate, and expert. I think we should not stop at a level estimated by someone else. You have to pick up and read a book out of interest, out of need.


Source: Images taken and annotated by the author

The difference between the techniques and their applications

This article is the first of three about computer vision. Part 2 will explain Object Recognition, and Part 3 will cover Image Segmentation.

A companion notebook for this article is available here on GitHub.

Introduction

What is more exciting than seeing the world? Than being able to see the best of what surrounds us? The beauty of a sunset, memorable waterfalls, or seas of ice? None of this would be possible if evolution hadn’t endowed us with eyes.

We recognize things because we have learned the shapes of objects; we have learned to judge how a new shape differs from those we have already encountered…


Source: Result of the study — computed by the author

An application of the RNN family

Introduction

For a long time, I heard that time-series problems could only be approached with statistical methods (AR[1], MA[2], ARMA[3], ARIMA[4]). These techniques are generally used by mathematicians, who continuously try to improve them to handle stationary and non-stationary time series.
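For reference, fitting one of these statistical models takes only a few lines with statsmodels; here is a minimal sketch on a synthetic series (the data and the model order are invented for illustration):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic AR(1)-like series, invented for illustration
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.8 * y[t - 1] + rng.normal()

# Fit an ARIMA(p, d, q) model: AR order 1, no differencing, MA order 1
model = ARIMA(y, order=(1, 0, 1))
result = model.fit()
print(result.summary())

# Forecast the next 10 points
print(result.forecast(steps=10))
```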

A friend of mine (a mathematician, professor of statistics, and specialist in non-stationary time series) offered, several months ago, to have me work on validating and improving techniques to reconstruct the lightcurves of stars. Indeed, the Kepler satellite[11], like many other satellites, could not continuously measure the intensity of the luminous flux of nearby stars. …
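This excerpt doesn't show the exact architecture used in the study, but as a rough idea of what a model from the RNN family looks like for this kind of sequence task, here is a minimal Keras sketch (the window length and layer sizes are invented for illustration, and a noisy sine wave stands in for a lightcurve):

```python
import numpy as np
from tensorflow import keras

# Toy sequence data: predict the next flux value from a window of 20 points
# (a synthetic sine wave stands in for a real lightcurve)
t = np.linspace(0, 100, 2000)
flux = np.sin(t) + 0.1 * np.random.randn(2000)

window = 20
X = np.array([flux[i:i + window] for i in range(len(flux) - window)])
y = flux[window:]
X = X[..., None]  # shape (samples, timesteps, features)

# A small LSTM regressor from the RNN family
model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(window, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```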


Source: Image by Free-Photos from Pixabay

In the world of machine learning and Kaggle competitions, the XGBoost algorithm holds first place.

Introduction

Like many data scientists, I now have XGBoost in my toolkit. This algorithm is among the most popular in the world of data science (real-world projects or competitions). Its versatility allows it to be used in regression and classification projects, and it can be applied to tabular, structured, and unstructured data.

A notebook containing the code is available on GitHub. The notebook is intended for the classification of documents (text).

XGBoost

XGBoost, or eXtreme Gradient Boosting, is a tree-based algorithm (Chen and Guestrin, 2016[2]). …
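As a point of reference, here is a minimal sketch of training an XGBoost classifier through its scikit-learn-style API (the dataset and hyperparameters are invented for illustration, not taken from the article's notebook):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# A toy tabular dataset standing in for a real project
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small gradient-boosted tree ensemble; hyperparameters chosen arbitrarily
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```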


Source: Manuel Geissinger, Pexel

Extract text from images with OCR using a service account.

Introduction

This post has its roots in an interesting knowledge-extraction project. The first step was to extract the text from PDF documents. The company I work for is built on the Google platform, so naturally I wanted to use the OCR from the Vision API, but I couldn't find an easy way to use the API to extract text. Hence this post.

The notebook for this post is available on GitHub.

Google API Vision

Google released the API to help people, industry, and researchers make use of its functionality.

Google Cloud's Vision API has powerful machine learning models pre-trained through REST…
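The excerpt doesn't show the article's code, but broadly, calling the Vision API for OCR from Python with a service account looks like the following minimal sketch (the file paths are placeholders):

```python
import os
from google.cloud import vision

# Authenticate with a service account key (path is a placeholder)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "service-account.json"

client = vision.ImageAnnotatorClient()

# Read the image bytes and request text detection (OCR)
with open("page.png", "rb") as f:
    image = vision.Image(content=f.read())

response = client.text_detection(image=image)

# The first annotation holds the full detected text
if response.text_annotations:
    print(response.text_annotations[0].description)
```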


Photo by Markus Spiske on Unsplash

Make your data beautiful and understandable with EDA libraries, feature importance, feature selection, and feature extraction

A notebook containing all the relevant code is available on GitHub.

I — Exploratory Data Analysis, commonly called EDA

Yes, this is one more post among the many that address the subject of EDA. This step is the most important of a Data Science project. Why? Because it lets you acquire knowledge about your data, along with the ideas and intuitions you will need to model it later.

EDA is the art of making your data speak: being able to check its quality (missing data, wrong types, wrong content …), determine correlations within the data, and know its cardinality.
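In pandas, those first checks are only a few calls; here is a minimal sketch on a small invented DataFrame:

```python
import pandas as pd

# A hypothetical dataset, invented for illustration
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["Paris", "Lyon", "Paris", None],
    "income": [30000, 45000, 52000, 61000],
})

print(df.isna().sum())              # missing data per column
print(df.dtypes)                    # check for wrong types
print(df.nunique())                 # cardinality of each column
print(df.corr(numeric_only=True))   # correlation between numeric columns
```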

EDA is not just about…


Taken by the author

Discover Nature through our thoughts

All the points shared below are based on my experience and on what I believe. They are not absolute truth, but a naive vision of what I think.

What is Research?

Some interesting definitions:

Research is “creative and systematic work undertaken to increase the stock of knowledge, including knowledge of humans, culture and society, and the use of this stock of knowledge to devise new applications.” It involves the collection, organization, and analysis of information to increase our understanding of a topic or issue. (source: Wikipedia)

Scientific research is a systematic way of gathering data and harnessing curiosity. This…


Source: Miriam Espacio — Pexels

After the post on activation functions, we will dive into the second part: the loss, or objective, function for neural networks.

A notebook containing all the code is available here on GitHub; in it you’ll find code to generate different types of datasets and neural networks to test the loss functions.

To understand what a loss function is, here is a quote about the learning process:

A way to measure whether the algorithm is doing a good job — This is necessary to determine the distance between the algorithm’s current output and its expected output. The measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call learning. …
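In code, that "distance" is simply a function of the predictions and the targets. Here is a minimal NumPy sketch of two common loss functions (the arrays are invented for illustration):

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # expected outputs (invented)
y_pred = np.array([0.9, 0.2, 0.7, 0.6])   # the algorithm's current outputs

# Mean squared error: average squared distance, common for regression
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: common for classification with probabilities
eps = 1e-12  # avoid log(0)
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1 - y_true) * np.log(1 - y_pred + eps))

print(f"MSE: {mse:.4f}, BCE: {bce:.4f}")
```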

Christophe Pere

Research Scientist, AI
