Natural Language Processing seeks to map language to representations that capture morphological, lexical, syntactic, semantic, or discourse characteristics that can be processed by machine learning methods.
Kamath, J. Liu, and Whitaker 2019
Before diving into algorithmic methodology, it is necessary to present some elements of theory. Many current methods rest on fundamentals that are easy to forget because they are rarely used directly.
This section will present classic NLP methods such as N-grams (Broder et al. 1997), tf-idf (Luhn 1957; Jones 1972), one-hot encoding (D. Harris and S. Harris 2012), and word embeddings (Firth 1957; Gerard Salton 1962; G. Salton…
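As a quick refresher before the theory, two of these classics, one-hot encoding and tf-idf, can be computed by hand. This is a minimal sketch in plain Python over a toy corpus invented for illustration (not taken from the article):

```python
import math

# Toy corpus (invented for illustration)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the birds are friends of the cats",
]
docs = [d.split() for d in corpus]

# One-hot encoding: each word maps to a vector with a single 1
vocab = sorted({w for d in docs for w in d})
one_hot = {w: [int(w == v) for v in vocab] for w in vocab}

# tf-idf: term frequency weighted by inverse document frequency
def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term, docs):
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" appears in every document, so its idf (and tf-idf) is 0;
# "cat" appears in only one document, so it gets a non-zero weight.
print(tf_idf("the", docs[0], docs))  # 0.0
print(tf_idf("cat", docs[0], docs))
```

Note how tf-idf automatically discounts words like "the" that carry little discriminative information, which is exactly why it remains a strong baseline for document representation.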
A deep dive into the quantum world and AI
I love quantum mechanics; there is something fascinating about how QM explains the world, and how different that explanation is from the reality we can see and live in.
Everything we call real is made of things that cannot be regarded as real.
This quotation was a revelation to me when I was a student. Looking at matter isn’t the right way to see it, because all of its elements are just waves. Why does reality take shape when presented to an observer?
I like to think the moon is there even…
It took me a long time to open this book, more from fear of finding that I knew nothing than from fear of the frustration of knowing everything already. I regularly receive newsletters on the "best of" or "most read" books about artificial intelligence, machine learning, or deep learning. Deep Learning with Python is consistently cited as one of the most recommended. The suggested level varies between advanced, intermediate, and expert, but I think we should not stop at a level estimated by someone else. You have to pick up and read a book out of interest, out of need.
This article is the first part of three articles about computer vision. Part 2 will explain Object Recognition. Part 3 will be about Image Segmentation.
A notebook accompanying this article is available on GitHub.
What is more exciting than seeing the world, than being able to see the best of what surrounds us? The beauty of a sunset, memorable waterfalls, or seas of ice? None of it would be possible if evolution hadn’t endowed us with eyes.
We recognize things because we have learned the shapes of objects; we have learned to distinguish new shapes from those we have encountered…
For a long time, I heard that the problem of time series could only be approached by statistical methods (AR, MA, ARMA, ARIMA). These techniques are generally used by mathematicians, who continuously refine them to handle stationary and non-stationary time series.
A friend of mine (a mathematician, professor of statistics, and specialist in non-stationary time series) invited me several months ago to work on validating and improving techniques to reconstruct the light curves of stars. Indeed, the Kepler satellite, like many other satellites, could not continuously measure the intensity of the luminous flux of nearby stars. …
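The actual light-curve reconstruction pipeline is beyond this excerpt, but the classic statistical starting point, an autoregressive model, fits in a few lines of NumPy. This is a sketch on simulated data (not the Kepler flux itself), estimating the AR(1) coefficient by least squares:

```python
import numpy as np

# Simulate an AR(1) series: x_t = phi * x_{t-1} + noise
rng = np.random.default_rng(0)
phi_true = 0.8
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal(scale=0.1)

# Least-squares estimate of phi from consecutive (lagged) pairs
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(phi_hat)  # close to 0.8

# A one-step-ahead forecast; iterating this can fill short gaps
forecast = phi_hat * x[-1]
```

Gap-filling by iterated one-step forecasts is exactly where these models struggle over long gaps, which motivates the more elaborate reconstruction techniques mentioned above.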
Like many data scientists, I now count XGBoost as part of my toolkit. This algorithm is among the most popular in the world of data science (real-world projects or competitions). Its versatility allows it to be used in both regression and classification projects, on tabular, structured, and unstructured data.
A notebook containing the code is available on GitHub. It is intended for document (text) classification.
XGBoost, or eXtreme Gradient Boosting, is a tree-based algorithm (Chen and Guestrin, 2016). …
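XGBoost itself exposes a scikit-learn-style API (`xgboost.XGBClassifier`), but the core idea behind any tree-based boosting, fitting each new weak learner to the residuals of the current ensemble, can be sketched in plain NumPy. This is a toy regression example, not the actual XGBoost implementation (which adds second-order gradients, regularization, and clever split finding):

```python
import numpy as np

def fit_stump(x, residual):
    """Find the one-split decision stump that best fits the residuals."""
    best = None
    for thr in np.unique(x):
        left, right = residual[x <= thr], residual[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= thr, left.mean(), right.mean())
        err = np.sum((residual - pred) ** 2)
        if best is None or err < best[0]:
            best = (err, thr, left.mean(), right.mean())
    _, thr, lval, rval = best
    return lambda z: np.where(z <= thr, lval, rval)

def boost(x, y, n_rounds=100, lr=0.1):
    """Gradient boosting: each stump is fit to the current residuals."""
    pred = np.full_like(y, y.mean())
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)   # fit the residuals
        pred = pred + lr * stump(x)      # shrink and add to the ensemble
        stumps.append(stump)
    return lambda z: y.mean() + lr * sum(s(z) for s in stumps)

# Toy 1-D regression problem
x = np.linspace(0, 6, 100)
y = np.sin(x)
model = boost(x, y)
train_mse = float(np.mean((model(x) - y) ** 2))
print(train_mse)  # small training error
```

Each round corrects what the previous rounds got wrong; the learning rate `lr` shrinks each correction, trading more rounds for better generalization, the same trade-off controlled by `eta` and `n_estimators` in XGBoost.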
This post has its roots in an interesting knowledge-extraction project. The first step was to extract text from PDF documents. The company I work for is based on the Google platform, so naturally I wanted to use the OCR of the Vision API, but I couldn't find an easy way to use the API to extract text. Hence this post.
The notebook for this post is available on GitHub.
Google released the API to help people, industry, and researchers use its functionality.
A notebook containing all the relevant code is available on GitHub.
Yes, this is a new post among the many that address the subject of EDA. This step is the most important of a Data Science project. Why? Because it allows you to acquire knowledge of your data, along with the ideas and intuitions you will need to model it later.
EDA is the art of making your data speak: checking its quality (missing data, wrong types, wrong content …), determining correlations between variables, and knowing the cardinality of each feature.
EDA is not just about…
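The checks listed above can each be expressed in one line of pandas. Here is a minimal sketch on a hypothetical toy dataset invented for illustration:

```python
import pandas as pd

# Hypothetical toy dataset (invented for illustration)
df = pd.DataFrame({
    "age": [25, 32, None, 41, 38],
    "city": ["Paris", "Lyon", "Paris", None, "Nice"],
    "income": [30000, 45000, 38000, 52000, 47000],
})

print(df.isna().sum())               # missing values per column
print(df.dtypes)                     # check for wrong types
print(df.nunique())                  # cardinality of each column
print(df[["age", "income"]].corr())  # correlation between numeric columns
```

These four calls already answer the quality, correlation, and cardinality questions; the art of EDA is in interpreting the answers, not in running the commands.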
Discover Nature through our thoughts
All the points shared below are based on my experience and beliefs. They are not absolute truth, only my own naive vision.
Some interesting definitions:
Research is “creative and systematic work undertaken to increase the stock of knowledge, including knowledge of humans, culture and society, and the use of this stock of knowledge to devise new applications.” It involves the collection, organization, and analysis of information to increase our understanding of a topic or issue. (source: Wikipedia)
A notebook containing all the code is available on GitHub; in it you’ll find code to generate different types of datasets, and neural networks to test the loss functions.
To understand what a loss function is, here is a quote about the learning process:
A way to measure whether the algorithm is doing a good job — This is necessary to determine the distance between the algorithm’s current output and its expected output. The measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call learning. …
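That feedback loop can be made concrete in a few lines of NumPy: a minimal sketch, on an invented toy problem, that uses mean squared error as the loss and adjusts a single weight by gradient descent:

```python
import numpy as np

# Mean squared error: the "distance" between predictions and targets
def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

# One feedback loop: adjust a single weight w so that the model
# y_pred = w * x moves closer to y_true (gradient descent on MSE)
x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.0, 4.0, 6.0])  # the true mapping is y = 2x

w = 0.0
lr = 0.05
for _ in range(100):
    y_pred = w * x
    grad = np.mean(2 * (y_pred - y_true) * x)  # d(MSE)/dw
    w -= lr * grad                             # the "adjustment step"
print(round(w, 3))  # w converges toward 2.0
```

The loss value itself is never used directly; its gradient is the feedback signal, and the repeated adjustment of `w` is exactly the "learning" the quote describes.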
Research Scientist, AI