Depression detection based on speech data

Nadiia Novakova published on February 23, 2021

22 min, 4234 words

In this post I would like to show how to manage a dataset with many features, especially numeric features whose meaning and influence on the dataset as a whole are unclear.

The dataset contains speech features and clinical variables from participants in a depression-related study. Vocal features from several categories were derived from the speech recordings. Each feature name contains a tag _{pos,neg}, which refers to the vocal task it was extracted from. Clinical and demographic variables of the participants can be found at the beginning of the dataset. The study was meant to show a relation between voice patterns and the depression scale (variable ADS).

Tags: data analysis data preparation matplotlib pandas numpy modelling

Basic Feature Engineering Techniques for ML

Nadiia Novakova published on January 22, 2021

17 min, 3268 words

In this post I want to describe some basic data transformation and analysis techniques for preparing data for modelling. For demonstration I took the Medical Cost Personal Datasets from Kaggle.

Categories: Data Science

Tags: python data-transformation feature-engineering

Car Market Trends - Data Exploration

Nadiia Novakova published on December 18, 2020

12 min, 2270 words

In this article I want to show some basic data preparation and analytics tasks. I also want to show the Plotly library for data visualisation.

I have been looking for datasets that I can easily manipulate to demonstrate the first preliminary step of machine learning: data preparation and cleaning. So I started to search the Internet for datasets that I can explain and easily interpret based on my experience.

Tags: data analysis data preparation plotly pandas numpy

NER using OpenNLP. Model Training and Validation.

Nadiia Novakova published on November 17, 2020

3 min, 436 words

In the previous article I described how to prepare training data (corpus) for training with OpenNLP for a specific domain and custom entities.

Categories: Programming

Tags: java opennlp learn model model customization train model validate model

Tracking ML experiments with MLFlow

Nadiia Novakova published on November 12, 2020

7 min, 1229 words

MLFlow helps data scientists track their machine learning experiments. However, it provides more than that. Some of the main features:

Categories: Data Science

Tags: python ner mlflow nlp

NER using OpenNLP. Data Preparation.

Nadiia Novakova published on November 10, 2020

3 min, 485 words

Why NLP?

Natural language processing is a component of Text Mining that performs a special kind of linguistic analysis, one that essentially helps a machine to read text. This part of Data Science provides methods and approaches for mining patterns in texts. NER is just one of many NLP tasks.

Categories: NLP Data Science

Tags: opennlp corpus annotation model customization

Python Type Checkers

Nadiia Novakova published on November 07, 2020

7 min, 1251 words

Since I also work with Java code, I find static types useful in my projects. They help me reason about my code and avoid simple bugs, for example when I pass an incorrect argument to a method or function.
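To make the point about incorrect arguments concrete, here is a minimal sketch. The function `mean_word_length` is a hypothetical example, not taken from the article, and mypy is named only as one checker that can be run over annotated code.

```python
from typing import List


def mean_word_length(words: List[str]) -> float:
    """Return the average length of the given words."""
    return sum(len(word) for word in words) / len(words)


# Correct call: a list of strings, as the annotation says.
average = mean_word_length(["static", "types"])

# Incorrect call: a single string instead of a list of strings.
# Python runs it without any error, because a str is iterable too,
# but it averages over single characters and returns a misleading value.
# A static checker such as mypy can report the argument type mismatch
# before the code is ever run.
misleading = mean_word_length("static types")  # type: ignore[arg-type]
```

The second call is the kind of simple bug that is easy to miss at runtime, since nothing fails; the annotation gives a checker enough information to catch it.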

Categories: Programming

Tags: python static typing

PySpark & Plotly

Nadiia Novakova published on October 31, 2020

6 min, 1155 words

Apache Spark is an abstract query engine that allows you to process data at scale. Spark provides an API in several languages, such as Scala, Java and Python. Today I would like to show you how to use Python and PySpark to do data analytics with the Spark SQL API. I will also use the Plotly library to visualise the processed data.

Categories: Data Engineering

Tags: python spark sql plotly

CV

Nadiia Novakova published on February 25, 2020

5 min, 850 words