Depression detection based on speech data

Nadiia Novakova published on

22 min, 4234 words

In this post I would like to show how to manage a dataset with many features, especially numeric features whose meaning and influence on the whole dataset are completely unclear.

The dataset contains speech features and clinical variables from participants in a depression-related study. Vocal features from several categories were derived from speech recordings. Each feature name carries a _pos or _neg tag that indicates the vocal task it was extracted from. The participants' clinical and demographic variables appear at the beginning of the dataset. The study aimed to show a relationship between voice patterns and the depression scale score (variable ADS).
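One way to get a first overview of such a wide table is to group the speech features by their task tag and rank them by how strongly they correlate with ADS. The sketch below assumes feature names end in "_pos" or "_neg" as described above; the column names (age, pitch_mean_*, jitter_*) and all values are hypothetical, and plain Pearson correlation is just one possible screening choice, not necessarily the method used in the full post.

```python
from math import sqrt

# Hypothetical toy rows: column names and values are invented for illustration.
# Only the "_pos"/"_neg" suffix convention and the ADS target come from the post.
rows = [
    {"age": 34, "ADS": 12, "pitch_mean_pos": 180.0, "pitch_mean_neg": 175.0, "jitter_pos": 0.010, "jitter_neg": 0.014},
    {"age": 51, "ADS": 30, "pitch_mean_pos": 150.0, "pitch_mean_neg": 140.0, "jitter_pos": 0.020, "jitter_neg": 0.025},
    {"age": 27, "ADS": 8, "pitch_mean_pos": 200.0, "pitch_mean_neg": 198.0, "jitter_pos": 0.008, "jitter_neg": 0.009},
    {"age": 45, "ADS": 22, "pitch_mean_pos": 160.0, "pitch_mean_neg": 152.0, "jitter_pos": 0.015, "jitter_neg": 0.021},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def group_by_task(columns):
    """Map each vocal task tag ('pos' or 'neg') to the feature columns that carry it."""
    groups = {"pos": [], "neg": []}
    for col in columns:
        for tag in groups:
            if col.endswith("_" + tag):
                groups[tag].append(col)
    return groups


def rank_features(rows, target="ADS"):
    """Sort tagged speech features by absolute correlation with the target, strongest first."""
    groups = group_by_task(rows[0].keys())
    features = groups["pos"] + groups["neg"]  # clinical columns such as "age" are skipped
    ys = [r[target] for r in rows]
    scores = {f: pearson([r[f] for r in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


for name, r in rank_features(rows):
    print(f"{name:16s} r = {r:+.2f}")
```

With hundreds of real features, the same grouping also makes it easy to compare whether the positive or the negative task yields more informative features.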

Car Market Trends - Data Exploration

Nadiia Novakova published on

12 min, 2270 words

In this article I want to show some basic data preparation and analytics tasks. I also want to show the Plotly library for data visualisation.

I have been looking for datasets that I can easily manipulate to demonstrate the first step of machine learning: data preparation and cleaning. So I started searching the Internet for datasets that I could explain and interpret easily based on my experience.
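Typical cleaning steps for such a dataset are removing duplicate rows, dropping rows with missing or malformed values, and converting text fields to proper numeric types. The sketch below shows these steps with the standard library only; the car-listing columns (make, model, year, price) and the sample rows are hypothetical and not taken from the dataset used in the article.

```python
import csv
import io

# Hypothetical raw export: columns and values are invented for this sketch.
RAW = """make,model,year,price
Toyota,Corolla,2015,12000
Toyota,Corolla,2015,12000
BMW,320i,,18500
Ford,Focus,2012,n/a
Audi,A4,2018,24900
"""


def clean(raw_csv):
    """Parse CSV text, drop exact duplicates and rows with missing/invalid numbers, convert types."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        key = tuple(row.values())
        if key in seen:  # skip exact duplicate rows
            continue
        seen.add(key)
        try:
            year = int(row["year"])
            price = float(row["price"])
        except ValueError:  # empty strings and placeholders like "n/a" fail conversion
            continue
        cleaned.append({
            "make": row["make"].strip(),
            "model": row["model"].strip(),
            "year": year,
            "price": price,
        })
    return cleaned


for car in clean(RAW):
    print(car)
```

Dropping incomplete rows is the simplest policy; imputing missing values is an alternative when too many rows would otherwise be lost.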

Python Type Checkers

Nadiia Novakova published on

7 min, 1251 words

Since I work with Java code, I find static types useful in my Python projects too. They help me reason about my code and avoid simple bugs, for example passing an incorrect argument to a method or function.
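As an illustration of the kind of bug a type checker catches, the sketch below annotates a small function and dataclass. The Order class and order_total function are hypothetical examples; mypy is named only as one commonly used checker, and the commented-out call is the sort of mistake it reports before the code ever runs.

```python
from dataclasses import dataclass


@dataclass
class Order:
    item: str
    quantity: int
    unit_price: float


def order_total(order: Order, discount: float = 0.0) -> float:
    """Return the order price after applying a fractional discount (0.1 means 10%)."""
    return order.quantity * order.unit_price * (1.0 - discount)


print(order_total(Order("book", 2, 15.0), discount=0.1))

# A static checker such as mypy rejects the call below, because "2" is a str
# while Order declares quantity: int. Plain Python accepts the construction and
# only fails later, when "2" * 15.0 raises TypeError inside order_total.
# order_total(Order("book", "2", 15.0))
```

Without annotations the bad argument would surface only at runtime, possibly far from where it was introduced.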

PySpark & Plotly

Nadiia Novakova published on

6 min, 1155 words

Apache Spark is a distributed engine for processing data at scale. Spark provides APIs in several languages, such as Scala, Java and Python. Today I would like to show you how to use Python and PySpark to do data analytics with the Spark SQL API. I will also use the Plotly library to visualise the processed data.

CV

Nadiia Novakova published on

5 min, 850 words