Commit 48e5c2d3 authored by burcharr

automatic writing commit ...

parent ad56a5bc
# Introduction
In this thesis we aim to develop several neural network based machine learning methods that can be used to detect hand washing and compulsive hand washing in inertial sensor data of wrist worn devices. We evaluate different approaches for multiple scenarios of hand washing classification. We examine the real world applicability of the developed approach with multiple users.
In this thesis we aim to develop several neural network based machine learning methods that can be used to detect hand washing and compulsive hand washing on inertial sensor data of wrist worn devices. We evaluate different approaches for multiple scenarios of hand washing classification. We examine the real world applicability of the developed approach with multiple users.
## Motivation
### Hand washing detection
Hand washing is an important part of every humans personal hygiene. We wash our hands many times every day. Washing ones hands can remove dirt or grease and importantly helps to prevent infection with pathogens @noauthor_when_2020. There are many occasions, in which it is desired that we wash our hands, among which are @noauthor_when_2020:
Hand washing is an important part of every human's personal hygiene. We wash our hands multiple times each day. Washing one's hands can remove dirt or grease and, importantly, helps to prevent infection with pathogens @noauthor_when_2020. There are many occasions on which we are expected to wash our hands, among them @noauthor_when_2020:
- After using the toilet
- Before and after preparing or eating food
......@@ -17,28 +17,27 @@ Added to that, hand washing using soap or disinfectants is also part of the work
In order to monitor the effectiveness and frequency of hand washing, we could use a sensor based computer system to detect the activity of hand washing and its duration. Further advanced systems could also be used to predict the quality of the hand washing. These systems could then be used to reduce the risk of contaminations or infections by ameliorating the hygiene of their users.
### Obsessive-Compulsive Disorders
While it is usually really helpful and a basic part of hygiene, hand washing can also be overdone, i.e. be too frequent or be done too thoroughly. One example of persons for which overly excessive hand washing is a problem, is the small percentage of humans suffering Obsessive-Compulsive Disorders (OCD). OCD affects about $1-3\,\%$ of humans during their life @valleni-basile_frequency_1994, @fawcett_women_2020. OCD appears in the form of obsessions, that lead to compulsive behavior. There are multiple subgroups of obsessions and compulsions, including contamination concerns, symmetry and precision concerns, saving concerns and more @stein_obsessive-compulsive_2002. These concerns lead to respective compulsive behavior: Symmetry and precision concerns lead to arranging and ordering, saving concerns lead to hoarding and contamination concerns can lead to excessive washing, bathing and showering. This work's will focus on detecting hand washing and also trying to tell it apart from compulsive hand washing of OCD patients.
While it is usually helpful and a basic part of hygiene, hand washing can also be overdone, i.e. be too frequent or be done too thoroughly. One group of people for whom excessive hand washing is a problem is the small percentage of humans suffering from Obsessive-Compulsive Disorders (OCD). OCD affects about $1-3\,\%$ of humans during their life @valleni-basile_frequency_1994, @fawcett_women_2020. OCD appears in the form of obsessions that lead to compulsive behavior. There are multiple subgroups of obsessions and compulsions, including contamination concerns, symmetry and precision concerns, saving concerns and more @stein_obsessive-compulsive_2002. These concerns lead to respective compulsive behavior: symmetry and precision concerns lead to arranging and ordering, saving concerns lead to hoarding, and contamination concerns can lead to excessive washing, bathing and showering. This work will focus on detecting hand washing and also on telling ordinary hand washing apart from the compulsive hand washing of OCD patients.
The separation of compulsive hand washing from ordinary hand washing is an even harder problem than just hand washing detection itself. It is unclear, whether it is possible to predict the type of hand washing with high probability, as there is no previous work in this area.
The separation of compulsive hand washing from ordinary hand washing is an even harder problem than hand washing detection itself. It is unclear whether it is possible to predict the type of hand washing with high probability, as there is no previous work in this area. It is reasonable to assume that there are strong similarities between the kinds of hand washing, as well as subtle differences, e.g. in intensity and length.
One method of treatment for clinical cases of OCD is exposure and response prevention (ERP) therapy @meyer_modification_1966 @whittal_treatment_2005. Using this method, patients that suffer from OCD are exposed to situations in which their obsessions are stimulated and they are helped at preventing compulsive reactions to the stimulation. The patients can then "get used" to the situation in a sense, and thus the reaction will be weakened over time.
One method of treatment for clinical cases of OCD is exposure and response prevention (ERP) therapy @meyer_modification_1966 @whittal_treatment_2005. Using this method, patients who suffer from OCD are exposed to situations in which their obsessions are stimulated, and they are helped to prevent compulsive reactions to the stimulation. The patients can then "get used" to the situation in a sense, and thus the reaction to the stimulation will be weakened over time. This means that their quality of life improves as the severity of their OCD declines.
A successful, i.e. reliable and accurate system for obsessive hand washing detection could be used to intervene, whenever the compulsive hand washing is detected. It could therefore help psychologists and their patients in the treatment of the symptoms. It could help the user to stop the compulsive behavior by issuing a warning. Such a warning could be a vibration of the device, or a sound that is played upon the detection of compulsive behavior. However, the hypothesis of usefulness is yet to be tested, as no such systems exists as of now. Therefore we want to develop a system that can not only detect hand washing, but also discriminate between usual hand washing and obsessive-compulsive hand washing.
A successful, i.e. reliable and accurate, system for obsessive hand washing detection could be used to intervene whenever compulsive hand washing is detected. It could therefore help psychologists and their patients in the treatment of the symptoms. It could help the user to stop the compulsive behavior by issuing a warning. Such a warning could be a vibration of the device, or a sound that is played upon the detection of compulsive behavior. However, the hypothesis of usefulness is yet to be tested, as no such system exists as of now. Therefore, we want to develop a system that can not only detect hand washing with low latency and in real time, but also discriminate between usual hand washing and obsessive-compulsive hand washing at the same time. The system could then, as described, be used in ERP therapy sessions, but also in everyday life, to prevent compulsive hand washing.
### Wrist worn sensors
Different types of sensors can be used to detect activities such as hand washing. It is possible to detect hand washing from RGB camera data to some extent. However, in order for this to work, we would need to place a camera at every place and room a subject could want to wash their hands at. This is unfeasible for most applications of hand washing detection, and could be very expensive. Added to that it might be problematic to place cameras inside wash or bath rooms for privacy reasons. Thus, a better alternative could be body worn, camera-less devices.
Inertial measurement units (IMUs) can measure different types of time series movement data, e.g. the acceleration or angular velocity of the device they are embedded in. IMUs are embedded in most modern smart phones and smart watches, which makes them easily available. For hand washing detection, especially the movement of the hands and wrists can contain information that can help us classify hand washing. Therefore, we can use a smart watch and its embedded IMU to try to predict whether a user is washing their hands or not. Added to that, if the user is washing their hands, we could try to predict if they are washing them in an obsessive-compulsive way or not. Another advantage of using a smart watch would be, that they usually have in-built vibration motors or even speakers. These means could be used to intervene, whenever compulsive hand washing is detected, as described above.
Inertial measurement units (IMUs) can measure different types of time series movement data, e.g. the acceleration or angular velocity of the device they are embedded in. IMUs are embedded in most modern smart phones and smart watches, which makes them easily available. For hand washing detection, the movement of the hands and wrists in particular can contain information that helps us classify hand washing. Therefore, we can use a smart watch and its embedded IMU to try to predict whether a user is washing their hands or not. Added to that, if the user is washing their hands, we could try to predict whether they are washing them in an obsessive-compulsive way or not. Another advantage of smart watches is that they usually have built-in vibration motors or even speakers. These means could be used to intervene whenever compulsive hand washing is detected, as described above. Therefore, wrist worn sensors, especially those embedded into the very versatile smart watch systems, are used in this work. The wrist worn devices can also be used to execute machine learning models in real time, using publicly available libraries, e.g. on smart watches running Wear OS.
## Goals
In this work, we want to develop a method for the real time detection of hand washing and compulsive hand washing. We also want to test the method and report meaningful statistics of its success. Further, we want to test parts of the developed method in a real world scenario. We then want to draw conclusions on the applicability of the developed systems in the real world.
### Detection of hand washing in real time from inertial motion sensors
### Detection of hand washing in real time utilizing inertial measurement sensors
We want to show that neural network based classification methods can be applied to the recognition of hand washing. We want to base our method on sensor data from inertial measurement sensors in smart watches or other wrist worn IMU-equipped devices. We want to detect the hand washing in real time and directly on the mobile, i.e. on a wrist wearable device, such as a smart watch. Doing so, we would be able to give instant real time feedback to the user of the device.
### Separation of hand washing and compulsive hand washing
Added to the detection of hand washing, the detection of obsessive-compulsive hand washing is part of our goals. We want to be able to separate compulsive hand washing from non compulsive hand washing, based on the inertial motion data. Especially for the scenario of possible interventions used for the treatment of OCD, this separation is crucial, as patients do also wash their hands in non compulsive ways.
Added to the detection of hand washing, the detection of obsessive-compulsive hand washing is part of our goals. We want to be able to separate compulsive hand washing from non compulsive hand washing, based on the inertial motion data. Especially for the scenario of possible interventions used for the treatment of OCD, this separation is crucial, as OCD patients do also wash their hands in non compulsive ways and we do not want to intervene for these kinds of hand washing procedures.
### Real world evaluation
We want to evaluate the most promising of the developed models in a real world evaluation, in order to obtain a realistic estimate of its applicability in the task of hand washing detection. We want to report results of an evaluation with multiple subjects to obtain a meaningful performance estimation. From this estimation we want to draw conclusions on the applicability of the developed system in real world therapy scenarios.
\ No newline at end of file
We want to evaluate the most promising of the developed models in a real world evaluation, in order to obtain a realistic estimate of its applicability in the task of hand washing detection. We want to report results of an evaluation with multiple subjects to obtain a meaningful performance estimation. From this estimation we want to draw conclusions on the applicability of the developed system in real world therapy scenarios. Added to that, we want to derive future improvements, that could be applied to the system.
\ No newline at end of file
......@@ -35,5 +35,5 @@ declaration: Hiermit erkläre ich, dass ich diese Arbeit selbstständig verfasst
#abstract
abstract-de: Die automatische Erkennung von Händewaschen und zwanghaftem Händewaschen hat mehrere Anwendungsbereiche in Arbeits- und medizinischen Umgebungen. Die Erkennung von Händewaschen kann in zur Überprüfung der Einhaltung von Hygieneregeln eingesetzt werden, da das Händewaschen eine der wichtigsten Komponenten der persönlichen Hygiene ist. Allerdings kann das Händewaschen auch übertrieben werden, was bedeutet, dass es für die Haut und die allgemeine Gesundheit schädlich sein kann. Manche Patienten mit Zwangsstörungen waschen sich zwanghaft und zu häufig die Hände auf diese schädliche Weise. Die automatische Erkennung von zwanghaftem Händewaschen kann bei der Behandlung dieser Patienten helfen. Ziel dieser Arbeit ist es, auf neuronalen Netzen basierende Methoden zu entwickeln, die in der Lage sind, Händewaschen und zwanghaftes Händewaschen in Echtzeit auf einem am Handgelenk getragenen Gerät zu erkennen, wobei die Daten der Bewegungssensoren des am Handgelenk getragenen Geräts verwendet werden. Wir erreichen eine hohe Genauigkeit für beide Aufgaben und evaluieren Teile der Arbeit mit Probanden in einem realen Experiment, um die starke theoretische Leistung zu bestätigen.
abstract-en: The automatic detection of hand washing and compulsive hand washing has multiple areas of application in work and medical environments. Hand washing detection can be used in compliance and hygiene scenarios, as hand washing is one of the main components of personal hygiene. However, hand washing can also be overdone, which means it can be hurtful to the skin and general health. Patients with obsessive-compulsive disorder sometimes compulsively wash their hands in such a harmful way. In order to help with their treatment, the automatic detection of compulsive hand washing can possibly be applied. This thesis aims to develop neural network based methods which are able to detect hand washing as well as compulsive hand washing in real time on a wrist worn device using intertial motion sensor data of said wrist worn device. We achieve high accuracy for both tasks and evaluate parts of the work on subjects in a real world experiment, in order to confirm the strong theoretical performance achieved.
abstract-en: The automatic detection of hand washing and compulsive hand washing has multiple areas of application in work and medical environments. Hand washing detection can be used in compliance and hygiene scenarios, as hand washing is one of the main components of personal hygiene. However, hand washing can also be overdone, which means it can be harmful to the skin and general health. Patients with obsessive-compulsive disorder sometimes compulsively wash their hands in such a harmful way. The automatic detection of compulsive hand washing could possibly be applied to help with their treatment. This thesis aims to develop neural network based methods which are able to detect hand washing as well as compulsive hand washing in real time on a wrist worn device, using inertial motion sensor data of said wrist worn device. We achieve high accuracy for both tasks and evaluate parts of the work on subjects in a real world experiment, in order to confirm the strong theoretical performance achieved.
---
......@@ -302,7 +302,7 @@ The sensitivity is the rate of positive samples that get correctly recognized, t
For the multiclass problem of distinguishing obsessive hand washing from normal hand washing from other activities, the binary metrics are not applicable. Here, we report normalized confusion matrices, and their mean diagonal values as one performance measure. The confusion matrix shows, which amount of samples belonging to a certain class (true label, rows of the matrix) are predicted to belong to which other class (predicted label, columns of the matrix). The normalized version of the confusion matrix replaces the total values by ratios in proportion to the amount of true labels for each class. This means that for each true label row in the matrix, the values sum to 1.
The mean diagonal value of this matrix can be seen as a mean class accuracy score, as the diagonal values of the normalized confusion matrix are the accuracy values for each possible class.
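The row normalization and the mean diagonal value described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation code of this work: the function names, the toy labels and the assumption that the three classes are encoded as the integers 0, 1 and 2 are ours.

```python
def normalized_confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true labels, columns are predicted labels; each row sums to 1."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        # Assumption: a class without any true samples yields a row of zeros.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def mean_diagonal(matrix):
    """Mean of the per-class accuracies found on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Hypothetical toy labels for the three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]

cm = normalized_confusion_matrix(y_true, y_pred)
print(cm)
print(mean_diagonal(cm))
```

Because every row is normalized separately, a class with few samples contributes as much to the mean diagonal value as a class with many samples.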
Added to that, we report an adapted F1 score. The adapted multiclass F1 score is calculated by taking the mean over all classes $\mathbf{C}$, of the F1 scores if we treat the class $\mathbf{C}_i, i \in [0,1,2]$ as the positive class, and the remaining classes as the negative class:
Added to that, we report an adapted F1 score, identical to the one used by Zeng et al. @zeng_understanding_2018. The adapted multiclass F1 score is calculated by taking the mean, over all classes $\mathbf{C}$, of the F1 scores obtained when we treat the class $\mathbf{C}_i, i \in \{0,1,2\}$ as the positive class and the remaining classes as the negative class:
\begin{align}
F_1\ score\ multi = \frac{1}{3}\cdot \sum_{i=0}^2 F_1\ score(\mathbf{C}_i)
\end{align}
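As a hedged illustration of the formula above, the following Python sketch computes the one-vs-rest F1 score for each class and averages the three values. The helper names and the toy labels are hypothetical, and returning 0 for a class that is neither present nor predicted is our own assumption.

```python
def f1_one_vs_rest(y_true, y_pred, positive):
    """F1 score when `positive` is the positive class and all others are negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # Assumption: the score is defined as 0 if the class never occurs at all.
    return 2 * tp / denominator if denominator else 0.0


def multiclass_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Mean of the one-vs-rest F1 scores over all classes."""
    return sum(f1_one_vs_rest(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels for the three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]

print(multiclass_f1(y_true, y_pred))
```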
......
# Related Work
Automatically detecting the current activity of a human being is a wide research field in computer science. There are many possible applications, e.g. human robot interaction, quality assessments, worker surveillance, control of user interfaces and more.
Automatically detecting the current activity of a human being is a wide research field in computer science. There are many possible applications of gesture and activity recognition, e.g. human robot interaction, quality assessments, worker surveillance, control of user interfaces and more. Hand washing detection, which is a special case of activity detection, has also been the research interest of multiple studies over the past years. The most interesting fields of research for our work, apart from hand washing detection itself, are the fields of gesture recognition and human activity recognition, whose relevance will be explained below.
## Gesture Recognition
In the area of gesture recognition, we try to detect and classify specific, and closely defined gestures.
## Gesture recognition
In the area of gesture recognition, we try to detect and classify specific, and narrowly defined gestures.
The defined gestures can e.g. be used to actively control a system @saini_human_2020. This kind of approach is not directly applicable to our task of detecting hand washing. However, it could be possible to adapt algorithms from this field to the detection of a new gesture or a new set of gestures related to hand washing.
There are camera-based approaches and physical measurement based approaches @saini_human_2020. The camera base approaches were out of scope for this work. As explained in the introduction, wrist worn devices have significant advantages over camera-based solutions that would have to be stationary, i.e. in fixed locations.
There also exist approaches based on inertial measurement sensors. These sensors measure movement related physical values, such as the force or acceleration, angular velocity, orientation in space.
There are camera-based approaches and physical measurement based approaches @saini_human_2020. The camera based approaches were out of scope for this work. As explained in the introduction, in our setting, wrist worn devices have significant advantages over camera-based solutions that would have to be stationary, i.e. in fixed locations.
There also exist approaches based on inertial measurement sensors. These sensors measure movement related physical values, such as the force or acceleration, angular velocity or orientation in space.
Gesture recognition, in general, uses similar methods as the more difficult human activity recognition @saini_human_2020.
Gesture recognition, in general, uses similar methods as the more difficult human activity recognition @saini_human_2020, which will be explained below.
## Human Activity Recognition
## Human activity recognition
\label{section:har}
Recognizing more than one gesture or body movement in combination in a temporal context and deriving the current activity of the user is called human activity recognition (HAR). In this task, we want to detect a more general activity, compared to a shorter and simpler gesture. An activity can include many distinguishable gestures. However, the same activity will not always include all of the same gestures and the gestures included could be in a different order for every repetition. Activities are less repetitive than gestures, and harder to detect in general @zhu_wearable_2011. However, Zhu et al. have shown that the combined detection of multiple different gestures can be used in HAR tasks too @zhu_wearable_2011, which makes sense, because a human activity can consist of many gestures. Nevertheless, most methods used for HAR consist of more direct applications of machine learning to the data, without the detour of detecting specific gestures contained in the execution of an activity.
Recognizing more than one gesture or body movement in combination in a temporal context and deriving the current activity of the user is called human activity recognition (HAR). In this task, we want to detect more general activities, compared to shorter and simpler gestures. An activity can include many distinguishable gestures. However, the same activity will not always include all of the same gestures, and the gestures included could be in a different order for every repetition. Activities are less repetitive than gestures, and harder to detect in general @zhu_wearable_2011. However, Zhu et al. have shown that the combined detection of multiple different gestures can be used in HAR tasks too @zhu_wearable_2011, which makes sense, because a human activity can consist of many gestures. Nevertheless, most methods used for HAR consist of more direct applications of machine learning to the data, without the detour of detecting specific gestures contained in the execution of an activity.
Methods used in HAR include classical machine learning methods as well as deep learning @liu_overview_2021 @bulling_tutorial_2014. The classical machine learning methods rely on features of the data obtained by feature engineering. These methods include but are not limited to Random Forests, Hidden Markov Models (HMM), Support Vector Machines (SVM), the $k$-nearest neighbors algorithm and more. The features can frequency-domain based and time-domain based, but usually both are used at the same time to train these conventional models @liu_overview_2021.
Methods used in HAR include classical machine learning methods as well as deep learning @liu_overview_2021 @bulling_tutorial_2014. The classical machine learning methods rely on features of the data obtained by feature engineering. The required feature engineering is the creation of meaningful statistics or calculations based on the time frame for which the activity should be predicted. The features can be frequency-domain based and time-domain based, but usually both are used at the same time to train these conventional models @liu_overview_2021. The classical machine learning methods include but are not limited to Random Forests (RFC), Hidden Markov Models (HMM), Support Vector Machines (SVM), the $k$-nearest neighbors algorithm and more.
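To make the idea of feature engineering more concrete, the sketch below computes two time-domain features (mean and standard deviation) and one frequency-domain feature (dominant frequency) per sensor axis for a single window. It is a minimal example under our own assumptions: the function name, the sampling rate of 50 Hz, the window length and the synthetic signal are all hypothetical, and real feature sets are usually much larger.

```python
import numpy as np


def window_features(window, sampling_rate_hz):
    """Features for one window of shape (n_samples, n_axes)."""
    window = np.asarray(window, dtype=float)
    # Time-domain features per axis.
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    # Frequency-domain feature per axis: the frequency with the largest
    # magnitude, after removing the constant (DC) component.
    spectrum = np.abs(np.fft.rfft(window - mean, axis=0))
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / sampling_rate_hz)
    dominant = freqs[spectrum.argmax(axis=0)]
    return np.concatenate([mean, std, dominant])


# Synthetic 2 s window at an assumed 50 Hz: a 2 Hz sine, a constant, a 5 Hz cosine.
t = np.arange(100) / 50.0
window = np.stack(
    [np.sin(2 * np.pi * 2.0 * t), np.ones_like(t), np.cos(2 * np.pi * 5.0 * t)],
    axis=1,
)
features = window_features(window, sampling_rate_hz=50)
print(features)
```

The resulting feature vector could then be passed to any of the classical classifiers named above.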
#### Deep neural networks
Recently, deep neural networks have taken over the role of the state of the art machine learning method in the area of human activity recognition @bock_improving_2021, @liu_overview_2021. Deep neural networks are universal function approximators @bishop_pattern_2006, and are known for being easy to use on "raw" data. They are "artificial neural networks" consisting of multiple layers, where each layer contains a set amount of nodes that are connected to the nodes of the following layer. Simple neural networks where all nodes of a layer are connected to all nodes in the following layer are often called "fully connected neural networks (FC-NN or FC)".
The connections' parameters are optimized using forward passes followed by execution of the backpropagation algorithm, and an optimization step. We can accumulate all the gradients with regard to a loss function for each of the parameters and for a small subset of the data and perform "stochastic gradient decent" (SGD). SGD or alternative similar optimization methods like the commonly used ADAM @kingma_adam_2017 optimizer perform a parameter update step. After many such updates and if the training works well, the network parameters will have been updated to values that lead to a lower value of the loss function for the training data. However, there is no guarantee of conversion whatsoever. As mentioned above, deep neural networks can, in theory, be used to approximate arbitrary functions. However, empirical testing has revealed that neural networks do need a lot of training data in order to perform well, compared to classical machine learning methods.
Recently, deep neural networks have taken over the role of the state of the art machine learning method in the area of human activity recognition @bock_improving_2021, @liu_overview_2021. Deep neural networks are universal function approximators @bishop_pattern_2006, and are known for being easy to use on "raw" data. They are "artificial neural networks" consisting of multiple layers, where each layer contains a certain number of nodes that are connected to the nodes of the following layer. The connections are each assigned a weight, and the weighted sum over the values of all the previous connected nodes is used to calculate the value of a node in the next layer. Simple neural networks where all nodes of a layer are connected to all nodes in the following layer are often called "fully connected neural networks" (FC-NN or FC).
The connections' parameters are optimized using forward passes through the network of nodes, followed by the execution of the backpropagation algorithm, and an optimization step. We can accumulate the gradients of a loss function with regard to each of the parameters for a small subset of the data and perform "stochastic gradient descent" (SGD). SGD, or similar alternative optimization methods like the commonly used Adam optimizer @kingma_adam_2017, performs a parameter update step. After many such updates, and if the training works well, the network parameters will have been updated to values that lead to a lower value of the loss function for the training data. However, there is no guarantee of convergence whatsoever. As mentioned above, deep neural networks can, in theory, be used to approximate arbitrary functions. Nevertheless, the parameters for the perfect approximation cannot be easily found, and empirical testing has revealed that neural networks do need a lot of training data in order to perform well, compared to classical machine learning methods. In return, with enough data, deep neural networks often outperform the classical machine learning methods.
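The interplay of forward pass, gradient computation and SGD update can be sketched as follows. For brevity, the sketch uses a single fully connected layer with one sigmoid output node instead of a deep network, so the gradient can be written down directly. The data, the learning rate and the batch size are arbitrary assumptions made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, w, b):
    """Fully connected layer with one output node: weighted sum, then sigmoid."""
    return sigmoid(X @ w + b)


def loss_fn(p, y):
    """Mean binary cross-entropy; clipping avoids log(0)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # synthetic inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic binary labels

w, b = np.zeros(3), 0.0
learning_rate, batch_size = 0.1, 16
initial_loss = loss_fn(forward(X, w, b), y)

for step in range(200):
    # "Stochastic": each update only uses a small random subset of the data.
    batch = rng.choice(len(X), size=batch_size, replace=False)
    p = forward(X[batch], w, b)
    error = p - y[batch]                    # gradient w.r.t. the weighted sum
    grad_w = X[batch].T @ error / batch_size
    grad_b = error.mean()
    # Parameter update step in the direction of the negative gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = loss_fn(forward(X, w, b), y)
print(initial_loss, final_loss)
```

In a deep network, the gradients of the earlier layers are obtained by backpropagation instead of the closed-form expression used here, but the update step itself stays the same.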
###### Convolutional neural networks (CNNs)
Recurrent neural networks (RNNs) are similar to feed forward neural networks, with the difference being that they have access to information from a previous time step. The simplest version of an RNN is a single node that takes the input $\mathbf{x}_t$ and its own output $\mathbf{h}_{t-1}$ from the last time step as inputs. RNNs can be trained on time series data and are able to interprete temporal connections and dependencies in the data to some extent. Recurrent neural networks are trained using "back propagation through time" @mozer_focused_1995. This means that we have to run a forwards pass of multiple time steps through the network first, followed by a back propagation that sums up over all the different time steps and their gradients. For "long" runs, i.e. if the network is supposed to take into account many time steps, there is the "vanishing gradient problem" @hochreiter_vanishing_1998. With an increasing amount of time steps, the gradients become smaller and smaller, making it harder or impossible to properly train the recurrent neural network.
TODO still missing.
Long short-term memory (LSTM) can be used to combat the vanishing gradient problem in recurrent neural networks @hochreiter_long_1997, @hochreiter_vanishing_1998. It can be used in various applications, such as time series prediction, speech recognition and translation tasks (including generative tasks) @smagulova_survey_2019, but also for human activity recognition.
###### Recurrent neural networks (RNNs)
are similar to feed forward neural networks, with the difference being that they have access to information from a previous time step. The simplest version of an RNN is a single node that takes the input $\mathbf{x}_t$ and its own output $\mathbf{h}_{t-1}$ from the last time step as inputs. RNNs can be trained on time series data and are able to interpret temporal connections and dependencies in the data to some extent. Recurrent neural networks are trained using "back propagation through time" @mozer_focused_1995. This means that we have to run a forward pass of multiple time steps through the network first, followed by a back propagation that sums up the gradients over all the different time steps. For "long" runs, i.e. if the network is supposed to take into account many time steps, there is the "vanishing gradient problem" @hochreiter_vanishing_1998. With an increasing number of time steps, the gradients become smaller and smaller, making it harder or impossible to properly train the recurrent neural network.
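A minimal sketch of such a recurrent step, together with a small numerical illustration of the vanishing gradient problem, is given below. The tanh activation, the layer sizes (six input channels, four hidden nodes) and the small random weights are assumptions made for this example only. The sketch tracks the Jacobian of the current hidden state with respect to the initial hidden state, whose norm shrinks as more time steps are chained.

```python
import numpy as np


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One time step: the new hidden state depends on x_t and on h_{t-1}."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)


rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 6, 4, 50          # assumed sizes
W_x = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)
jacobian = np.eye(n_hidden)                 # d h_t / d h_0, starts as identity
norms = []
for x_t in rng.normal(size=(n_steps, n_in)):
    h = rnn_step(x_t, h, W_x, W_h, b)
    # Chain rule for one step: d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_h
    step_jacobian = (1.0 - h**2)[:, None] * W_h
    jacobian = step_jacobian @ jacobian
    norms.append(np.linalg.norm(jacobian))

print(norms[0], norms[-1])
```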
###### Long short-term memory (LSTM)
can be used to combat the vanishing gradient problem in recurrent neural networks @hochreiter_long_1997, @hochreiter_vanishing_1998. It can be used in various applications, such as time series prediction, speech recognition and translation tasks (including generative tasks) @smagulova_survey_2019, but also for human activity recognition. It can handle temporal connections well and "remember" important parts of its past state.
\label{sec:LSTM}
LSTMs consist of a "cell" of which one or more can be contained in a neural network.
......@@ -55,32 +62,37 @@ The four LSTMs' gates are:
- input gate
- output gate
These gates are fully connected neural network layers (marked in orange and with the corresponding activation functions in @fig:lstm_cell) with respective weights and biases and serve a functionality from which their names are derived. The weights and biases must be learned during the training phase of the neural network. The forget gate allows the LSTM to only apply part of the "remembered" cell memory $\mathbf{c}_{t-1}$ in the current step, i.e. which bits should be used to which extent with regard to the current new input data $\mathbf{x}_t$ and the hidden state from the last time step $\mathbf{h}_{t-1}$. The output of the forget gate, $\mathbf{f}_t$, multiplied bit-wise with $\mathbf{c}_{t-1}$ is considered the "remembered" information from the last step. The new memory gate and the input gate are used to decide which new data is added to the cell state. These two layers are also given the previous step's hidden state $\mathbf{h}_{t-1}$ and the current step's input $\mathbf{x}_t$. In combination, the new memory network output $\tilde{\mathbf{c}}_t$ and the input gates' output $\mathbf{i}_t$ decide which components of the current input and hidden state will be taken into the new memory state $\mathbf{c}_{t}$. The memory state is passed on to the next step. The output gate will generate $\mathbf{o}_t$, which will be combined with $tanh(\mathbf{c}_{t})$ by element-wise matrix multiplication to form the new hidden state $\mathbf{h}_{t}$.
These gates are fully connected neural network layers (marked in orange and with the corresponding activation functions in @fig:lstm_cell) with respective weights and biases, and their names reflect their function. The weights and biases are learned during the training phase of the neural network. The forget gate allows the LSTM to apply only part of the "remembered" cell memory $\mathbf{c}_{t-1}$ in the current step, i.e. it determines which components are used, and to what extent, given the current input data $\mathbf{x}_t$ and the hidden state from the last time step $\mathbf{h}_{t-1}$. The output of the forget gate, $\mathbf{f}_t$, multiplied element-wise with $\mathbf{c}_{t-1}$, is considered the "remembered" information from the last step. The new memory gate and the input gate decide which new data is added to the cell state. These two layers are also given the previous step's hidden state $\mathbf{h}_{t-1}$ and the current step's input $\mathbf{x}_t$. In combination, the new memory network output $\tilde{\mathbf{c}}_t$ and the input gate's output $\mathbf{i}_t$ decide which components of the current input and hidden state are taken into the new memory state $\mathbf{c}_{t}$. The memory state is passed on to the next step. The output gate generates $\mathbf{o}_t$, which is combined with $\tanh(\mathbf{c}_{t})$ by element-wise multiplication to form the new hidden state $\mathbf{h}_{t}$.
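The gate interactions described above can be sketched in a few lines of plain Python. This is only a minimal illustration of one standard LSTM time step, not the implementation used in this thesis. The function names (`dense`, `lstm_step`), the dictionary keys and the layout of `params` are hypothetical, and the choice of sigmoid activations for the forget, input and output gates and $\tanh$ for the new memory layer follows the common LSTM formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, v, act):
    """Fully connected layer act(W v + b); W is a list of rows."""
    return [act(sum(w * x for w, x in zip(row, v)) + bias)
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params maps a gate name to its (W, b) pair
    (hypothetical layout chosen for this sketch)."""
    # All four layers see the previous hidden state and the current input.
    v = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = dense(*params["forget"], v, sigmoid)
    i_t = dense(*params["input"], v, sigmoid)
    c_tilde = dense(*params["new_memory"], v, math.tanh)
    o_t = dense(*params["output"], v, sigmoid)
    # Keep part of the old memory and add the gated new memory.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # The new hidden state is the gated, squashed memory state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Calling `lstm_step` repeatedly, feeding the returned `h_t` and `c_t` back in as `h_prev` and `c_prev`, unrolls the cell over a sensor window.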
DeepConvLSTM is @ordonez_deep_2016 a network proposed by Ordonez et al. and consists of a number of convolutional layers as well two LSTM layers. It reaches state of the art performance and is used for general human activity recognition tasks.
###### DeepConvLSTM
is a network proposed by Ordonez et al. @ordonez_deep_2016 and consists of a number of convolutional layers as well as two LSTM layers. It reaches state of the art performance and is used for general human activity recognition tasks. The combination of convolutional layers and LSTMs works well with time series data, as it can use the advantages of both convolutional layers and the intelligent "memory" provided by the LSTMs.
Bock et al. @bock_improving_2021 employ an altered version of DeepConvLSTM @ordonez_deep_2016, which is a network consisting of a number of convolutional layers as well as one or two LSTM layers. Bock et al. propose reducing the amount of LSTM layers to one, resulting in the architecture shown in @fig:deepConvLSTM. They evaluate their approach on 5 different publicly available data sets and report an increased performance on four out of the five. Leaving out one LSTM layer drastically reduces the amount of parameters to be learned as well as the time needed to train the network.
Bock et al. @bock_improving_2021 employ an altered version of DeepConvLSTM @ordonez_deep_2016. They propose reducing the number of LSTM layers to one, resulting in the architecture shown in @fig:deepConvLSTM. They evaluate their approach on five publicly available data sets and report increased performance on four of the five. Leaving out one LSTM layer drastically reduces the number of parameters to be learned as well as the time needed to train the network.
![DeepConvLSTM and the altered version, by Marius Bock @bock_improving_2021](img/deepConvBock.png){#fig:deepConvLSTM width=98%}
![Information propagation of LSTM and LSTM with temporal attention mechanism (adjusted from @zeng_understanding_2018)](img/lstm_lstm_temporal_attention.png){#fig:lstm_attention width=98%}
\label{sec:LSTMA}
In their paper "Understanding and improving recurrent networks for human activity recognition by continuous attention" , Zeng et al. apply an attention mechanism to LSTM based neural network models @zeng_understanding_2018. They propose the separate addition of temporal and sensor attention to the LSTM layers used in such networks. The sensor attention approach can be useful when using multiple sensor locations across the body and can be used to let the network focus on measurements from the more relevant sensors for specific tasks. The temporal attention approach works as shown in @fig:lstm_attention. The "normal" unrolled LSTM forward pass is pictured on the left. The temporal attention mechanism, makes the information from the past recurrency steps available after the LSTM output and can be seen on the right. The past outputs of the LSTM are saved in each step and then added together at $\mathbf{H}$ as a weighted sum. The parameters $\alpha_t$ for the weighted sum are also predicted by the network. The parameters for the "score" layer $\mathbf{W}_{\alpha}$ are learned as part of the neural networks training routine. The resulting formulas are shown in equations \ref{eqn:attent_lstm1} to \ref{eqn:attent_lstm2}.
In their paper "Understanding and improving recurrent networks for human activity recognition by continuous attention", Zeng et al. apply an attention mechanism to LSTM based neural network models @zeng_understanding_2018. They propose the separate addition of temporal and sensor attention to the LSTM layers used in such networks. The sensor attention approach can be useful when using multiple sensor locations across the body, as it lets the network focus on measurements from the sensors that are more relevant for a specific task. The temporal attention approach works as shown in @fig:lstm_attention. The "normal" unrolled LSTM forward pass is pictured on the left. The temporal attention mechanism, shown on the right, makes the information from the past recurrent steps available after the LSTM output. The past outputs of the LSTM are saved in each step and then added together at $\mathbf{H}$ as a weighted sum with weight parameters $\alpha_t$, $t \in [1, ..., T]$. The parameters $\alpha_t$ for the weighted sum are also predicted by the network. The parameters of the "score" layer, $\mathbf{W}_{\alpha}$, are learned as part of the neural network's training routine. The resulting formulas are shown in equations \ref{eqn:attent_lstm1} to \ref{eqn:attent_lstm2}.
\begin{figure}
\begin{align}
\label{eqn:attent_lstm1}
\mathbf{H} &= \sum_{t=1}^T \alpha_t\mathbf{h}_t \\
\alpha_t &= \frac{exp\{score(\mathbf{h}_T,\mathbf{h}_t)\}}{\sum_{s=1}^{T}exp\{score(\mathbf{h}_T,\mathbf{h}_s)\}} & \\
\alpha_t &= \frac{exp\{score(\mathbf{h}_T,\mathbf{h}_t)\}}{\sum_{s=1}^{T}exp\{score(\mathbf{h}_T,\mathbf{h}_s)\}} \label{eqn:attent_lstm_sm} \\
score(\mathbf{h}_T,\mathbf{h}_s) &= \mathbf{h}_T^{\top}\mathbf{W}_{\alpha}\mathbf{h}_s
\label{eqn:attent_lstm2}
\end{align}
\end{figure}
They evaluate their approach on 3 data sets and report a state of the art performance, beating the initial DeepConvLSTM.
Note that $\alpha_t$ is calculated with the softmax function, as shown in equation \ref{eqn:attent_lstm_sm}, although this is not explicitly mentioned by the authors of the paper. This ensures that the weights $\alpha_t$ used for the weighted sum always sum to 1.
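The temporal attention computation can be illustrated with a short plain-Python sketch. It assumes that the score is a bilinear form between the last hidden state $\mathbf{h}_T$ and an earlier hidden state $\mathbf{h}_s$, matching the arguments of $score(\mathbf{h}_T,\mathbf{h}_s)$. The function names are hypothetical and the code is not taken from Zeng et al.

```python
import math


def softmax(scores):
    """Numerically stable softmax; the returned weights sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear_score(h_T, W_alpha, h_s):
    """score(h_T, h_s) = h_T^T W_alpha h_s, with W_alpha as a list of rows."""
    Wh = [sum(w * x for w, x in zip(row, h_s)) for row in W_alpha]
    return sum(a * b for a, b in zip(h_T, Wh))


def temporal_attention(hidden_states, W_alpha):
    """Return the weighted sum H of all hidden states and the weights alpha_t."""
    h_T = hidden_states[-1]  # hidden state of the last time step
    alphas = softmax([bilinear_score(h_T, W_alpha, h_s)
                      for h_s in hidden_states])
    dim = len(h_T)
    H = [sum(a * h[k] for a, h in zip(alphas, hidden_states))
         for k in range(dim)]
    return H, alphas
```

With an all-zero $\mathbf{W}_{\alpha}$ every score is 0, the weights become uniform, and $\mathbf{H}$ reduces to the plain average of the hidden states.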
Another study by Singh et al. combines DeepConvLSTM with a self-attention mechanism @singh_deep_2021. The attention mechanism is very similar to the one used by Zeng et al. @zeng_understanding_2018, where the mechanism consists of a layer that follows the LSTM layers in the DeepConvLSTM network. Instead of with a weighted sum, Singh et al. find the weights $\mathbf{\alpha}$ by applying the softmax function to the output of a fully connected layer. They also report a statistically significant increase in performance compared to the initial DeepConvLSTM.
Zeng et al. evaluate their approach on 3 data sets and report a state of the art performance, beating the initial DeepConvLSTM.
Another study by Singh et al. combines DeepConvLSTM with a self-attention mechanism @singh_deep_2021. The attention mechanism is very similar to the one used by Zeng et al. @zeng_understanding_2018, in that it consists of a layer that follows the LSTM layers in the DeepConvLSTM network. Instead of utilizing a score layer which uses both $\mathbf{h}_t$ and $\mathbf{h}_T$, Singh et al. find the weights $\mathbf{\alpha}$ by applying the softmax function to the output of a fully connected layer for each $\mathbf{h}_t$, without taking $\mathbf{h}_T$ into account. Apart from this difference, the two attention mechanisms are very similar. Singh et al. also report a statistically significant increase in performance compared to the initial DeepConvLSTM, although they evaluate their approach on different data sets than Zeng et al.
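The difference between the two mechanisms can be made concrete with a second sketch. It assumes, purely for illustration, that the fully connected layer is a single linear layer with one scalar output per time step. The exact layer used by Singh et al. may differ, and the names `w`, `b` and `self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; the returned weights sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden_states, w, b):
    """Score each h_t on its own with a linear layer (assumed form),
    then combine all hidden states with the softmax weights."""
    # Unlike the bilinear score, h_T does not enter the score of h_t.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) + b for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[k] for a, h in zip(alphas, hidden_states))
               for k in range(dim)]
    return context, alphas
```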
For HAR, DeepConvLSTM and the models derived from it are the state of the art machine learning methods, as they consistently outperform other model architectures on the available benchmarks and data sets.
## Hand washing
To our knowledge, no study has ever tried to predict obsessive hand washing separately, as opposed to non-obsessive hand washing.
......@@ -95,7 +107,7 @@ Mondol et al. employ a simple feed forward neural network consisting of a few li
![Steps of HAWAD for parameter estimation and inference, taken from @sayeed_mondol_hawad_2020](img/HAWAD_filter.png){width=98% #fig:HAWAD}
They use the said features of all positive class samples to calculate the mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{S}$ of the feature distribution. Based on these measures, one can compute each sample's distance to the distribution using the Maharanis distance (as seen in equation \ref{eqn:mahala}). If during test time, the model predicts a sample to belong to the positive class, the distance is calculated. If the distance is bigger than a threshold ($d_{th}$), the sample is classified as a negative. The threshold $d_{th}$ can be derived by selecting it fittingly in order to include almost all positive samples seen during training. The parameter estimation and hand washing steps performed in the HAWAD paper can be seen in @fig:HAWAD. On their own data set (HAWAD data set) they reach F1-Scores of over 90% for hand washing detection.
They use these features of all positive class samples to calculate the mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{S}$ of the feature distribution. Based on these measures, one can compute each sample's distance to the distribution using the Mahalanobis distance (as seen in equation \ref{eqn:mahala}). If, at test time, the model predicts that a sample belongs to the positive class, the distance is calculated. If the distance is bigger than a threshold ($d_{th}$), the sample is classified as a negative. The threshold $d_{th}$ is chosen such that it includes almost all positive samples seen during training. The parameter estimation and hand washing steps performed in the HAWAD paper can be seen in @fig:HAWAD. On their own data set (HAWAD data set) they reach F1-Scores of over 90% for hand washing detection.
\begin{figure}
\begin{align}
......@@ -105,10 +117,4 @@ D_M(\mathbf{x}) = \sqrt{(\mathbf{x}- \boldsymbol{\mu})^T\mathbf{S}^{-1}(\mathbf{
\caption*{Equation \ref*{eqn:mahala}: Mahalanobis distance}
\end{figure}
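The outlier filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the HAWAD implementation. All function names are hypothetical, the unbiased sample covariance (division by $n-1$) is an assumption, and $\mathbf{S}$ is assumed to be invertible. Instead of forming $\mathbf{S}^{-1}$ explicitly, the sketch solves the linear system $\mathbf{S}\mathbf{z} = \mathbf{x} - \boldsymbol{\mu}$.

```python
import math


def mean_and_covariance(samples):
    """Mean vector and sample covariance matrix of the positive class features."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[k] for s in samples) / n for k in range(d)]
    S = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
          for b in range(d)] for a in range(d)]
    return mu, S


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def mahalanobis(x, mu, S):
    """D_M(x) = sqrt((x - mu)^T S^{-1} (x - mu))."""
    diff = [a - b for a, b in zip(x, mu)]
    return math.sqrt(sum(d * z for d, z in zip(diff, solve(S, diff))))


def filter_prediction(predicted_positive, x, mu, S, d_th):
    """Keep a positive prediction only if x lies within d_th of the
    positive class distribution; otherwise classify it as negative."""
    return predicted_positive and mahalanobis(x, mu, S) <= d_th
```

One way to pick `d_th`, in line with the description above, is to compute `mahalanobis` for all positive training samples and take a high percentile of these distances.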
TODO: graphic representations where missing
\ No newline at end of file
To our knowledge, no hand washing detection method using more complicated neural networks has been published as of 2021. The performance reached in the HAWAD paper could possibly be surpassed by convolutional or recurrent networks or a combination thereof, e.g. DeepConvLSTM. In addition, the detection and separation of compulsive hand washing from ordinary hand washing has, to our knowledge, never been done before. It seems likely that methods from hand washing detection and human activity recognition can be applied to this problem as well.
......@@ -119,7 +119,7 @@ The activities leading to false positives include:
- Brushing teeth
- Cleaning
The full list of reported activities can be found in the appendix.
The full list of reported activities for which false positives occurred can be found in the appendix.
Some subjects also reported difficulties with the smart watch application (not part of this work), which sometimes led to the model not being run at all and might also have influenced the results. It is possible that, for some hand washing procedures, the smart watch application was not executed, which would lead the user to note down a false negative and thus also decrease the sensitivity in the results.