Commit b0236cce authored by burcharr

automatic writing commit ...

parent fb60060c
...@@ -35,14 +35,19 @@ In order to correctly detect the hand washing in real time in a real world scena
To also separate non-obsessive from obsessive hand washing, data of obsessive hand washing must be included. To record this data, real patients can be asked to wear a sensor during their daily life.
### Data used in our data set
We used hand washing data and "compulsive" hand washing data recorded at the University of Basel and the University of Freiburg as our "positive" class data. This data was recorded on several occasions and using different paradigms. We mainly used data recorded at $50\,$Hz using a smart watch application. Data was recorded in 2019 and in 2020. The data from 2019 includes hand washing data and, in addition, simulated "compulsive" hand washing. For the simulated compulsive hand washing, subjects were asked to "dirty" their hands with different substances, like finger paint or Nivea \textregistered\ creme, to serve as a motivation for intensive hand washing. Afterwards, they had to follow certain scripts of intensive hand washing steps. Each script contained several steps of washing, like interlacing the fingers, washing the fingers individually, washing the palms and more.
Some of the gestures used are shown in @fig:gestures.
![Examples of gestures used for the simulation of compulsive hand washing, by Phillip Scholl](img/gestures.jpg){width=98% #fig:gestures}
For this work, we treated the simulated compulsive hand washing data as compulsive hand washing data, as we did not have access to recordings of actual compulsive hand washing. Thus, whenever we refer to the "compulsive" hand washing data used in this thesis, we mean the simulated compulsive hand washing.
The recording and labeling of the data are not part of this work.
In addition, multiple data sets from other studies were used. Our selection includes publicly available data sets, each of which contains wrist-worn sensor data of at least one arm. Not all of these data sets were recorded at a frequency of $50\,$Hz. Thus, we resampled all data obtained to our fixed frequency of $50\,$Hz using linear interpolation.
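The resampling step described above can be sketched as follows. This is a minimal illustration, not the code used for the thesis: the function name `resample_linear`, the array layout (samples by channels), and the use of NumPy are assumptions made for the example.

```python
import numpy as np


def resample_linear(signal, source_hz, target_hz=50.0):
    """Resample a (n_samples, n_channels) array to target_hz.

    Hypothetical helper: each channel is interpolated linearly and
    independently onto a regular time grid with spacing 1 / target_hz.
    """
    signal = np.asarray(signal, dtype=float)
    n_samples, n_channels = signal.shape

    # Time stamps of the original samples, starting at t = 0.
    source_t = np.arange(n_samples) / source_hz
    duration = (n_samples - 1) / source_hz

    # Regular target grid covering the same duration (small epsilon
    # guards against floating point rounding just below an integer).
    n_target = int(np.floor(duration * target_hz + 1e-9)) + 1
    target_t = np.arange(n_target) / target_hz

    # np.interp works on 1-D data, so interpolate channel by channel.
    channels = [np.interp(target_t, source_t, signal[:, c]) for c in range(n_channels)]
    return np.column_stack(channels)
```

For example, a one-second recording sampled at a hypothetical 20 Hz (21 samples) would be mapped to 51 samples at 50 Hz, with linear signals reproduced exactly.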
\begin{table}[h]
\begin{tabular}{|l|l|l|l|}
\hline
No & Dataset name & Contained activities (excerpt) & Recording frequency \\ \hline
...@@ -58,6 +63,8 @@ No & Dataset name & Contained activities (excerpt)
\label{tbl:datasets}
\end{table}
\filbreak
The external data sets used are:
- WISDM @kwapisz_activity_2011
...@@ -67,6 +74,7 @@ The external data sets used are:
The external data sets were collected and converted by Daniel Homm, and analyzed and resampled by us. Their contents can be seen in table \ref{tbl:datasets}.
### Specifications of the resulting data set used
The final data set contains a total of 14.4 million 6-dimensional data points. From these data points, we created windows of 150 samples (3 seconds) with 50% overlap. This left us with ~194,000 windows. Of those windows, ~15,750 ($8.2\,\%$) contained hand washing and ~178,500 ($91.8\,\%$) contained other activities or idle periods. Of the ~15,750 hand washing windows, ~10,250 ($65\,\%$) were compulsive hand washing windows and ~5,500 ($35\,\%$) were non-compulsive hand washing windows. We note that for most machine learning methods it makes sense to balance the training set with regard to the classes, in order to avoid biases towards the more frequent classes. For some machine learning algorithms, the class imbalance can also be countered by weighting the classes by importance. When training a neural network, the loss function can likewise be weighted by the class frequency.
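The windowing and the class weighting mentioned above can be illustrated with a short sketch. The function names are hypothetical, and the inverse-frequency formula (`n_samples / (n_classes * count)`) is only one common weighting heuristic, assumed here for illustration; the text does not state which weighting scheme was used.

```python
import numpy as np


def sliding_windows(data, window_len=150, overlap=0.5):
    """Cut a (n_samples, n_channels) array into overlapping windows.

    With window_len=150 and overlap=0.5 the step between window
    starts is 75 samples (1.5 seconds at 50 Hz).
    """
    data = np.asarray(data)
    step = int(round(window_len * (1.0 - overlap)))
    if step < 1:
        raise ValueError("overlap too large for this window length")
    if len(data) < window_len:
        return np.empty((0, window_len) + data.shape[1:], dtype=data.dtype)
    n_windows = (len(data) - window_len) // step + 1
    starts = np.arange(n_windows) * step
    return np.stack([data[s:s + window_len] for s in starts])


def inverse_frequency_weights(labels, n_classes):
    """Per-class weights that grow as a class gets rarer (assumed heuristic)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    if np.any(counts == 0):
        raise ValueError("every class needs at least one sample")
    return len(labels) / (n_classes * counts)
```

Weights of this kind could be passed to a weighted loss function, so that errors on the rare hand washing windows count more than errors on the frequent negative windows.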
...@@ -112,7 +120,7 @@ The data used for training and testing the models for the different problems dif
## Baselines
Baselines can be used to show that our approach outperforms classic and simple approaches to solving the problem.
As baselines, we use a support vector machine (SVM) and a random forest classifier (RFC).
For each of the problems, we train the baselines with the same windows. For the SVM and RFC, we preprocess the windows into features and use the following features per window and sensor axis:
- mean
......
...@@ -91,7 +91,7 @@ Hand washing compliance can be measured using different tools. Jain et al. @jain
Li et al. @li_wristwash_2018 are able to recognize 13 steps of a hand washing procedure with an accuracy of $85\,\%$. They employ a sliding-window, feature-based hidden Markov model approach. Wang et al. explore using sensor armbands to assess the user's compliance with given hand washing hygiene guidelines @wang_accurate_2020. They run a classifier using XGBoost and are mostly able to separate the different steps of the scripted hand washing routine.
In addition, Cao et al. @cao_awash_2021 developed a system that similarly detects different steps of a scripted hand washing routine and prompts the user if they confuse the order of the steps or forget one of them. The technology is aimed at elderly patients with dementia. Their system is able to detect which step of hand washing is currently being conducted based on wrist motion data, using an LSTM-based neural network. However, none of the three systems mentioned in this paragraph is meant to separate hand washing from other activities.
Mondol et al. employ a simple feed-forward neural network consisting of a few linear layers to detect hand washing @sayeed_mondol_hawad_2020. Their method specifically seeks to eliminate false positives by detecting out-of-distribution (OOD) samples, i.e. samples that are very different from the ones seen by the model during training. They model the features of the network's penultimate layer (the last layer before the output layer) with a conditional Gaussian distribution. They use these features of all positive class samples to calculate the mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{S}$ of the feature distribution.
Based on these measures, one can compute each sample's distance to the distribution using the Mahalanobis distance (as seen in equation \ref{eqn:mahala}). If, at test time, the model predicts a sample to belong to the positive class, its distance is calculated. If the distance is larger than a threshold ($d_{th}$), the sample is classified as negative. The threshold $d_{th}$ can be chosen such that it includes almost all positive samples seen during training. On their own data set (HAWAD data set) they reach F1 scores of over 90% for hand washing detection.
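The rejection rule described above can be sketched in a few lines. This is an illustration under assumptions, not the implementation of Mondol et al.: it assumes the standard Mahalanobis distance $\sqrt{(\mathbf{x}-\boldsymbol{\mu})^{T}\mathbf{S}^{-1}(\mathbf{x}-\boldsymbol{\mu})}$, hypothetical function names, and a threshold taken as a high quantile of the training distances.

```python
import numpy as np


def fit_gaussian(features):
    """Estimate mean and inverse covariance of positive-class features.

    features: (n_samples, n_features) penultimate-layer activations.
    The pseudo-inverse is used so that a singular covariance matrix
    does not raise an error (an assumption of this sketch).
    """
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, np.linalg.pinv(cov)


def mahalanobis(features, mu, cov_inv):
    """Mahalanobis distance of each row of features to the distribution."""
    diff = np.atleast_2d(np.asarray(features, dtype=float)) - mu
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))


def reject_ood(predicted_positive, features, mu, cov_inv, d_th):
    """Keep a positive prediction only if its distance is within d_th."""
    distances = mahalanobis(features, mu, cov_inv)
    return np.asarray(predicted_positive, dtype=bool) & (distances <= d_th)
```

In this sketch, `d_th` could for instance be set to the 99th percentile of the distances of the positive training samples (`np.quantile(mahalanobis(train, mu, cov_inv), 0.99)`), so that almost all positives seen during training stay within the threshold.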
\begin{figure}
\begin{align}
......