master
Clemens Klug 2018-05-09 14:14:49 +02:00
parent a858231513
commit 1ab6a5a003
4 changed files with 110 additions and 65 deletions


@ -133,26 +133,6 @@ compress
\image{.85\textwidth}{grafana}{Side project: Weather station with Grafana}{img:grafana}
\end{frame}
\begin{frame}{Architecture}
\begin{itemize}
\item Based on map-reduce
\item Map: Analysis
\begin{itemize}
\item Iterate Log entries
\item Feed log entry through analyzer queue
\begin{itemize}
\item Augment entries
\item Filter entries
\item Sequential order
\end{itemize}
\end{itemize}
\item Reduce: Collect summaries from analyzers
\begin{itemize}
\item Post-processing, Comparison, …
\item Rendering
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}{Architecture scheme}
\image{\textwidth}{../../ThesTeX/images/map-reduce.pdf}{Data flows}{img:flow}
\end{frame}
@ -229,6 +209,15 @@ compress
%TODO
\begin{frame}{Evaluation}
\begin{itemize}
\item Analyse other geogames
\item Describe effort
\item ?
\item Profit
\end{itemize}
\end{frame}
\begin{frame}{Evaluation}
\begin{itemize}
\item Analyse other geogames
@ -253,6 +242,28 @@ compress
\appendix
\backupbegin
\begin{frame}{Architecture}
\begin{itemize}
\item Based on map-reduce
\item Map: Analysis
\begin{itemize}
\item Iterate Log entries
\item Feed log entry through analyzer queue
\begin{itemize}
\item Augment entries
\item Filter entries
\item Sequential order
\end{itemize}
\end{itemize}
\item Reduce: Collect summaries from analyzers
\begin{itemize}
\item Post-processing, Comparison, …
\item Rendering
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}{Graphs}
\begin{columns}
\column{0.45\linewidth}


@ -6,7 +6,7 @@ System administrators and developers face a daily surge of log files from applic
For knowledge extraction, a wide range of tools is in constant development for such environments.
Currently, an architectural approach with three main components is most frequently applied.
These components are divided into aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack consisting of Elastic Search, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}.
A popular example is the ELK stack consisting of Elastic Search, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}. \nomenclature{\m{E}lasticSearch, \m{L}ogstash, and \m{K}ibana}{ELK}
\autoref{tab:logs} lists some implementations of these components according to their main focus.
For this list, cloud-based services were not taken into account.
A clear classification is not always possible, as some modules integrate virtually all features necessary, as is the case with the Graphite tool set.
@ -14,11 +14,11 @@ A clear classification is not always possible, as some modules integrate virtual
\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\footnote{\url{https://www.elastic.co/de/products/logstash}} & Elastic Search\footnote{\url{https://www.elastic.co/de/products/elasticsearch}} & Kibana\footnote{\url{https://www.elastic.co/de/products/kibana}}\\
Collectd\footnote{\url{https://collectd.org/}} & Influx DB\footnote{\url{https://www.influxdata.com/}} & Grafana\footnote{\url{https://grafana.com}}\\
Icinga\footnote{\url{https://www.icinga.com/products/icinga-2/}} & Whisper\footnote{\url{https://github.com/graphite-project/whisper}} & Graphite\footnote{\url{https://graphiteapp.org/}}\\
StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url{https://prometheus.io/}} & \\
%\footnote{\url{}} & \footnote{\url{}} & \footnote{\url{}}\\
Logstash\furl{https://www.elastic.co/de/products/logstash} & Elastic Search\furl{https://www.elastic.co/de/products/elasticsearch} & Kibana\furl{https://www.elastic.co/de/products/kibana}\\
Collectd\furl{https://collectd.org/} & Influx DB\furl{https://www.influxdata.com/} & Grafana\furl{https://grafana.com}\\
Icinga\furl{https://www.icinga.com/products/icinga-2/} & Whisper\furl{https://github.com/graphite-project/whisper} & Graphite\furl{https://graphiteapp.org/}\\
StatsD\furl{https://github.com/etsy/statsd} & Prometheus\furl{https://prometheus.io/} & \\
%\furl{} & \furl{} & \furl{}\\
\caption{Log processing components}
\label{tab:logs}
@ -26,64 +26,59 @@ StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url
\subsubsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\footnote{\url{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS}
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\furl{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS} \nomenclature{\m{A}pplication \m{P}rogramming \m{I}nterface}{API}\nomenclature{\m{H}yper\m{t}ext \m{T}ransfer \m{P}rotocol}{HTTP}
Aside from aggregation, log creation is covered by solutions ranging from host-based monitoring like Icinga to application-centric approaches, e.g. StatsD embedded in the application source code.
Aside from aggregation, log creation is covered by solutions ranging from host-based monitoring like Icinga to application-centric approaches, e.g. StatsD embedded in the application source code\furl{https://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/}.
\subsubsection{Databases}
Storage is the key component of a log processing system.
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events poses many pitfalls.
For instance, django-monit-collector\footnote{\url{https://github.com/nleng/django-monit-collector}}, an open alternative to the proprietary MMonit cloud service\footnote{\url{https://mmonit.com/monit/\#mmonit}}, assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{s}cript \m{O}bject \m{N}otation}{JSON}
For instance, django-monit-collector\furl{https://github.com/nleng/django-monit-collector}, an open alternative to the proprietary MMonit cloud service\furl{https://mmonit.com/monit/\#mmonit}, assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{s}cript \m{O}bject \m{N}otation}{JSON}
This strategy forces the RDBMS and the application to deal with growing amounts of data, as no temporal selection can be performed by the RDBMS itself.
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization basically useless and significantly impeded access with statistical tools.
Time Series Databases (TSDB) specialize in chronological events.
%TODO
%TODO RRD
With a focus on chronological events, Time Series Databases (TSDB) are commonly used in these scenarios. \nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
One typical use is monitoring, e.g. of server health and usage statistics or of weather stations, as the example in \autoref{img:rdd} shows.
This example utilizes one of the early TSDB systems, RRDtool\furl{https://oss.oetiker.ch/rrdtool/index.en.html}.
More recently, alternatives written in modern languages have become popular, like InfluxDB\furl{https://www.influxdata.com/}, written in Go\furl{https://golang.org/}, or Whisper, written in Python (from the Graphite software package).
\image{\textwidth}{mgroth}{Weather station plot with RRDtool \cite{RDD}}{img:rdd}
\nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
\subsubsection{Frontend}
Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added with plugins.
%TODO
Additional functionality can be added with plugins, e.g. for new data sources or dashboard panels with visualizations.
The query languages of the data sources are abstracted by a common user interface.
%TODO: weather station screenshot
%%%
\begin{itemize}
\item ELK (Elastic search, Logstash, Kibana)\cite{andreassen2015monitoring} \cite{yang2016aggregated} \cite{steinegger2016analyse} \cite{sanjappa2017analysis}
\item Collectd, Influx DB, Grafana \cite{komarek2017metric}
\item
\end{itemize}
\begin{itemize}
\item[+] widely deployed
\item[+] powerful query languages %TODO example
\item mainly web/container/hardware monitoring
\item[-] spatial analysis: heavily anonymized
\item[-] fast-paced environment
\end{itemize}
\subsection{Pedestrian traces}
Analyzing pedestrian movement … based on GPS logs
Analyzing pedestrian movement based on GPS logs is an established technique.
In the following, \autoref{sssec:gps} provides an overview of GPS as a data basis, \autoref{sssec:act} highlights some approaches to activity mining, and \autoref{sssec:vis} showcases popular visualizations of tempo-spatial data.
\nomenclature{\m{G}lobal \m{P}ositioning \m{S}ystem}{GPS}
\subsubsection{Data basis: GPS}
\subsubsection{Activity Mining}
\subsubsection{Visualization}
\begin{itemize}
\item GPS overestimates systematically \cite{Ranacher_2015}
\item GPS is a suitable instrument for spatio-temporal data\cite{van_der_Spek_2009}
\item Activity mining \cite{Gong_2014}
\begin{itemize}
\item Speed-based Clustering \cite{ren2015mining}
%\item \cite{Ferrante_2016} % closed access
\item Machine Learning \cite{pattern_recog} %TODO
\end{itemize}
\item E.g.: Improve tourist management \cite{tourist_analysis2012}
\end{itemize}
\subsubsection{Data basis: GPS}\label{sssec:gps}
Global navigation satellite systems (GNSS) like GPS, Galileo, GLONASS, or BeiDou are a source of positioning data for mobile users.
\nomenclature{\m{G}lobal \m{N}avigation \m{S}atellite \m{S}ystems}{GNSS}
\cite{van_der_Spek_2009} has shown that such signals provide a reliable service in many situations.
Additionally, tracks of these signals are an invaluable source of information for researching movements and movement patterns \cite{Modsching:2008:1098-3058:31,nielsen2004gps,millonig2007monitoring}.
Therefore, GNSS are suitable instruments for acquiring spatio-temporal data \cite{van_der_Spek_2009}.
However, \cite{Ranacher_2015} points out systematic overestimation by GPS due to interpolation errors.
To eliminate such biases of one system, \cite{Li2015} describes the combination of multiple GNSS for improved accuracy and reduced convergence time.
\subsubsection{Activity Mining}\label{sssec:act}
GPS (or GNSS) tracks generally only contain the raw tempo-spatial data (possibly accompanied by metadata like accuracy, visible satellites, etc.).
Any additional information needs to be either logged separately or derived from the track data itself.
Such activity mining allows, e.g., determining the modes of transport used while the track was recorded \cite{Gong_2014}.
\cite{Gong_2015} shows the extraction of activity stop locations, i.e. locations where locomotion is suspended for an activity, in contrast to stops without activities.
Information of this kind is relevant, e.g., for improving tourist management in popular destinations \cite{tourist_analysis2012,koshak2008analyzing,Modsching:2008:1098-3058:31}.
Besides points of interest (POIs), individual behaviour patterns can be mined from tracks, as described in \cite{ren2015mining}.
Post-processing of these patterns with machine learning enables predictions of future trajectories \cite{10.1007/978-3-642-23199-5_37}.
\subsubsection{Visualization}\label{sssec:vis}
\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap: Fitness tracker \cite{strava}}{img:strava}


@ -496,3 +496,42 @@ keywords = "Games, Agent based models, Simulations, Analytics"
year={2017},
organization={IEEE}
}
@misc{RDD,
title={{RRDtool gallery example}},
year={2011},
month={7},
url={https://oss.oetiker.ch/rrdtool/gallery/index.en.html}
}
@Article{Li2015,
author="Li, Xingxing
and Ge, Maorong
and Dai, Xiaolei
and Ren, Xiaodong
and Fritsche, Mathias
and Wickert, Jens
and Schuh, Harald",
title="Accuracy and reliability of multi-GNSS real-time precise positioning: GPS, GLONASS, BeiDou, and Galileo",
journal="Journal of Geodesy",
year="2015",
month="Jun",
day="01",
volume="89",
number="6",
pages="607--635",
issn="1432-1394",
doi="10.1007/s00190-015-0802-8",
url="https://doi.org/10.1007/s00190-015-0802-8"
}
@InProceedings{10.1007/978-3-642-23199-5_37,
author="Chen, Chun-Sheng
and Eick, Christoph F.
and Rizk, Nouhad J.",
editor="Perner, Petra",
title="Mining Spatial Trajectories Using Non-parametric Density Functions",
booktitle="Machine Learning and Data Mining in Pattern Recognition",
year="2011",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="496--510",
isbn="978-3-642-23199-5"
}

BIN
ThesTeX/images/mgroth.png Normal file

Binary file not shown.
