\section{State of research}
\subsection{Log processing}
System administrators and developers face a daily surge of log files from applications, systems, and servers.
A wide range of tools for extracting knowledge from such data is under constant development.
Currently, an architectural approach with three main components is most frequently applied.
These components are aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack consisting of Elasticsearch, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}. \nomenclature{\m{E}lasticSearch, \m{L}ogstash, and \m{K}ibana}{ELK}
\autoref{tab:logs} lists some implementations of these components, grouped by their main focus.
For this list, cloud-based services were not taken into account.
A clear classification is not always possible, as some modules integrate virtually all features necessary, as is the case with the Graphite tool set.
\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\furl{https://www.elastic.co/de/products/logstash} & Elasticsearch\furl{https://www.elastic.co/de/products/elasticsearch} & Kibana\furl{https://www.elastic.co/de/products/kibana}\\
Collectd\furl{https://collectd.org/} & Influx DB\furl{https://www.influxdata.com/} & Grafana\furl{https://grafana.com}\\
Icinga\furl{https://www.icinga.com/products/icinga-2/} & Whisper\furl{https://github.com/graphite-project/whisper} & Graphite\furl{https://graphiteapp.org/}\\
StatsD\furl{https://github.com/etsy/statsd} & Prometheus\furl{https://prometheus.io/} & \\
%\furl{} & \furl{} & \furl{}\\
\caption{Log processing components}
\label{tab:logs}
\end{longtable}
\subsubsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\furl{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS} \nomenclature{\m{A}pplication \m{P}rogramming \m{I}nterface}{API}\nomenclature{\m{H}yper\m{t}ext \m{T}ransfer \m{P}rotocol}{HTTP}
Aside from aggregation, log creation is covered by tools ranging from host-based monitoring solutions like Icinga to application-centric approaches such as StatsD, which is embedded in the application source code\furl{https://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/}.
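To illustrate the application-centric approach, the following minimal sketch emits a counter increment in the plain-text StatsD line format (\texttt{name:value|c}) over UDP.
The function names, the metric name, and the daemon address are assumptions made for illustration; 8125 is the port a StatsD daemon listens on by default.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the StatsD plain-text line format."""
    return f"{name}:{value}|c".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram to a StatsD daemon (host and port are assumptions).

    UDP is fire-and-forget: the call does not wait for the daemon and
    does not report whether the datagram was received.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name; only the encoded payload is printed here.
    print(format_counter("game.login"))
```

Inside an application, a call such as \texttt{send\_metric(format\_counter("game.login"))} would be placed at the point where the event occurs, so that the log data is created by the application itself rather than by an external monitor.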
\subsubsection{Databases}
The key component for a log processing system is the storage.
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events poses many pitfalls.
For instance, django-monit-collector\furl{https://github.com/nleng/django-monit-collector}, an open alternative to the proprietary MMonit cloud service\furl{https://mmonit.com/monit/\#mmonit}, assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
This strategy forces the RDBMS and the application to deal with growing amounts of data, as no temporal selection can be performed by the RDBMS itself.
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization basically useless and significantly impeded access with statistical tools.
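The difference between the two storage layouts can be sketched with SQLite from the Python standard library.
The table names, the host name, and the sample values are hypothetical and do not reproduce the actual schema of django-monit-collector; the sketch only contrasts a JSON string per host with one indexed row per measurement.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout A (assumed): all samples of a host serialized into one JSON string.
conn.execute("CREATE TABLE blob_store (host TEXT PRIMARY KEY, samples TEXT)")
# Layout B (assumed): one row per sample with an indexed timestamp.
conn.execute("CREATE TABLE samples (host TEXT, ts INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_samples_ts ON samples (host, ts)")

# Hypothetical measurements: (timestamp, value) pairs.
readings = [(t, 0.5 + 0.01 * t) for t in range(100)]
conn.execute("INSERT INTO blob_store VALUES (?, ?)", ("web01", json.dumps(readings)))
conn.executemany("INSERT INTO samples VALUES ('web01', ?, ?)", readings)


def recent_from_blob(conn, host, since):
    """The database returns the whole string; the application must filter."""
    (raw,) = conn.execute(
        "SELECT samples FROM blob_store WHERE host = ?", (host,)
    ).fetchone()
    return [(t, v) for t, v in json.loads(raw) if t >= since]


def recent_from_rows(conn, host, since):
    """The database performs the temporal selection itself."""
    return conn.execute(
        "SELECT ts, value FROM samples WHERE host = ? AND ts >= ? ORDER BY ts",
        (host, since),
    ).fetchall()
```

Both functions return the same samples, but in the first layout the complete and steadily growing string is transferred and parsed on every access, whereas in the second the selection by time happens inside the database.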
Time Series Databases (TSDB) specialize in chronological events.
Typical uses are monitoring, e.g. of server health and usage statistics, and weather stations, as the example in \autoref{img:rdd} shows.
This example utilizes one of the early TSDB systems, RRDtool\furl{https://oss.oetiker.ch/rrdtool/index.en.html}.
More recently, alternatives written in modern languages have become popular, such as InfluxDB\furl{https://www.influxdata.com/}, written in Go\furl{https://golang.org/}, or Whisper, written in Python (part of the Graphite software package).
\image{\textwidth}{mgroth}{Weather station plot with RRDtool \cite{RDD}}{img:rdd}
\nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
\subsubsection{Frontend}
Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added with plugins, e.g. for new data sources or dashboard panels with visualizations.
The query languages of the data sources are abstracted by a common user interface.
\subsection{Pedestrian traces}
Analyzing pedestrian movement based on GPS logs is an established technique.
In the following sections, \autoref{sssec:gps} provides an overview of GPS as a data basis, \autoref{sssec:act} highlights some approaches to activity mining, and \autoref{sssec:vis} showcases popular visualizations of spatio-temporal data.
\nomenclature{\m{G}lobal \m{P}ositioning \m{S}ystem}{GPS}
\subsubsection{Data basis: GPS}\label{sssec:gps}
Global navigation satellite systems (GNSS) like GPS, Galileo, GLONASS, or BeiDou are a source of positioning data for mobile users.
\nomenclature{\m{G}lobal \m{N}avigation \m{S}atellite \m{S}ystems}{GNSS}
\cite{van_der_Spek_2009} has shown that such signals provide a reliable service in many situations.
Additionally, tracks of these signals are an invaluable source of information for researching movements and movement patterns \cite{Modsching:2008:1098-3058:31,nielsen2004gps,millonig2007monitoring}.
Therefore, GNSS are suitable instruments for acquiring spatio-temporal data \cite{van_der_Spek_2009}.
However, \cite{Ranacher_2015} points out systematic overestimation by GPS due to interpolation errors.
To eliminate such biases of one system, \cite{Li2015} describes the combination of multiple GNSS for improved accuracy and reduced convergence time.
\subsubsection{Activity Mining}\label{sssec:act}
GPS (or GNSS) tracks generally contain only the raw spatio-temporal data (possibly accompanied by metadata like accuracy, visible satellites, etc.).
Any additional information must either be logged separately or be derived from the track data itself.
Such activity mining allows, for example, determining the modes of transport used while the track was recorded \cite{Gong_2014}.
\cite{Gong_2015} shows how to extract activity stop locations, i.e. locations where locomotion is suspended for an activity, in contrast to stops without activities.
Information of this kind is relevant e.g. for improving tourist management in popular destinations \cite{tourist_analysis2012,koshak2008analyzing,Modsching:2008:1098-3058:31}.
Besides points of interest (POIs), individual behaviour patterns can be mined from tracks, as described in \cite{ren2015mining}.
Post-processing of these patterns with machine learning enables predictions of future trajectories \cite{10.1007/978-3-642-23199-5_37}.
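As a rough illustration of stop extraction from raw track data, the following sketch applies a simple distance and duration threshold to a track.
It is not the method of \cite{Gong_2015}: the thresholds (30\,m, 120\,s), the tuple layout, and the function names are assumptions chosen for illustration, and the sketch does not distinguish stops with activities from stops without.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two coordinates in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def detect_stops(track, max_radius_m=30.0, min_duration_s=120.0):
    """Find stops in a track given as time-sorted (timestamp_s, lat, lon) tuples.

    A stop is a run of consecutive points that stay within max_radius_m of
    the first point of the run for at least min_duration_s.
    Returns a list of (start_ts, end_ts, mean_lat, mean_lon).
    """
    stops = []
    i, n = 0, len(track)
    while i < n:
        j = i + 1
        # Extend the run while points remain close to the anchor point i.
        while j < n and haversine_m(track[i][1], track[i][2],
                                    track[j][1], track[j][2]) <= max_radius_m:
            j += 1
        if track[j - 1][0] - track[i][0] >= min_duration_s:
            run = track[i:j]
            mean_lat = sum(p[1] for p in run) / len(run)
            mean_lon = sum(p[2] for p in run) / len(run)
            stops.append((track[i][0], track[j - 1][0], mean_lat, mean_lon))
            i = j  # continue after the detected stop
        else:
            i += 1  # no stop anchored here; try the next point
    return stops
```

Anchoring each run at its first point keeps the sketch short; a more robust variant would have to account for positioning noise, which can push single points of a real stop outside the radius.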
\subsubsection{Visualization}\label{sssec:vis}
\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap: fitness tracker \cite{strava}}{img:strava}
\image{.72\textwidth}{../../PresTeX/images/space-time}{Space-time cube examples \cite{bach2014review}}{img:spacetime}
\image{\textwidth}{../../PresTeX/images/traj-pattern}{Flock and meet trajectory pattern \cite{jeung2011trajectory}}{img:traj-pattern}
\image{\textwidth}{../../PresTeX/images/generalization}{Trajectories and generalizations with varying radius parameter \cite{adrienko2011spatial}}{img:generalization}
\subsection{Analyzing games}
\begin{itemize}
\item there's more than heatmaps
\item combine position with game actions
\item identify patterns, balancing issues
\item manual processes %\citetitle{Drachen2013}\citetitle{AHLQVIST20181}
\end{itemize}
%\image{.5\textwidth}{game-an}{chat logs with players location \cite{Drachen2013}}{img:chatlogs}
%\image{.5\textwidth}{ac3-death}{identify critical sections \cite{Drachen2013}}{img:ac3death}
\twofigures{0.5}{../../PresTeX/images/game-an}{Chat logs with player locations}{img:chatlogs}{../../PresTeX/images/ac3-death}{Identification of critical sections}{img:ac3death}{Game analytics \cite{Drachen2013}}{fig:gameanal}
\subsection{Summary}
\begin{itemize}
\item Log processing: Powerful stacks
\item Movement analysis: Large field already explored (GPS influence, Patterns, Behavior recognition, …)
\item Track rendering: Track (with attributes), Space-time cube, Heatmap, …
\item Spatial analysis of digital games with GIS
\item Analysis of location based games: Laborious manual process
\end{itemize}