\section{State of research}

\subsection{Log processing}
System administrators and developers face a daily surge of log files from applications, systems, and servers.
For knowledge extraction, a wide range of tools is in constant development for such environments.
Currently, an architectural approach with three main components is most frequently applied.
These components are divided into aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack consisting of Elasticsearch, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}. \nomenclature{\m{E}lasticsearch, \m{L}ogstash, and \m{K}ibana}{ELK}
In \autoref{tab:logs}, some implementations of these components are listed according to their main focus.
Cloud-based services were not taken into account for this list.
A clear classification is not always possible, as some tools integrate virtually all necessary features, as is the case with the Graphite tool set.

\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\furl{https://www.elastic.co/de/products/logstash} & Elasticsearch\furl{https://www.elastic.co/de/products/elasticsearch} & Kibana\furl{https://www.elastic.co/de/products/kibana}\\
Collectd\furl{https://collectd.org/} & InfluxDB\furl{https://www.influxdata.com/} & Grafana\furl{https://grafana.com}\\
Icinga\furl{https://www.icinga.com/products/icinga-2/} & Whisper\furl{https://github.com/graphite-project/whisper} & Graphite\furl{https://graphiteapp.org/}\\
StatsD\furl{https://github.com/etsy/statsd} & Prometheus\furl{https://prometheus.io/} & \\
%\furl{} & \furl{} & \furl{}\\

\caption{Log processing components}
\label{tab:logs}
\end{longtable}

\subsubsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\furl{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS} \nomenclature{\m{A}pplication \m{P}rogramming \m{I}nterface}{API}\nomenclature{\m{H}yper\m{t}ext \m{T}ransfer \m{P}rotocol}{HTTP}

Aside from aggregation, log creation is covered by tools ranging from host-based monitoring solutions like Icinga to application-centric approaches such as StatsD, which is embedded in the application source code\furl{https://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/}.

\subsubsection{Databases}
The key component of a log processing system is the storage.
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events imposes many pitfalls.
For instance, django-monit-collector\furl{https://github.com/nleng/django-monit-collector}, an open alternative to the proprietary MMonit cloud service\furl{https://mmonit.com/monit/\#mmonit}, preserves temporal coherence by storing lists of timestamps and measurement values as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
This strategy forces both the RDBMS and the application to handle ever-growing amounts of data, as the RDBMS itself cannot select by time.
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization basically useless and significantly impeded access with statistical tools.

Time Series Databases (TSDB) specialize in chronological events.
Typical uses are monitoring, e.g. server health and usage statistics, or weather stations, as the example in \autoref{img:rdd} shows.
This example utilizes one of the early TSDB systems, RRDtool\furl{https://oss.oetiker.ch/rrdtool/index.en.html}.
More recently, alternatives written in modern languages have become popular, like InfluxDB\furl{https://www.influxdata.com/} written in Go\furl{https://golang.org/} or Whisper written in Python (from the Graphite software package).
\image{\textwidth}{mgroth}{Weather station plot with RRDtool \cite{RDD}}{img:rdd}
\nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
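
The difference between the two storage layouts can be illustrated with a rough cost sketch, which rests on simplifying assumptions and is not a measurement.
Let a series hold $n$ samples and let a query request the $k \leq n$ samples of one time window.
If the whole series is stored as a single JSON string, the database cannot filter by time, so the complete string is transferred to and parsed by the application.
If the samples are stored individually under an ordered time index, as is typical for a TSDB, only the window itself has to be read:
\[
C_{\mathrm{JSON}}(n, k) \in \Theta(n), \qquad C_{\mathrm{index}}(n, k) \in O(\log n + k).
\]
The symbols $n$ and $k$ and the cost functions $C_{\mathrm{JSON}}$ and $C_{\mathrm{index}}$ are introduced here for illustration only; the logarithmic term assumes a tree-like index and does not describe any specific product.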

\subsubsection{Frontend}

Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added with plugins, e.g. for new data sources or dashboard panels with visualizations.
The query languages of the data sources are abstracted by a common user interface.


\subsection{Pedestrian traces}
Analyzing pedestrian movement based on GPS logs is an established technique.
In the following sections, \autoref{sssec:gps} provides an overview of GPS as a data basis, \autoref{sssec:act} highlights some approaches to activity mining, and \autoref{sssec:vis} showcases popular visualizations of spatio-temporal data.
\nomenclature{\m{G}lobal \m{P}ositioning \m{S}ystem}{GPS}

\subsubsection{Data basis: GPS}\label{sssec:gps}
Global navigation satellite systems (GNSS) like GPS, Galileo, GLONASS, or BeiDou are a source of positioning data for mobile users.
\nomenclature{\m{G}lobal \m{N}avigation \m{S}atellite \m{S}ystems}{GNSS}
\cite{van_der_Spek_2009} has shown that such signals provide a reliable service in many situations.
Additionally, tracks of these signals are an invaluable source of information for researching movements and movement patterns \cite{Modsching:2008:1098-3058:31,nielsen2004gps,millonig2007monitoring}.
Therefore, GNSS are suitable instruments for acquiring spatio-temporal data \cite{van_der_Spek_2009}.

However, \cite{Ranacher_2015} points out that GPS systematically overestimates distances due to measurement errors.
To mitigate such biases of a single system, \cite{Li2015} describes the combination of multiple GNSS for improved accuracy and reduced convergence time.
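
The direction of such a distance bias can be motivated with a short argument, given here as a sketch under assumptions that are not taken from the cited works.
Let $p_1, p_2$ be two true positions with distance $d = \|p_1 - p_2\|$, and let the recorded positions be $p_i + \varepsilon_i$ with random errors $\varepsilon_i$ of zero mean and finite variance.
Since the Euclidean norm is convex, Jensen's inequality yields
\[
\mathrm{E}\bigl[\|(p_1 + \varepsilon_1) - (p_2 + \varepsilon_2)\|\bigr] \geq \bigl\|p_1 - p_2 + \mathrm{E}[\varepsilon_1 - \varepsilon_2]\bigr\| = d ,
\]
so the recorded distance is on average at least as large as the true distance.
For the squared distance the excess is explicit, because the mixed term vanishes for zero-mean errors:
\[
\mathrm{E}\bigl[\|(p_1 + \varepsilon_1) - (p_2 + \varepsilon_2)\|^2\bigr] = d^2 + \mathrm{E}\bigl[\|\varepsilon_1 - \varepsilon_2\|^2\bigr] .
\]
The zero-mean error model and the symbols $p_i$, $\varepsilon_i$, and $d$ are illustrative assumptions; real receiver errors may be correlated over time, which changes the size of the effect.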

\subsubsection{Activity mining}\label{sssec:act}
GPS (or GNSS) tracks generally contain only the raw spatio-temporal data (possibly accompanied by metadata like accuracy, visible satellites, etc.).
Any additional information must either be logged separately or be derived from the track data itself.
This activity mining allows, for example, the determination of the modes of transport used while creating the track \cite{Gong_2014}.
\cite{Gong_2015} shows the extraction of activity stop locations, i.e. locations where locomotion is suspended for an activity, in contrast to stops without activities.
Information of this kind is relevant, e.g., for improving tourist management in popular destinations \cite{tourist_analysis2012,koshak2008analyzing,Modsching:2008:1098-3058:31}.

Besides points of interest (POIs), individual behavior patterns can be mined from tracks, as described in \cite{ren2015mining}.
Post-processing of these patterns with machine learning enables predictions of future trajectories \cite{10.1007/978-3-642-23199-5_37}.
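
The notion of a stop used above can be made concrete with a common threshold-based formulation, stated here as a generic sketch and not as the method of \cite{Gong_2015}.
For a track of points $p_1, \dots, p_n$ with timestamps $t_1 < \dots < t_n$, a subsequence $p_i, \dots, p_j$ is treated as a stop candidate if
\[
\max_{i < k \leq j} \mathrm{dist}(p_i, p_k) \leq r \quad \mbox{and} \quad t_j - t_i \geq \tau ,
\]
where $\mathrm{dist}$ denotes the distance between two positions, and the radius $r$ and the minimum duration $\tau$ are hypothetical parameters that have to be chosen per application.
Whether an activity actually takes place at such a candidate cannot be decided from this criterion alone and requires further features or context data.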


\subsubsection{Visualization}\label{sssec:vis}

\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap: fitness tracker \cite{strava}}{img:strava}

\image{.72\textwidth}{../../PresTeX/images/space-time}{Space-time cube examples \cite{bach2014review}}{img:spacetime}

\image{\textwidth}{../../PresTeX/images/traj-pattern}{Flock and meet trajectory patterns \cite{jeung2011trajectory}}{img:traj-pattern}

\image{\textwidth}{../../PresTeX/images/generalization}{Trajectories and generalizations with varying radius parameter \cite{adrienko2011spatial}}{img:generalization}


\subsection{Analyzing games}
\begin{itemize}
\item There is more than heatmaps
\item Combine position with game actions
\item Identify patterns and balancing issues
\item Manual processes %\citetitle{Drachen2013}\citetitle{AHLQVIST20181}
\end{itemize}
%\image{.5\textwidth}{game-an}{chat logs with players location \cite{Drachen2013}}{img:chatlogs}
%\image{.5\textwidth}{ac3-death}{identify critical sections \cite{Drachen2013}}{img:ac3death}
\twofigures{0.5}{../../PresTeX/images/game-an}{Chat logs with player locations}{img:chatlogs}{../../PresTeX/images/ac3-death}{Identify critical sections}{img:ac3death}{Game analytics \cite{Drachen2013}}{fig:gameanal}



\subsection{Summary}
\begin{itemize}
\item Log processing: Powerful stacks
\item Movement analysis: Large field already explored (GPS influence, patterns, behavior recognition, …)
\item Track rendering: Track (with attributes), space-time cube, heatmap, …
\item Spatial analysis of digital games with GIS
\item Analysis of location-based games: Laborious manual process
\end{itemize}