wörk
parent
a858231513
commit
1ab6a5a003
|
|
@ -133,26 +133,6 @@ compress
|
|||
\image{.85\textwidth}{grafana}{Side project: Weather station with Grafana}{img:grafana}
|
||||
\end{frame}
|
||||
|
||||
\begin{frame}{Architecture}
|
||||
\begin{itemize}
|
||||
\item Based on map-reduce
|
||||
\item Map: Analysis
|
||||
\begin{itemize}
|
||||
\item Iterate Log entries
|
||||
\item Feed log entry through analyzer queue
|
||||
\begin{itemize}
|
||||
\item Augment entries
|
||||
\item Filter entries
|
||||
\item Sequential order
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
\item Reduce: Collect summaries from analyzers
|
||||
\begin{itemize}
|
||||
\item Post-processing, Comparison, …
|
||||
\item Rendering
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
\end{frame}
|
||||
\begin{frame}{Architecture scheme}
|
||||
\image{\textwidth}{../../ThesTeX/images/map-reduce.pdf}{Data flows}{img:flow}
|
||||
\end{frame}
|
||||
|
|
@ -229,6 +209,15 @@ compress
|
|||
%TODO
|
||||
|
||||
|
||||
\begin{frame}{Evaluation}
|
||||
\begin{itemize}
|
||||
\item Analyse other geogames
|
||||
\item Describe effort
|
||||
\item ?
|
||||
\item Profit
|
||||
\end{itemize}
|
||||
\end{frame}
|
||||
|
||||
\begin{frame}{Evaluation}
|
||||
\begin{itemize}
|
||||
\item Analyse other geogames
|
||||
|
|
@ -253,6 +242,28 @@ compress
|
|||
|
||||
\appendix
|
||||
\backupbegin
|
||||
|
||||
\begin{frame}{Architecture}
|
||||
\begin{itemize}
|
||||
\item Based on map-reduce
|
||||
\item Map: Analysis
|
||||
\begin{itemize}
|
||||
\item Iterate Log entries
|
||||
\item Feed log entry through analyzer queue
|
||||
\begin{itemize}
|
||||
\item Augment entries
|
||||
\item Filter entries
|
||||
\item Sequential order
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
\item Reduce: Collect summaries from analyzers
|
||||
\begin{itemize}
|
||||
\item Post-processing, Comparison, …
|
||||
\item Rendering
|
||||
\end{itemize}
|
||||
\end{itemize}
|
||||
\end{frame}
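The map-reduce flow on the Architecture frame can be sketched as follows. This is only an illustration under stated assumptions: the `Analyzer` interface, the three example analyzers, and the entry fields `type` and `level` are hypothetical and do not come from the actual implementation.

```python
from collections import Counter
from typing import Iterable, List, Optional


class Analyzer:
    """Hypothetical analyzer interface (an assumption of this sketch)."""

    def process(self, entry: dict) -> Optional[dict]:
        # Return the (possibly augmented) entry, or None to filter it out.
        return entry

    def summary(self) -> dict:
        return {}


class DropDebug(Analyzer):
    """Filter step: drops entries marked as debug output."""

    def process(self, entry: dict) -> Optional[dict]:
        return None if entry.get("level") == "debug" else entry


class TagSource(Analyzer):
    """Augment step: adds a field to every entry that passes through."""

    def process(self, entry: dict) -> Optional[dict]:
        return {**entry, "source": "demo"}


class CountByType(Analyzer):
    """Collects a per-type count that is reported in the reduce phase."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, entry: dict) -> Optional[dict]:
        self.counts[entry["type"]] += 1
        return entry

    def summary(self) -> dict:
        return dict(self.counts)


def run(entries: Iterable[dict], analyzers: List[Analyzer]) -> dict:
    # Map: feed each log entry through the analyzer queue in sequential order.
    for entry in entries:
        current: Optional[dict] = entry
        for analyzer in analyzers:
            current = analyzer.process(current)
            if current is None:  # entry was filtered, skip remaining analyzers
                break
    # Reduce: collect the summaries from all analyzers.
    return {type(a).__name__: a.summary() for a in analyzers}


LOG = [
    {"type": "position", "level": "info"},
    {"type": "position", "level": "debug"},
    {"type": "question", "level": "info"},
]
RESULT = run(LOG, [DropDebug(), TagSource(), CountByType()])
```

Because the queue is processed in order, `CountByType` only sees entries that survived `DropDebug`; post-processing and rendering would then operate on `RESULT`.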
|
||||
|
||||
\begin{frame}{Graphs}
|
||||
\begin{columns}
|
||||
\column{0.45\linewidth}
|
||||
|
|
|
|||
|
|
@ -6,7 +6,7 @@ System administrators and developers face a daily surge of log files from applic
|
|||
For knowledge extraction, a wide range of tools is in constant development for such environments.
|
||||
Currently, an architectural approach with three main components is most frequently applied.
|
||||
These components are divided into aggregation \& creation, storage, and analysis \& frontend.
|
||||
A popular example is the ELK stack consisting of Elastic Search, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}.
|
||||
A popular example is the ELK stack consisting of Elastic Search, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}. \nomenclature{\m{E}lasticSearch, \m{L}ogstash, and \m{K}ibana}{ELK}
|
||||
In \autoref{tab:logs} some implementations of these components are listed according to the main focus.
|
||||
For this list, cloud-based services were not taken into account.
|
||||
A clear classification is not always possible, as some modules integrate virtually all features necessary, as is the case with the Graphite tool set.
|
||||
|
|
@ -14,11 +14,11 @@ A clear classification is not always possible, as some modules integrate virtual
|
|||
\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
|
||||
Collection & Database & Frontend\\
|
||||
\hline
|
||||
Logstash\footnote{\url{https://www.elastic.co/de/products/logstash}} & Elastic Search\footnote{\url{https://www.elastic.co/de/products/elasticsearch}} & Kibana\footnote{\url{https://www.elastic.co/de/products/kibana}}\\
|
||||
Collectd\footnote{\url{https://collectd.org/}} & Influx DB\footnote{\url{https://www.influxdata.com/}} & Grafana\footnote{\url{https://grafana.com}}\\
|
||||
Icinga\footnote{\url{https://www.icinga.com/products/icinga-2/}} & Whisper\footnote{\url{https://github.com/graphite-project/whisper}} & Graphite\footnote{\url{https://graphiteapp.org/}}\\
|
||||
StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url{https://prometheus.io/}} & \\
|
||||
%\footnote{\url{}} & \footnote{\url{}} & \footnote{\url{}}\\
|
||||
Logstash\furl{https://www.elastic.co/de/products/logstash} & Elastic Search\furl{https://www.elastic.co/de/products/elasticsearch} & Kibana\furl{https://www.elastic.co/de/products/kibana}\\
|
||||
Collectd\furl{https://collectd.org/} & Influx DB\furl{https://www.influxdata.com/} & Grafana\furl{https://grafana.com}\\
|
||||
Icinga\furl{https://www.icinga.com/products/icinga-2/} & Whisper\furl{https://github.com/graphite-project/whisper} & Graphite\furl{https://graphiteapp.org/}\\
|
||||
StatsD\furl{https://github.com/etsy/statsd} & Prometheus\furl{https://prometheus.io/} & \\
|
||||
%\furl{} & \furl{} & \furl{}\\
|
||||
|
||||
\caption{Log processing components}
|
||||
\label{tab:logs}
|
||||
|
|
@ -26,64 +26,59 @@ StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url
|
|||
|
||||
\subsubsection{Collection}
|
||||
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
|
||||
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\footnote{\url{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS}
|
||||
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\furl{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS} \nomenclature{\m{A}pplication \m{P}rogramming \m{I}nterface}{API}\nomenclature{\m{H}yper\m{t}ext \m{T}ransfer \m{P}rotocol}{HTTP}
|
||||
|
||||
Aside from aggregation, log creation is covered by tools ranging from host-based monitoring solutions like Icinga to application-centric approaches, e.g. StatsD embedded in the application source code.
|
||||
Aside from aggregation, log creation is covered by tools ranging from host-based monitoring solutions like Icinga to application-centric approaches, e.g. StatsD embedded in the application source code\furl{https://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/}.
|
||||
|
||||
\subsubsection{Databases}
|
||||
The key component for a log processing system is the storage.
|
||||
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events imposes many pitfalls.
|
||||
For instance, django-monit-collector\footnote{\url{https://github.com/nleng/django-monit-collector}} as an open alternative to the proprietary MMonit cloud service\footnote{\url{https://mmonit.com/monit/\#mmonit}} assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
|
||||
For instance, django-monit-collector\furl{https://github.com/nleng/django-monit-collector} as an open alternative to the proprietary MMonit cloud service\furl{https://mmonit.com/monit/\#mmonit} assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
|
||||
This strategy forces the RDBMS and the application to deal with growing amounts of data, as no temporal selection can be performed by the RDBMS itself.
|
||||
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization basically useless and significantly impeded access with statistical tools.
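The difference between the two storage strategies can be illustrated with a small, self-contained sketch. It assumes an in-memory SQLite database; the table names, the column names, and the host name `web01` are hypothetical and are not taken from django-monit-collector. Variant A stores a whole series as one JSON string, so the application has to load and filter everything itself. Variant B stores one row per measurement, so the database can perform the temporal selection.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

timestamps = list(range(0, 1000, 10))   # 100 samples, one every 10 seconds
values = [t / 10 for t in timestamps]

# Variant A: the complete series is one JSON string per host.
conn.execute("CREATE TABLE series_json (host TEXT PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO series_json VALUES (?, ?)",
    ("web01", json.dumps({"t": timestamps, "v": values})),
)


def select_json(start, end):
    # The database can only return the full payload; filtering by time
    # happens in the application after decoding the entire series.
    (payload,) = conn.execute(
        "SELECT payload FROM series_json WHERE host = ?", ("web01",)
    ).fetchone()
    data = json.loads(payload)
    return [(t, v) for t, v in zip(data["t"], data["v"]) if start <= t < end]


# Variant B: one row per measurement with an index on the timestamp.
conn.execute("CREATE TABLE samples (host TEXT, ts INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_samples_ts ON samples (host, ts)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("web01", t, v) for t, v in zip(timestamps, values)],
)


def select_rows(start, end):
    # The temporal selection is expressed in SQL and evaluated by the database.
    return conn.execute(
        "SELECT ts, value FROM samples "
        "WHERE host = ? AND ts >= ? AND ts < ? ORDER BY ts",
        ("web01", start, end),
    ).fetchall()


WINDOW_JSON = select_json(100, 150)
WINDOW_ROWS = select_rows(100, 150)
```

Both functions return the same window, but the work for variant A grows with the total length of the series, whereas variant B only touches the requested range.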
|
||||
|
||||
Time Series Databases (TSDB) specialize in chronological events.
|
||||
%TODO
|
||||
%TODO RRD
|
||||
With a focus on chronological events, Time Series Databases (TSDB) are commonly used in these scenarios. \nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
|
||||
|
||||
One typical use is monitoring, e.g. server health and usage statistics or weather stations, as the example in \autoref{img:rdd} shows.
|
||||
This example utilizes one of the early TSDB systems, RRDtool\furl{https://oss.oetiker.ch/rrdtool/index.en.html}.
|
||||
More recently, alternatives written in modern languages have become popular, like InfluxDB\furl{https://www.influxdata.com/} in Go\furl{https://golang.org/} or Whisper in Python (from the Graphite software package).
|
||||
\image{\textwidth}{mgroth}{Weather station plot with RRDtool \cite{RDD}}{img:rdd}
|
||||
\nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
|
||||
|
||||
\subsubsection{Frontend}
|
||||
|
||||
Frontends utilize the powerful query languages of the TSDB systems backing them.
|
||||
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
|
||||
Additional functionality can be added with plugins.
|
||||
%TODO
|
||||
Additional functionality can be added with plugins, e.g. for new data sources or dashboard panels with visualizations.
|
||||
The query languages of the data sources are abstracted by a common user interface.
|
||||
|
||||
%TODO: weather station screenshot
|
||||
|
||||
%%%
|
||||
\begin{itemize}
|
||||
\item ELK (Elastic search, Logstash, Kibana)\cite{andreassen2015monitoring} \cite{yang2016aggregated} \cite{steinegger2016analyse} \cite{sanjappa2017analysis}
|
||||
\item Collectd, Influx DB, Grafana \cite{komarek2017metric}
|
||||
\item …
|
||||
\end{itemize}
|
||||
\begin{itemize}
|
||||
\item[+] widely deployed
|
||||
\item[+] powerful query languages %TODO example
|
||||
\item mainly web/container/hardware monitoring
|
||||
\item[-] spatial analysis: heavily anonymized
|
||||
\item[-] fast-paced environment
|
||||
\end{itemize}
|
||||
|
||||
\subsection{Pedestrian traces}
|
||||
Analyzing pedestrian movement … based on GPS logs
|
||||
Analyzing pedestrian movement based on GPS logs is an established technique.
|
||||
In the following sections, \autoref{sssec:gps} provides an overview of GPS as data basis, \autoref{sssec:act} highlights some approaches to activity mining, and \autoref{sssec:vis} showcases popular visualizations of spatio-temporal data.
|
||||
\nomenclature{\m{G}lobal \m{P}ositioning \m{S}ystem}{GPS}
|
||||
|
||||
\subsubsection{Data basis: GPS}
|
||||
\subsubsection{Activity Mining}
|
||||
\subsubsection{Visualization}
|
||||
\begin{itemize}
|
||||
\item GPS overestimates systematically \cite{Ranacher_2015}
|
||||
\item GPS is a suitable instrument for spatio-temporal data\cite{van_der_Spek_2009}
|
||||
\item Activity mining \cite{Gong_2014}
|
||||
\begin{itemize}
|
||||
\item Speed-based Clustering \cite{ren2015mining}
|
||||
%\item \cite{Ferrante_2016} % closed access
|
||||
\item Machine Learning \cite{pattern_recog} %TODO
|
||||
\end{itemize}
|
||||
\item E.g.: Improve tourist management \cite{tourist_analysis2012}
|
||||
\end{itemize}
|
||||
\subsubsection{Data basis: GPS}\label{sssec:gps}
|
||||
Global navigation satellite systems (GNSS) like GPS, Galileo, GLONASS, or BeiDou are a source of positioning data for mobile users.
|
||||
\nomenclature{\m{G}lobal \m{N}avigation \m{S}atellite \m{S}ystems}{GNSS}
|
||||
\cite{van_der_Spek_2009} has shown that such signals provide a reliable service in many situations.
|
||||
Additionally, tracks of these signals are an invaluable source of information for researching movements and movement patterns \cite{Modsching:2008:1098-3058:31,nielsen2004gps,millonig2007monitoring}.
|
||||
Therefore, GNSS are suitable instruments for acquiring spatio-temporal data \cite{van_der_Spek_2009}.
|
||||
|
||||
However, \cite{Ranacher_2015} points out systematic overestimates by GPS due to interpolation errors.
|
||||
To eliminate such biases of one system, \cite{Li2015} describes the combination of multiple GNSS for improved accuracy and reduced convergence time.
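
One simplified way to see why noisy position fixes tend to inflate measured path lengths is the following simulation. It is a sketch under stated assumptions, not the model of the cited work: the track is a straight line, and each fix is perturbed by independent Gaussian noise; the step size, the noise level `sigma`, and the seed are arbitrary choices for illustration.

```python
import math
import random


def path_length(points):
    # Sum of straight-line distances between consecutive fixes.
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def simulate(n_points=200, step=1.0, sigma=2.0, seed=42):
    rng = random.Random(seed)
    # True movement: a straight line with one fix per `step` metres.
    true_track = [(i * step, 0.0) for i in range(n_points)]
    # Measured fixes: assumed independent Gaussian error in x and y.
    measured = [
        (x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
        for x, y in true_track
    ]
    return path_length(true_track), path_length(measured)


TRUE_LENGTH, MEASURED_LENGTH = simulate()
```

Since every error moves a fix away from the straight line, the zigzag through the measured fixes is longer than the true path, so `MEASURED_LENGTH` exceeds `TRUE_LENGTH` in this setup.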
|
||||
|
||||
\subsubsection{Activity Mining}\label{sssec:act}
|
||||
GPS (or GNSS) tracks generally only contain the raw spatio-temporal data (possibly accompanied by metadata like accuracy, visible satellites, etc.).
|
||||
Any additional information needs to be either logged separately or derived from the track data itself.
|
||||
This activity mining allows e.g. the determination of the modes of transport used while creating the track \cite{Gong_2014}.
|
||||
\cite{Gong_2015} shows the extraction of activity stop locations to identify locations where locomotion suspends for an activity in contrast to stops without activities.
|
||||
Information of this kind is relevant, e.g., for improving tourist management in popular destinations \cite{tourist_analysis2012,koshak2008analyzing,Modsching:2008:1098-3058:31}.
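
As a rough illustration of deriving information from the track data itself, the following sketch detects stop intervals with a simple speed threshold. It is deliberately simpler than the cited approaches and does not distinguish stops with activities from stops without; the function names, the thresholds `max_speed` and `min_duration`, and the sample coordinates are assumptions made for this example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres on a spherical Earth (radius 6371 km).
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_stops(track, max_speed=0.5, min_duration=60):
    """Return (start, end) times of intervals slower than max_speed (m/s).

    track is a list of (time_in_seconds, latitude, longitude) tuples
    sorted by time; intervals shorter than min_duration are discarded.
    """
    stops = []
    start = end = None
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        speed = haversine_m(la0, lo0, la1, lo1) / (t1 - t0)
        if speed <= max_speed:
            if start is None:
                start = t0
            end = t1
        else:
            if start is not None and end - start >= min_duration:
                stops.append((start, end))
            start = end = None
    # Close a stop that lasts until the end of the track.
    if start is not None and end - start >= min_duration:
        stops.append((start, end))
    return stops


# Made-up track: walking north, standing still for 90 s, walking on.
TRACK = [
    (0, 49.8900, 10.8800),
    (30, 49.8904, 10.8800),
    (60, 49.8908, 10.8800),
    (90, 49.8908, 10.8800),
    (120, 49.8908, 10.8800),
    (150, 49.8908, 10.8800),
    (180, 49.8912, 10.8800),
]
STOPS = find_stops(TRACK)
```

On real tracks the positions jitter even while standing still, so a practical threshold has to account for the measurement noise of the receiver.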
|
||||
|
||||
Besides points of interest (POIs), individual behaviour patterns can be mined from tracks, as described in \cite{ren2015mining}.
|
||||
Post-processing of these patterns with machine learning enables predictions of future trajectories \cite{10.1007/978-3-642-23199-5_37}.
|
||||
|
||||
|
||||
\subsubsection{Visualization}\label{sssec:vis}
|
||||
|
||||
\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap: fitness tracker \cite{strava}}{img:strava}
|
||||
|
||||
|
|
|
|||
|
|
@ -496,3 +496,42 @@ keywords = "Games, Agent based models, Simulations, Analytics"
|
|||
year={2017},
|
||||
organization={IEEE}
|
||||
}
|
||||
@misc{RDD,
|
||||
title={{RRDtool gallery example}},
|
||||
year={2011},
|
||||
month={7},
|
||||
url={https://oss.oetiker.ch/rrdtool/gallery/index.en.html}
|
||||
}
|
||||
@Article{Li2015,
|
||||
author="Li, Xingxing
|
||||
and Ge, Maorong
|
||||
and Dai, Xiaolei
|
||||
and Ren, Xiaodong
|
||||
and Fritsche, Mathias
|
||||
and Wickert, Jens
|
||||
and Schuh, Harald",
|
||||
title="Accuracy and reliability of multi-GNSS real-time precise positioning: GPS, GLONASS, BeiDou, and Galileo",
|
||||
journal="Journal of Geodesy",
|
||||
year="2015",
|
||||
month="Jun",
|
||||
day="01",
|
||||
volume="89",
|
||||
number="6",
|
||||
pages="607--635",
|
||||
issn="1432-1394",
|
||||
doi="10.1007/s00190-015-0802-8",
|
||||
url="https://doi.org/10.1007/s00190-015-0802-8"
|
||||
}
|
||||
@InProceedings{10.1007/978-3-642-23199-5_37,
|
||||
author="Chen, Chun-Sheng
|
||||
and Eick, Christoph F.
|
||||
and Rizk, Nouhad J.",
|
||||
editor="Perner, Petra",
|
||||
title="Mining Spatial Trajectories Using Non-parametric Density Functions",
|
||||
booktitle="Machine Learning and Data Mining in Pattern Recognition",
|
||||
year="2011",
|
||||
publisher="Springer Berlin Heidelberg",
|
||||
address="Berlin, Heidelberg",
|
||||
pages="496--510",
|
||||
isbn="978-3-642-23199-5"
|
||||
}
|
||||
|
|
|
|||
Binary file not shown.
|
After Width: | Height: | Size: 53 KiB |