\section{Requirements}
The solution has to cover the following requirements:
\begin{itemize}
\item Per-game importer (Web client, File loader, …)
\item Analyzer modules (number crunching)
\item Output \& Visualization (CSV, [Geo]JSON, KML, Graphs, …)
\item Interface (Configuration)
\begin{itemize}
\item Expert users/researchers
\item Staging/designing staff
\end{itemize}
\item Cross-game comparisons
\item Integration of external data (questionnaire results)
\end{itemize}
\section{Evaluating Monitoring Solutions}
\subsection{Evaluating Kibana}
To evaluate whether Kibana is a viable approach for the given requirements, I have created a test environment.
This setup is documented in \autoref{app:kibana}.
Two sample datasets were loaded into the Elasticsearch container through HTTP POST requests: \texttt{curl -H 'Content-Type: application/x-ndjson' -XPOST 'elastic:9200/\_bulk?pretty' --data-binary @gamelog.json}.
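For illustration, the following Python sketch builds an equivalent bulk request; the index name, type, and entry fields are purely hypothetical and do not reflect the actual game log schema.
\begin{verbatim}
import json
import requests   # third-party HTTP client, assumed to be available

# Illustrative entries only, not the actual BioDiv2go log format.
entries = [
    {"time": "2017-05-04T12:00:00", "event": "position",
     "lat": 48.52, "lon": 9.05},
    {"time": "2017-05-04T12:00:05", "event": "task_started", "task": 3},
]

# The bulk API expects newline-delimited JSON: one action line
# followed by one document line per entry, terminated by a newline.
action = json.dumps({"index": {"_index": "gamelog", "_type": "entry"}})
body = "\n".join(action + "\n" + json.dumps(e) for e in entries) + "\n"

response = requests.post(
    "http://elastic:9200/_bulk?pretty",
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
)
print(response.json())
\end{verbatim}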
Once Kibana is told which fields hold the spatial information, a first visualization is possible.
However, this view is optimized for the context of web log processing, so it has a rather low spatial resolution as shown in \autoref{img:kibana} and \autoref{img:kibana2}.
Additionally, the query language restricts the research questions the solution can answer: only questions expressible in the query language can be resolved.
Furthermore, users have to master the query language before any reasonable conclusions can be drawn.
By building a custom plugin, extension, or modified version, it is possible to circumvent this obstacle.
However, the fast-paced environment of the industry either requires a constant effort to keep pace, or quickly results in an outdated system. (E.g. the next major release, Kibana v6.0.0\footnote{\url{https://github.com/elastic/kibana/releases/tag/v6.0.0}}, was published about a year after Kibana v5.0.0\footnote{\url{https://github.com/elastic/kibana/releases/tag/v5.0.0}}; however, the previous major version seems to receive updates for about a year, too.)
\image{\textwidth}{../../PresTeX/images/kibana}{Game trace in Kibana}{img:kibana}
\image{\textwidth}{../../PresTeX/images/kibana2}{Game trace in Kibana}{img:kibana2}
\subsection{Evaluating Grafana}
Grafana is a solution to analyze, explore, and visualize various sources of time-series data.
There exist plugins for nearly any storage and collection backend for metrics\furl{https://grafana.com/plugins?type=datasource}.
The different backends are available through a unified user interface shown in \autoref{img:grafana}.
Spatial resolution suffers from the same limitations as in Kibana, and is even lower.
\image{\textwidth}{grafana-metrics}{Configuring a graph in Grafana}{img:grafana}
\subsection{Conclusion}
All in all, the evaluated monitoring solutions are not a perfect match for this special use case.
The privacy concerns vital in web monitoring prohibit detailed spatial analyses, the query languages can restrict some questions, and custom extensions require constant integration effort.
Regarding the specified use cases, especially the non-expert users benefit from a simple-to-use interface.
The default Kibana workbench does not qualify for this; a custom interface could improve the situation.
Grafana does have support for shared dashboards with a fixed set of data.
\section{Architectural Design}
\subsection{Overview}
While the development of a custom stack requires a lot of infrastructural work to get the project running, the findings above suggest that a custom solution is a feasible alternative:
\begin{itemize}
\item Developing bottom-up takes less time than diving into complex turn-key monitoring solutions.
\item With rather limited amounts of data\footnote{From a sample of 436 game logs from BioDiv2go, an average log file is 800 kB in size, with a median of 702 kB}, scalable solutions are no hard requirement.
\item No core dependencies on fast-paced projects.
\item Interfaces tailored to the requirements: a simple web interface for non-expert users, CLI and API with unrestricted possibilities for researchers.
\item A focus on key points allows simple, easily extendable interfaces and implementations.
\item By reducing the complexity to a manageable level, the processes and results can be verified for accuracy and reliability.
\end{itemize}
With the requirements from \autoref{sec:require} and the learnings from log processing evaluations in mind, a first architectural approach is visualized in \autoref{img:solution}.
It outlines three main components of the project: two user-facing services (Web and CLI/API), and an analysis framework.
The interfaces (Web and CLI/API) for both target groups (see \autoref{sec:require}) are completely dependent on the analysis framework at the core.
\image{\textwidth}{solution.pdf}{Architecture approach}{img:solution}
The following sections describe each of those components.
\subsection{Analysis Framework}
The analysis framework takes game logs, processes their entries, collects results, and renders them to an output.
With a Map-Reduce pattern as the basic structure for the data flow, an analysis run is defined by an ordered collection of analyzer operations together with matching postprocessing and render operations.
\autoref{img:flow} shows the data flows through the framework.
Every processed log file has its own analysis chain.
The log entries are fed sequentially into the analysis chain.
\image{\textwidth}{map-reduce.pdf}{Data flows}{img:flow}
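The overall flow can be pictured with the following minimal Python sketch; all names are illustrative and merely outline the Map-Reduce structure described above, not the actual implementation.
\begin{verbatim}
def analyze(log_files, analyzer_factories, postprocessor, renderer):
    """Map: each log runs through its own chain; Reduce: collect results."""
    result_set = []
    for entries in log_files:
        # every log file gets fresh analyzer instances (its own chain)
        chain = [factory() for factory in analyzer_factories]
        for entry in entries:
            for analyzer in chain:
                entry = analyzer.process(entry)
                if entry is None:   # the entry was consumed, stop the chain
                    break
        # collect each analyzer's result, keyed by the producing analyzer
        result_set.append({type(a).__name__: a.result() for a in chain})
    # postprocessing sees all logs at once; rendering produces the artifact
    return renderer(postprocessor(result_set))
\end{verbatim}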
\subsubsection{Analyzer}
An Analyzer takes one log entry at a time and processes it.
With dynamic selectors stored in settings, Analyzers can be used on multiple game types.
For specific needs, Analyzers can be tailored to a specific game, too.
While processing, the Analyzer can choose to read, manipulate, or consume the log entry.
\paragraph{Reading a log entry}
Every Analyzer can read all of the log entry's contents.
This is obviously the core of the whole framework, as it is the only way to gain knowledge from the log.
Information can be stored in the Analyzer's instance until the log file has been processed completely.
\paragraph{Manipulating a log entry}
Every Analyzer can manipulate a log entry.
This can be adding new information, modifying existing information, or deleting information.
\paragraph{Consuming a log entry}
Every Analyzer can consume a log entry.
A consumed log entry is not passed down the analysis chain anymore.
This can be useful to filter verbose logs before computationally expensive operations.
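Two hypothetical Analyzers illustrate these three behaviours; the entry fields and the simple Euclidean distance are assumptions made for the sketch only.
\begin{verbatim}
import math

class NoiseFilter:
    """Consumes verbose debug entries before expensive analyzers see them."""
    def process(self, entry):
        if entry.get("type") == "debug":
            return None                 # consume: drop the entry entirely
        return entry

    def result(self):
        return None

class DistanceAnalyzer:
    """Reads position entries and accumulates the travelled distance."""
    def __init__(self):
        self.last = None
        self.total = 0.0

    def process(self, entry):
        pos = entry.get("position")     # assumed to be an (x, y) tuple
        if pos is not None:
            if self.last is not None:
                self.total += math.dist(self.last, pos)
            self.last = pos
            entry["distance_so_far"] = self.total   # manipulate: augment
        return entry                    # read: pass the entry down the chain

    def result(self):
        return {"total_distance": self.total}
\end{verbatim}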
\subsubsection{Result}
When all entries of a game log have been processed, the results of each analyzer are collected.
Each result is linked to the analyzer which has produced this artifact to avoid ambiguous data sets.
\subsubsection{Postprocessing \& Render}
When all game logs are processed, the whole result set is passed into the Postprocessing step.
This is the first step to compare multiple game logs, i.e. the results of the analyzed game logs, directly.
Postprocessing is a hard requirement for rendering the results, as at least a transformation into the desired output format is absolutely necessary.
Rendering is not restricted to visualizations; artifacts of all kinds can be produced.
A whole range from static plots and CSV exports to structured JSON data for interactive map visualizations is possible.
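As a sketch, a minimal render operation could turn the collected results into a CSV artifact; the analyzer and field names reuse the illustrative examples above.
\begin{verbatim}
import csv
import io

def to_csv(result_set):
    """Render the collected per-log results as a CSV artifact."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["log", "total_distance"])
    for index, results in enumerate(result_set):
        analyzer_result = results.get("DistanceAnalyzer") or {}
        writer.writerow([index, analyzer_result.get("total_distance")])
    return out.getvalue()
\end{verbatim}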
\subsubsection{Log parser}
Key to the framework described above is a component to import game log data, parse it, and transform it into a common format for processing.
This needs to be adapted for each supported game.
It has to know where game logs are stored and how they can be accessed.
Configurable items like URLs allow using e.g. different game servers.
The important step is the parsing of game logs from the formats used by the games (e.g. JSON, XML, plain text, database, …).
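A hypothetical importer for a game that stores its logs as JSON files could look as follows; the directory layout and field names are assumptions, not the actual format of any supported game.
\begin{verbatim}
import json
from pathlib import Path

def load_game_logs(directory):
    """Importer: read JSON log files and map them to the common format."""
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            raw = json.load(handle)
        # translate game-specific field names into the common schema
        yield [
            {"time": e["timestamp"], "type": e["event"],
             "position": e.get("pos")}
            for e in raw["entries"]
        ]
\end{verbatim}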
\subsection{Web Interface}
The web interface is rather straightforward:
Expert users prepare a set of analysis methods and bundle them with suitable rendering targets into an analysis suite.
Non-expert users select game logs for processing, choose a prepared analysis suite, and receive a rendered result once the analysis process has finished.
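Conceptually, such an analysis suite is nothing more than a bundle of the building blocks sketched earlier; how suites are actually persisted by the web interface is an implementation detail and not shown here.
\begin{verbatim}
# A prepared analysis suite: analyzers bundled with a render target.
# The names reuse the illustrative sketches above.
DISTANCE_SUITE = {
    "analyzers": [NoiseFilter, DistanceAnalyzer],
    "postprocessor": lambda result_set: result_set,  # identity step
    "renderer": to_csv,
}
\end{verbatim}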
\subsection{CLI/API Interface}
Providing direct access to the analysis and render classes, the CLI/API interface offers the most powerful way to explore the log data.
By implementing custom algorithms, expert users can cope with difficult input formats and special requirements.
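Embedding the framework could then be as simple as the following usage sketch, again reusing the illustrative names from above.
\begin{verbatim}
# Embedding the framework directly in a script.
logs = load_game_logs("logs/biodiv2go")   # hypothetical log directory
report = analyze(
    logs,
    analyzer_factories=[NoiseFilter, DistanceAnalyzer],
    postprocessor=lambda result_set: result_set,
    renderer=to_csv,
)
print(report)
\end{verbatim}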
\subsection{Architecture}
The API is a standalone microservice.
It is independent of both game servers and user interfaces.
Separation from game servers narrows the scope, and allows the usage with any kind of game.
Games without a central server can provide a mocked server to supply logged data, while games with a server can e.g. expose API endpoints with authentication and user management.
By acting like any normal client, the framework can avoid obstacles like CORS/XSS prevention.
The independence from user interfaces, mainly the web interface, allows scalability through load balancing with multiple API workers.
Expert users with special requirements can embed the framework in projects without pulling in large amounts of dependencies.
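Acting like a normal client can be sketched as follows; the endpoint layout and the bearer-token authentication are assumptions for illustration only.
\begin{verbatim}
import requests   # third-party HTTP client, assumed to be available

def fetch_remote_logs(base_url, session_ids, token=None):
    """Fetch game logs from a server just like a regular client would."""
    headers = {"Authorization": "Bearer " + token} if token else {}
    for session_id in session_ids:
        response = requests.get(
            "{}/logs/{}".format(base_url, session_id),
            headers=headers, timeout=30)
        response.raise_for_status()
        yield response.json()
\end{verbatim}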