Wednesday, February 3, 2016

Chapter 2. Mass Communication Effects: How Society and Media Interact (Study Guide)


Chapter Summary

With the rise of mass society and the rapid growth of the mass media starting in the nineteenth century, the public, media critics, and scholars have raised questions about the effects various media might have on society and individuals. These effects were viewed initially as being strong, direct, and relatively uniform on the population as a whole. After World War I, critics were concerned that media-oriented political campaigns could have powerful, direct effects on voters. This view, though still widespread, was largely discredited by voter studies conducted in the 1940s and 1950s. These studies found that the voters with the strongest political opinions were those most likely to pay attention to the campaign and hence were least likely to be affected by the campaign. More recently, research has expanded to move beyond looking just at the effects that media and media content have on individuals and society to examinations of how living in a world with all-pervasive media changes the nature of our interactions and culture.

Understanding the effects of media on individuals and society requires that we examine the messages being sent, the medium transmitting them, the owners of the media, and the audience members themselves. The effects can be cognitive, attitudinal, behavioral, and psychological.

Media effects can also be examined in terms of a number of theoretical approaches, including functional analysis, agenda setting, uses and gratifications, social learning, symbolic interactionism, spiral of silence, media logic, and cultivation analysis.

Our understanding of the relationship among politicians, the press, and the public has evolved over the past half-century. Recent studies have supported interactional approaches to understanding campaign effects, including the resonance and competitive models.

Many people claim that the media are biased toward one political view or another. Conservative critics argue that there is a liberal bias arising from the tendency of reporters to be more liberal than the public at large. The liberals’ counterargument is that the press has a conservative bias because most media outlets are owned by giant corporations that hold pro-business views. Finally, some critics argue that the media hold a combination of values that straddle the boundary between slightly left and right of center. The press in the United States began as partisan during the colonial period, but adopted a detached, factual, objective style in the 1830s to appeal to a broader audience.

Learning Objectives

Explain how new media tools such as YouTube have changed the political process in the United States.
Discuss the history and development of the theories of media effects.
Name and define the four types of effects the mass media can have.
Define and explain the usefulness of the following mass communication theories: functional analysis, agenda setting, uses and gratifications, social learning, symbolic interactionism, spiral of silence, media logic, and cultivation analysis.
Name and explain two ways in which political campaigns affect voters.
Explain the debate and evidence about bias in the news media.
Name and define each of Herbert Gans’ eight basic journalistic values.

Saturday, January 30, 2016

Scientific visualization


A scientific visualization of a simulation of a Rayleigh–Taylor instability caused by two mixing fluids.[1]
Surface rendering of Arabidopsis thaliana pollen grains with confocal microscope.

Scientific visualization (also spelled scientific visualisation) is an interdisciplinary branch of science. According to Friendly (2008), it is "primarily concerned with the visualization of three-dimensional phenomena (architectural, meteorological, medical, biological, etc.), where the emphasis is on realistic renderings of volumes, surfaces, illumination sources, and so forth, perhaps with a dynamic (time) component".[2] It is also considered a branch of computer science that is a subset of computer graphics. The purpose of scientific visualization is to illustrate scientific data graphically so that scientists can understand, explore, and glean insight from it.


History
Charles Minard's flow map of Napoleon’s March.

One of the earliest examples of three-dimensional scientific visualisation was Maxwell's thermodynamic surface, sculpted in clay in 1874 by James Clerk Maxwell.[3] This prefigured modern scientific visualization techniques that use computer graphics.[4]

Notable early two-dimensional examples include the flow map of Napoleon’s March on Moscow produced by Charles Joseph Minard in 1869;[2] the “coxcombs” used by Florence Nightingale in 1857 as part of a campaign to improve sanitary conditions in the British army;[2] and the dot map used by John Snow in 1855 to visualise the Broad Street cholera outbreak.[2]
Methods for visualizing two-dimensional data sets

Scientific visualization using computer graphics gained in popularity as graphics matured. Primary applications were scalar fields and vector fields from computer simulations and also measured data. The primary methods for visualizing two-dimensional (2D) scalar fields are color mapping and drawing contour lines. 2D vector fields are visualized using glyphs and streamlines or line integral convolution methods. 2D tensor fields are often resolved to a vector field by using one of the two eigenvectors to represent the tensor at each point in the field and then visualized using vector field visualization methods.
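As a concrete illustration of these 2D techniques, the following minimal sketch (not part of the original article; it assumes NumPy and Matplotlib and uses a synthetic dataset) applies color mapping plus contour lines to a scalar field and streamlines to a vector field:

    # Color mapping + contours for a 2D scalar field, streamlines for a 2D vector field.
    import numpy as np
    import matplotlib.pyplot as plt

    # Synthetic scalar field F(x, y) and a simple rotational vector field (U, V).
    x = np.linspace(-3, 3, 200)
    y = np.linspace(-3, 3, 200)
    X, Y = np.meshgrid(x, y)
    F = np.exp(-(X**2 + Y**2)) * np.cos(2 * X)
    U, V = -Y, X

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Color mapping plus contour lines of the scalar field.
    im = ax1.contourf(X, Y, F, levels=20, cmap="viridis")
    ax1.contour(X, Y, F, levels=10, colors="black", linewidths=0.5)
    fig.colorbar(im, ax=ax1)
    ax1.set_title("Scalar field: color map + contours")

    # Streamlines of the vector field, colored by speed.
    ax2.streamplot(X, Y, U, V, density=1.2, color=np.hypot(U, V), cmap="plasma")
    ax2.set_title("Vector field: streamlines")

    plt.tight_layout()
    plt.show()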
Methods for visualizing three-dimensional data sets

For 3D scalar fields the primary methods are volume rendering and isosurfaces. Methods for visualizing vector fields include glyphs (graphical icons) such as arrows, streamlines and streaklines, particle tracing, line integral convolution (LIC) and topological methods. Later, visualization techniques such as hyperstreamlines[5] were developed to visualize 2D and 3D tensor fields.
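For the 3D case, an isosurface workflow can be sketched as follows. This is an illustrative example rather than anything from the article; it assumes scikit-image's marching cubes implementation and Matplotlib for display, with a synthetic scalar field standing in for real data:

    # Extract and render an isosurface from a 3D scalar field.
    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d.art3d import Poly3DCollection
    from skimage import measure

    # Synthetic 3D scalar field on a 64^3 grid; its zero level set is the unit sphere.
    g = np.linspace(-1.5, 1.5, 64)
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    field = X**2 + Y**2 + Z**2 - 1.0

    # Marching cubes returns vertices and triangle indices for the chosen isovalue.
    verts, faces, normals, values = measure.marching_cubes(field, level=0.0)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.add_collection3d(Poly3DCollection(verts[faces], alpha=0.6))
    ax.set_xlim(0, field.shape[0])
    ax.set_ylim(0, field.shape[1])
    ax.set_zlim(0, field.shape[2])
    plt.show()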
Scientific visualization topics
Maximum intensity projection (MIP) of a whole body PET scan.
Solar system image of the main asteroid belt and the Trojan asteroids.
Scientific visualization of Fluid Flow: Surface waves in water
Chemical imaging of a simultaneous release of SF6 and NH3.
Topographic scan of a glass surface by an Atomic force microscope.
Computer animation

Computer animation is the art, technique, and science of creating moving images via the use of computers. It is increasingly created by means of 3D computer graphics, though 2D computer graphics are still widely used for stylistic, low-bandwidth, and faster real-time rendering needs. Sometimes the target of the animation is the computer itself, but sometimes the target is another medium, such as film. It is also referred to as CGI (computer-generated imagery or computer-generated imaging), especially when used in films.
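A minimal sketch of the idea, assuming Matplotlib's FuncAnimation (a library choice not mentioned in the article): frames are produced by repeatedly updating a plot, and the frame sequence can be shown live or written out to a video file:

    # A tiny computer animation: a sine wave whose phase advances each frame.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    x = np.linspace(0, 2 * np.pi, 400)
    fig, ax = plt.subplots()
    (line,) = ax.plot(x, np.sin(x))
    ax.set_ylim(-1.1, 1.1)

    def update(frame):
        # Each frame shifts the wave's phase, producing apparent motion.
        line.set_ydata(np.sin(x + 0.1 * frame))
        return (line,)

    anim = FuncAnimation(fig, update, frames=120, interval=30, blit=True)
    plt.show()  # or save the frames to a video file to target another medium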
Computer simulation

Computer simulation is a computer program, run on a single computer or a network of computers, that attempts to simulate an abstract model of a particular system. Computer simulations have become a useful part of mathematical modelling of many natural systems in physics and computational physics, chemistry, and biology; human systems in economics, psychology, and social science; and the process of engineering new technology, to gain insight into the operation of those systems or to observe their behavior.[6] The simultaneous visualization and simulation of a system is called visulation.

Computer simulations vary from computer programs that run a few minutes, to network-based groups of computers running for hours, to ongoing simulations that run for months. The scale of events being simulated by computer simulations has far exceeded anything possible (or perhaps even imaginable) using the traditional paper-and-pencil mathematical modeling: over 10 years ago, a desert-battle simulation, of one force invading another, involved the modeling of 66,239 tanks, trucks and other vehicles on simulated terrain around Kuwait, using multiple supercomputers in the DoD High Performance Computer Modernization Program.[7]
Information visualization

Information visualization is the study of "the visual representation of large-scale collections of non-numerical information, such as files and lines of code in software systems, library and bibliographic databases, networks of relations on the internet, and so forth".[2]

Information visualization focuses on the creation of approaches for conveying abstract information in intuitive ways. Visual representations and interaction techniques take advantage of the human eye's broad-bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once.[8] The key difference between scientific visualization and information visualization is that information visualization is often applied to data that is not generated by scientific inquiry. Some examples are graphical representations of data for business, government, news, and social media.
Interface technology and perception

Interface technology and perception shows how new interfaces and a better understanding of underlying perceptual issues create new opportunities for the scientific visualization community.[9]
Surface rendering

Rendering is the process of generating an image from a model by means of computer programs. The model is a description of three-dimensional objects in a strictly defined language or data structure. It contains geometry, viewpoint, texture, lighting, and shading information. The image is a digital image or raster graphics image. The term may be by analogy with an "artist's rendering" of a scene. 'Rendering' is also used to describe the process of calculating effects in a video editing file to produce the final video output. Important rendering techniques are:

Scanline rendering and rasterisation
A high-level representation of an image necessarily contains elements in a different domain from pixels. These elements are referred to as primitives. In a schematic drawing, for instance, line segments and curves might be primitives. In a graphical user interface, windows and buttons might be the primitives. In 3D rendering, triangles and polygons in space might be primitives.

Ray casting
Ray casting is primarily used for realtime simulations, such as those used in 3D computer games and cartoon animations, where detail is not important, or where it is more efficient to manually fake the details in order to obtain better performance in the computational stage. This is usually the case when a large number of frames need to be animated. The resulting surfaces have a characteristic 'flat' appearance when no additional tricks are used, as if objects in the scene were all painted with matte finish.
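The following toy ray caster is an illustrative sketch only (plain NumPy, one analytic sphere, primary rays only, no secondary rays); it shows why ray-cast images have that characteristic flat, matte look when nothing beyond a single shading term is computed:

    # Toy ray casting: one primary ray per pixel, one sphere, flat Lambert shading.
    import numpy as np
    import matplotlib.pyplot as plt

    W, H = 200, 150
    center, radius = np.array([0.0, 0.0, 3.0]), 1.0
    light = np.array([-1.0, -1.0, -1.0])
    light = light / np.linalg.norm(light)          # direction toward the light

    image = np.zeros((H, W))
    for j in range(H):
        for i in range(W):
            # Ray from the origin through a pixel on a simple image plane at z = 1.
            d = np.array([(i - W / 2) / W, (j - H / 2) / H, 1.0])
            d = d / np.linalg.norm(d)
            # Ray-sphere intersection: solve |t*d - center|^2 = radius^2 for t.
            b = d @ center
            disc = b * b - (center @ center - radius * radius)
            if disc >= 0.0:
                t = b - np.sqrt(disc)              # nearest hit
                if t > 0.0:
                    normal = (t * d - center) / radius
                    image[j, i] = max(normal @ light, 0.0)   # single Lambert term

    plt.imshow(image, cmap="gray", origin="lower")
    plt.show()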

Radiosity
Radiosity is a global illumination method that attempts to simulate the way in which directly illuminated surfaces act as indirect light sources that illuminate other surfaces. This produces more realistic shading and seems to better capture the 'ambience' of an indoor scene. A classic example is the way that shadows 'hug' the corners of rooms.

Ray tracing
Ray tracing is an extension of the same technique developed in scanline rendering and ray casting. Like those, it handles complicated objects well, and the objects may be described mathematically. Unlike scanline rendering and ray casting, ray tracing is often a Monte Carlo technique, that is, one based on averaging a number of randomly generated samples from a model.

Volume rendering

Volume rendering is a technique used to display a 2D projection of a 3D discretely sampled data set. A typical 3D data set is a group of 2D slice images acquired by a CT or MRI scanner. Usually these are acquired in a regular pattern (e.g., one slice every millimeter) and usually have a regular number of image pixels in a regular pattern. This is an example of a regular volumetric grid, with each volume element, or voxel, represented by a single value that is obtained by sampling the immediate area surrounding the voxel.
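One of the simplest projections of such a voxel grid is the maximum intensity projection (MIP) mentioned in a caption above. The sketch below is an assumed example with a synthetic volume, using NumPy and Matplotlib; it takes the maximum value along each axis-aligned ray:

    # Maximum intensity projection of a synthetic voxel grid.
    import numpy as np
    import matplotlib.pyplot as plt

    # Synthetic "scan": a 128^3 grid with two bright blobs in a noisy background.
    g = np.linspace(-1, 1, 128)
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    volume = np.exp(-40 * ((X - 0.3)**2 + Y**2 + Z**2))
    volume += np.exp(-40 * ((X + 0.3)**2 + (Y - 0.2)**2 + (Z + 0.4)**2))
    volume += 0.05 * np.random.default_rng(0).random(volume.shape)

    # Project along the z axis: one maximum value per (x, y) ray.
    mip = volume.max(axis=2)

    plt.imshow(mip.T, cmap="gray", origin="lower")
    plt.title("Maximum intensity projection")
    plt.show()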
Volume visualization

According to Rosenblum (1994), "volume visualization examines a set of techniques that allows viewing an object without mathematically representing the outer surface. Initially used in medical imaging, volume visualization has become an essential technique for many sciences, portraying phenomena such as clouds, water flows, and molecular and biological structure. Many volume visualization algorithms are computationally expensive and demand large data storage. Advances in hardware and software are generalizing volume visualization as well as real time performances".[9]
Scientific visualization applications

This section gives a series of examples of how scientific visualization can be applied today.[10]
In the natural sciences

Star formation[11]

Gravitational waves[12]

Massive Star Supernovae Explosions

Molecular rendering

Star formation: The featured plot is a Volume plot of the logarithm of gas/dust density in an Enzo star and galaxy simulation. Regions of high density are white while less dense regions are more blue and also more transparent.

Gravitational waves: Researchers used the Globus Toolkit to harness the power of multiple supercomputers to simulate the gravitational effects of black-hole collisions.

Massive Star Supernovae Explosions: The image shows three-dimensional radiation hydrodynamics calculations of massive star supernova explosions. The DJEHUTY stellar evolution code was used to calculate the explosion of an SN 1987A model in three dimensions.

Molecular rendering: VisIt's general plotting capabilities were used to create the molecular rendering shown in the featured visualization. The original data was taken from the Protein Data Bank and turned into a VTK file before rendering.
In geography and ecology

Terrain rendering

Climate visualization[13]

Atmospheric Anomaly in Times Square

Terrain visualization: VisIt can read several file formats common in the field of Geographic Information Systems (GIS), allowing one to plot raster data such as terrain data in visualizations. The featured image shows a plot of a DEM dataset containing mountainous areas near Dunsmuir, CA. Elevation lines are added to the plot to help delineate changes in elevation.

Tornado Simulation: This image was created from data generated by a tornado simulation calculated on NCSA's IBM p690 computing cluster. High-definition television animations of the storm produced at NCSA were included in an episode of the PBS television series NOVA called "Hunt for the Supertwister." The tornado is shown by spheres that are colored according to pressure; orange and blue tubes represent the rising and falling airflow around the tornado.

Climate visualization: This visualization depicts the carbon dioxide from various sources that are advected individually as tracers in the atmosphere model. Carbon dioxide from the ocean is shown as plumes during February 1900.

Atmospheric Anomaly in Times Square: The image visualizes results from the SAMRAI simulation framework of an atmospheric anomaly in and around Times Square.
View of a 4D cube projected into 3D: orthogonal projection (left) and perspective projection (right).
In mathematics

Scientific visualization of mathematical structures has been undertaken for purposes of building intuition and for aiding the forming of mental models.[14]

Higher-dimensional objects can be visualized in form of projections (views) in lower dimensions. In particular, 4-dimensional objects are visualized by means of projection in three dimensions. The lower-dimensional projections of higher-dimensional objects can be used for purposes of virtual object manipulation, allowing 3D objects to be manipulated by operations performed in 2D,[15] and 4D objects by interactions performed in 3D.[16]
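As an illustrative sketch of this projection chain (not from the article; it assumes NumPy and Matplotlib and a hypothetical camera distance along the fourth axis), the 16 vertices of a tesseract can be perspective-projected from 4D to 3D and then drawn, with Matplotlib handling the final 3D-to-2D step:

    # Perspective projection of a 4D hypercube (tesseract) into 3D.
    import itertools
    import numpy as np
    import matplotlib.pyplot as plt

    # Vertices of the tesseract, centered at the origin.
    verts4 = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)

    # Perspective projection 4D -> 3D: divide x, y, z by the distance along w.
    d = 3.0  # assumed distance of the 4D "camera" from the origin along w
    verts3 = verts4[:, :3] / (d - verts4[:, 3:4])

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(*verts3.T)

    # Draw an edge wherever two vertices differ in exactly one 4D coordinate.
    for i, j in itertools.combinations(range(len(verts4)), 2):
        if np.sum(verts4[i] != verts4[j]) == 1:
            ax.plot(*zip(verts3[i], verts3[j]), color="steelblue")
    plt.show()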
In the formal sciences

Curve plots

Image annotations

Scatter plot

Computer mapping of topographical surfaces: Through computer mapping of topographical surfaces, mathematicians can test theories of how materials will change when stressed. The imaging is part of NSF-funded work at the Electronic Visualization Laboratory at the University of Illinois at Chicago.

Curve plots: VisIt can plot curves from data read from files and it can be used to extract and plot curve data from higher-dimensional datasets using lineout operators or queries. The curves in the featured image correspond to elevation data along lines drawn on DEM data and were created with the feature lineout capability. Lineout allows you to interactively draw a line, which specifies a path for data extraction. The resulting data was then plotted as curves.
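Outside VisIt, the same lineout idea can be sketched with SciPy's map_coordinates. This is an assumed, self-contained example with a synthetic elevation field, not VisIt's own implementation:

    # "Lineout": sample a 2D scalar dataset along a line and plot the result as a curve.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.ndimage import map_coordinates

    # Synthetic "elevation" field on a 200x200 grid.
    g = np.linspace(-3, 3, 200)
    X, Y = np.meshgrid(g, g, indexing="ij")
    elevation = np.exp(-((X - 1)**2 + Y**2)) + 0.5 * np.exp(-((X + 1)**2 + (Y + 1)**2))

    # Endpoints of the lineout in grid (row, column) coordinates.
    p0, p1 = np.array([20.0, 30.0]), np.array([180.0, 170.0])
    n = 300
    rows = np.linspace(p0[0], p1[0], n)
    cols = np.linspace(p0[1], p1[1], n)

    # Bilinear interpolation of the field along the line.
    profile = map_coordinates(elevation, np.vstack([rows, cols]), order=1)

    plt.plot(np.linspace(0, 1, n), profile)
    plt.xlabel("position along line")
    plt.ylabel("elevation")
    plt.show()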

Image annotations: The featured plot shows Leaf Area Index (LAI), a measure of global vegetative matter, from a NetCDF dataset. The primary plot is the large plot at the bottom, which shows the LAI for the whole world. The plots on top are actually annotations that contain images generated earlier. Image annotations can be used to include material that enhances a visualization such as auxiliary plots, images of experimental data, project logos, etc.

Scatter plot: VisIt's Scatter plot allows the user to visualize multivariate data of up to four dimensions. The Scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space, and they are displayed using glyphs and colored using another scalar variable.
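The same four-dimensional encoding can be sketched outside VisIt with an ordinary Matplotlib scatter plot, mapping two variables to the axes, one to color, and one to marker size (an assumed example with synthetic data):

    # Four scalar variables per point: x axis, y axis, marker color, marker size.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    a, b = rng.normal(size=500), rng.normal(size=500)
    c = a * b + rng.normal(scale=0.3, size=500)      # third variable -> color
    d = np.abs(a) + np.abs(b)                        # fourth variable -> size

    sc = plt.scatter(a, b, c=c, s=20 * d + 5, cmap="coolwarm", alpha=0.7)
    plt.colorbar(sc, label="third variable")
    plt.xlabel("first variable")
    plt.ylabel("second variable")
    plt.show()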
In the applied sciences

Porsche 911 model

YF-17 aircraft Plot

City rendering

Porsche 911 model (NASTRAN model): The featured plot contains a Mesh plot of a Porsche 911 model imported from a NASTRAN bulk data file. VisIt can read a limited subset of NASTRAN bulk data files, generally enough to import model geometry for visualization.

YF-17 aircraft Plot: The featured image displays plots of a CGNS dataset representing a YF-17 jet aircraft. The dataset consists of an unstructured grid with solution. The image was created by using a pseudocolor plot of the dataset's Mach variable, a Mesh plot of the grid, and Vector plot of a slice through the Velocity field.

City rendering: An ESRI shapefile containing a polygonal description of the building footprints was read in and then the polygons were resampled onto a rectilinear grid, which was extruded into the featured cityscape.

Inbound traffic measured: This image is a visualization study of inbound traffic measured in billions of bytes on the NSFNET T1 backbone for the month of September 1991. The traffic volume range is depicted from purple (zero bytes) to white (100 billion bytes). It represents data collected by Merit Network, Inc.[17]
Scientific visualization organizations

Important laboratories in the field are:

Electronic Visualization Laboratory
NASA Goddard Scientific Visualization Studio.[18]

Conferences in this field, ranked by significance in scientific visualization research, are:

IEEE Visualization
EuroVis
SIGGRAPH
Eurographics
Graphicon

See also

General

ACM Transactions on Graphics
Data Presentation Architecture
Data visualization
Mathematical visualization
Molecular graphics
Skin friction line
Tensor glyph
Visulation
Visual analytics

People

Tristan Needham

Software

Avizo
Baudline
Bitplane
Datacopia
Dataplot
DataMelt
DeDaLo
MeVisLab
NCAR Command Language
Orange
ParaView
Sirius visualization software
Tecplot
tomviz
VAPOR
Vis5D
VisAD
VisIt
VTK

References

[1] "Visualizations that have been created with VisIt". wci.llnl.gov. Updated November 8, 2007.
[2] Michael Friendly (2008). "Milestones in the history of thematic cartography, statistical graphics, and data visualization".
[3] James Clerk Maxwell and P. M. Harman (2002). The Scientific Letters and Papers of James Clerk Maxwell, Volume 3; 1874–1879. Cambridge University Press. ISBN 0-521-25627-5. p. 148.
[4] Thomas G. West (February 1999). "James Clerk Maxwell, Working in Wet Clay". SIGGRAPH Computer Graphics Newsletter 33 (1): 15–17. doi:10.1145/563666.563671.
[5] Delmarcelle, T.; Hesselink, L. (1993). "Visualizing second-order tensor fields with hyperstreamlines". IEEE Computer Graphics and Applications 13 (4).
[6] Steven Strogatz (2007). "The End of Insight". In: What Is Your Dangerous Idea? John Brockman (ed.). HarperCollins.
[7] "Researchers stage largest military simulation ever" (news). Jet Propulsion Laboratory, Caltech, December 1997.
[8] James J. Thomas and Kristin A. Cook (eds.) (2005). Illuminating the Path: The R&D Agenda for Visual Analytics. National Visualization and Analytics Center. p. 30.
[9] Lawrence J. Rosenblum (ed.) (1994). Scientific Visualization: Advances and Challenges. Academic Press.
[10] All examples here, both images and text, unless another source is given, are from the Lawrence Livermore National Laboratory (LLNL) website. Retrieved 10–11 July 2008.
[11] The data used to make this image were provided by Tom Abel, Ph.D., and Matthew Turk of the Kavli Institute for Particle Astrophysics and Cosmology.
[12] "Black-Hole Collisions". The Globus software creators Ian Foster, Carl Kesselman and Steve Tuecke. Published Summer 2002.
[13] Image courtesy of Forrest Hoffman and Jamison Daniel of Oak Ridge National Laboratory.
[14] Andrew J. Hanson, Tamara Munzner, George Francis. "Interactive methods for visualizable geometry". Computer, vol. 27, no. 7, pp. 73–83 (abstract).
[15] A. J. Hanson. "Constrained 3D navigation with 2D controller". Visualization '97 Proceedings, 24 October 1997, pp. 175–182 (abstract).
[16] Hui Zhang, Andrew J. Hanson. "Shadow-Driven 4D Haptic Visualization". IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1688–1695 (abstract).
[17] Image by Donna Cox and Robert Patterson. National Science Foundation Press Release 08-112.
[18] NASA Goddard Scientific Visualization Studio.

Further reading

Bruce H. McCormick, Thomas A. DeFanti and Maxine D. Brown (eds.) (1987). Visualization in Scientific Computing. ACM Press.
Gregory M. Nielson, Hans Hagen and Heinrich Müller (1997). Scientific Visualization: Overviews, Methodologies, and Techniques. IEEE Computer Society.
Clifford A. Pickover (ed.) (1994). Frontiers of Scientific Visualization. New York: John Wiley Inc.
Lawrence J. Rosenblum (ed.) (1994). Scientific Visualization: Advances and challenges. Academic Press.
Will Schroeder, Ken Martin, Bill Lorensen (2003). The Visualization Toolkit. Kitware, Inc.
Leland Wilkinson (2005). The Grammar of Graphics, Springer.
Paolo Ciuccarelli, Giorgia Lupi, Luca Simeone (2014). Visualizing the Data City: Social Media as a Source of Knowledge for Urban Planning and Management. Springer.

External links

National Institute of Standards and Technology Scientific Visualizations, with an overview of applications.
Scientific Visualization Tutorials, Georgia Tech
NASA Scientific Visualization Studio. They facilitate scientific inquiry and outreach within NASA programs through visualization.
scienceviz.com - Scientific Vizualisation, Simulation and CG Animation for Universities, Architects and Engineers

Visualizing Data Mining Models


1. Introduction

The point of data visualization is to let the user understand what is going on. Since data mining usually involves extracting "hidden" information from a database, this understanding process can get somewhat complicated. In most standard database operations nearly everything the user sees is something that they knew existed in the database already. A report showing the breakdown of sales by product and region is straightforward for the user to understand because they intuitively know that this kind of information already exists in the database. If the company sells different products in different regions of the country, there is no problem translating a display of this information into a relevant understanding of the business process.

Data mining, on the other hand, extracts information from a database that the user did not already know about. Useful relationships between variables that are non-intuitive are the jewels that data mining hopes to locate. Since the user does not know beforehand what the data mining process has discovered, it is a much bigger leap to take the output of the system and translate it into an actionable solution to a business problem. Since there are usually many ways to graphically represent a model, the visualizations that are used should be chosen to maximize the value to the viewer. This requires that we understand the viewer's needs and design the visualization with that end-user in mind. If we assume that the viewer is an expert in the subject area but not data modeling, we must translate the model into a more natural representation for them. For this purpose we suggest the use of orienteering principles as a template for our visualizations.

1.1 Orienteering

Orienteering is typically accomplished by two chief approaches: maps and landmarks. Imagine yourself set down in an unknown city with instructions to find a given hotel. The usual method is to obtain a map showing the large-scale areas of the city. Once the "hotel district" is located we will then walk along looking for landmarks such as street names until we arrive at our location. If the landmarks do not match the map, we will re-consult the map and even replace one map with another. If the landmarks do not appear correct then usually one will backtrack, try a short side journey, or ask for further landmarks from people on the street. The degree to which we will follow the landmark chain or trust the map depends upon the match between the landmarks and the map. It will be reinforced by unexpected matches (happening along a unique landmark for which we were not looking), by finding the landmark by two different routes and by noting that variations are small. Additionally, our experience with cities and maps and the urgency of our journey will affect our confidence as well.

The combination of a global coordinate system (the map analogy) and the local coordinate system (the landmarks) must fit together and must instill confidence as the journey is traversed. The concept of a manifold is relevant in that the global coordinates must be realizable, in some sense, as a combination of local coordinate systems. To grow trust in the user we should:

Show that nearby paths (small distances in the model) do not lead to widely different ends
Show, on demand, the effect that different perspectives (change of variables or inclusion probabilities) have on model structure
Make dynamic changes in coloring, shading, edge definition and viewpoint (dynamic dithering)
Sprinkle known relationships (landmarks) throughout the model landscape.
Allow interaction that provides more detail and answers queries on demand.

The advantages of this manifold approach include the ability to explore it in some optimal way (such as projection pursuit), the ability to reduce the models to an independent coordinate set, and the ability to measure model adequacy in a more natural manner.

1.2 Why Visualize a Data Mining Model?

The driving forces behind visualizing data mining models can be broken down into two key areas: Understanding and Trust. Understanding is undoubtedly the most fundamental motivation behind visualizing the model. Although the simplest way to deal with a data mining model is to leave the output in the form of a black box, the user will not necessarily gain an understanding of the underlying behavior in which they are interested. If they take the black box model and score a database, they can get a list of customers to target (send them a catalog, increase their credit limit, etc.). There’s not much for the user to do other than sit back and watch the envelopes go out. This can be a very effective approach. Mailing costs can often be reduced by an order of magnitude without significantly reducing the response rate.

The more interesting way to use a data mining model is to get the user to actually understand what is going on so that they can take action directly. Visualizing a model should allow a user to discuss and explain the logic behind the model with colleagues, customers, and other users. Getting buy-in on the logic or rationale is part of building the users’ trust in the results. For example, if the user is responsible for ordering a print advertising campaign, understanding customer demographics is critical. Decisions about where to put advertising dollars are a direct result of understanding data mining models of customer behavior. There’s no automated way to do this. It’s all in the marketing manager’s head. Unless the output of the data mining system can be understood qualitatively, it won’t be of any use. In addition, the model needs to be understood so that the actions that are taken as a result can be justified to others.

Understanding means more than just comprehension; it also involves context. If the user can understand what has been discovered in the context of their business issues, they will trust it and put it into use. There are two parts to this problem: 1) visualization of the data mining output in a meaningful way, and 2) allowing the user to interact with the visualization so that simple questions can be answered. Creative solutions to the first part have recently been incorporated into a number of commercial data mining products (such as MineSet [1]). Graphing lift, response, and (probably most importantly) financial indicators (e.g., profit, cost, ROI) gives the user a sense of context that can quickly ground the results in reality. After that, simple representations of the data mining results allow the user to examine those results directly. Graphically displaying a decision tree (CART, CHAID, and C4.5) can significantly change the way in which the data mining software is used. Some algorithms (e.g., neural networks) can pose more problems than others, but novel solutions are starting to appear.

It is the second part that has yet to be addressed fully. Interaction is, for many users, the Holy Grail of visualization in data mining. Manipulation of the data and viewing the results dynamically allows the user to get a feel for the dynamics and test whether something really counter-intuitive is going on. The interactivity helps achieve this, and the easier this is to do the better. Seeing a decision tree is nice, but what they really want to do is drag-and-drop the best segments onto a map of the United States in order to see if there are sales regions that are neglected. The number of "what if" questions that can be asked is endless: How do the most likely customers break down by gender? What is the average balance for the predicted defaulters? What are the characteristics of mail order responders? The interaction will continue until the user understands what is going on with their customers. Users also often desire drill-through so that they can see the actual data behind a model (or some piece of the model), although this is probably more a matter of perception than actual usefulness. Finally, integrating with other decision support tools (e.g., OLAP) will let users view the data mining results in a manner that they are already using for the purpose of understanding customer behavior. By incorporating interaction into the process, a user will be able to connect the data mining results with his or her customers.

2. Trusting the Model

Attributing the appropriate amount of trust to data mining models is essential to using them wisely. Good quantitative measures of "trust" must ultimately reflect the probability that the model's predictions would match future test targets. However, due to the exploratory and large-scale nature of most data-mining tasks, fully articulating all of the probabilistic factors to do so would seem to be generally intractable. Thus, instead of focusing on trying to boil "trust" down to one probabilistic quantity, it is typically most useful to visualize along many dimensions some of the key factors that contribute to trust (and distrust) in one's models. Furthermore, since, as with any scientific model, one ultimately can only disprove a model, visualizing the limitations of the model is of prime importance. Indeed, one might best view the overall goal of "visualizing trust" as that of understanding the limitations of the model, as opposed to understanding the model itself.

Since data mining relies heavily on training data, it is important to understand the limitations that given data sets put on future application of the resulting model. One class of standard visualization tools involves probability density estimation and clustering over the training data. Especially interesting would be regions of state space that are uncommon in the training data yet do not violate known domain constraints. One would tend to trust a model less if it behaves more confidently when presented with such uncommon data as future inputs. For time-series data, visualizing indicators of non-stationarity is also important.
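A minimal sketch of this kind of check, assuming scikit-learn's kernel density estimator and synthetic data (none of this is prescribed by the paper): estimate the training-data density and flag future inputs that fall in low-density regions.

    # Flag inputs that lie far outside the support of the training data.
    import numpy as np
    from sklearn.neighbors import KernelDensity

    rng = np.random.default_rng(0)
    X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

    kde = KernelDensity(bandwidth=0.5).fit(X_train)

    # Candidate future inputs: one typical point, one far outside the training support.
    X_new = np.array([[0.1, -0.2],
                      [6.0, 6.0]])
    log_density = kde.score_samples(X_new)

    # Threshold at the bottom 1% of training-data density.
    threshold = np.quantile(kde.score_samples(X_train), 0.01)
    for x, ld in zip(X_new, log_density):
        flag = "LOW-SUPPORT" if ld < threshold else "ok"
        print(x, f"log-density={ld:.2f}", flag)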

2.1 Assessing Trust in a Model

Assessing model trustworthiness is typically much more straightforward than the holy grail of model understanding per se, essentially because the former is largely deconstructive while the latter is constructive. For example, without a deep understanding of a given model, one can still use general domain knowledge to detect that it violates expected qualitative principles. A well-known example is that one would be concerned if one's model employed a (presumably spurious) statistical correlation between shoe size and IQ. Of course, there are still very significant challenges in declaring such knowledge as completely and consistently as possible.

Domain knowledge is also critical for the outlier detection needed to clean data and avoid classic problems such as a juvenile crime committed by an 80-year-old "child". If a data mining model were built using the data in Figure 1, it is possible that outliers (most likely caused by incorrect data entry) would skew the resulting model (especially the zero-year-old children, which are more reasonable than eighty-year-old children). The common role of visualization here is mostly in terms of annotating model structures with the domain knowledge that they violate.

Figure 1: Age (in months) vs. Days to Intake Decision for juvenile crime offenders, Maryland Department of Juvenile Services. Note the 80-year-old children on the right side of the graph.


Not all assessments of trust are negative in nature, however. In particular, one can also increase one's trust in a model if other reasonable models seem worse. In this sense, assessing trust is also closely related to model comparison. In particular, it is very useful to understand the sensitivity of model predictions and quality to changes in parameters and/or structure of the given model. There are many ways to visualize such sensitivity, often in terms of local and global (conditional) probability densities, with special interest in determining whether multiple modes of high probability exist for some parameters and combinations. Such relative measures of trust can be considerably less demanding to formulate than attempts at more absolute measures, but they do place special demands on the visualization engine, which must support quick and non-disorienting navigation through neighboring regions in model space.

Statistical summaries of all sorts are also common and useful for gathering insights for assessing model trust. Pairwise scatter-plots and low-dimensional density estimates are especially common. Summaries can be particularly useful for comparing relative trust of two models, by allowing analysis to focus on subsets of features for which their interrelationships differ most significantly between two models.

It is often useful to combine summaries with interactive ability to drill-through to the actual data. Many forms of visual summary actually display multiple scales of data along the raw to abstract continuum, making visual drill-through a natural recursive operation. For example, compressing millions of samples into a time-series strip chart that is only 1000 pixels wide allows one to quickly see the global highest and lowest points across the entire time range, as well as the local high and low points occurring within each horizontal pixel.
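That compression step can be sketched in a few lines (an assumed NumPy/Matplotlib example, not from the paper): bin the series into one column per horizontal pixel and keep each bin's minimum and maximum, so local extremes survive that plain subsampling would lose.

    # Min/max envelope for a long time series, one bin per horizontal pixel.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n, width = 1_000_000, 1000                 # samples and target pixel columns
    series = np.cumsum(rng.normal(size=n))     # synthetic "millions of samples"

    bins = series[: n - n % width].reshape(width, -1)   # one row per pixel column
    lo, hi = bins.min(axis=1), bins.max(axis=1)

    x = np.arange(width)
    plt.fill_between(x, lo, hi, color="steelblue")
    plt.title("Min/max envelope, one bin per pixel column")
    plt.show()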

Most useful are models that qualify their own trustworthiness to some degree, such as in quantifying the expected variance in the error of their predictions.

In practice, such models tend to be relatively rare. Heavy emphasis on expected case rather than worst case performance is generally not all that inappropriate, since one is typically ultimately interested in concepts such as expected cumulative payoff.

There are important classes of tasks, such as novelty detection (e.g. fraud detection), for which quantified variance is essential. Standard techniques are learning confidence intervals (e.g. error bars for neural networks) and general probability density estimation. A promising recent approach [2], called bounds estimation, attempts to find a balance between the complexity of general probability density estimation and the simplicity of the mean estimation plus variance estimation approach to error bars.
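As a rough sketch of models that quantify their own uncertainty (this is a generic bootstrap-ensemble example, not the bounds-estimation method cited above, and it assumes scikit-learn with synthetic data): the spread of predictions across resampled models serves as an error bar around the point estimate.

    # Bootstrap ensemble: mean prediction plus a spread-based "error bar".
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

    # Train many small trees on bootstrap resamples of the data.
    preds = []
    for _ in range(100):
        idx = rng.integers(0, len(X), len(X))
        model = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
        preds.append(model.predict(X))
    preds = np.array(preds)

    mean = preds.mean(axis=0)          # point prediction
    std = preds.std(axis=0)            # ensemble spread, a rough error bar
    k = np.argmin(np.abs(X[:, 0] - 3)) # inspect the prediction near x = 3
    print("prediction at x=3:", round(mean[k], 2), "+/-", round(std[k], 2))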

Finally, it is important, though rather rare in practice to date, to consider many transformations of the data during visual exploration of model sensitivities. For example, a model that robustly predicts the internal pressure of some engineering device well should probably also do well at predicting related quantities, such as its derivative, its power spectrum, and other relevant quantities (such as nearby or redundant pressures). Checking for such internal consistency is perhaps ultimately one of the most important ways to judge the trustworthiness of a model, beyond standard cross-validation error. Automated and interactive means of exploring and visualizing the space (and degrees) of inconsistencies a model entails seem to be a particularly important direction for future research on assessing model trustworthiness.

3. Understanding the Model

A model that can be understood is a model that can be trusted. While statistical methods build some trust in a model by assessing its accuracy, they cannot assess the model’s semantic validity — its applicability to the real world.

A data mining algorithm that uses a human-understandable model can be checked easily by domain experts, providing much needed semantic validity to the model. Unfortunately, users are often forced to trade off accuracy of a model for understandability.

Advanced visualization techniques can greatly expand the range of models that can be understood by domain experts, thereby easing the accuracy/understandability trade-off. Three components are essential for understanding a model: representation, interaction, and integration. Representation refers to the visual form in which the model appears. A good representation displays the model in terms of visual components that are already familiar to the user. Interaction refers to the ability to see the model in action in real time, to let the user play with the model as if it were a machine. Integration refers to the ability to display relationships between the model and alternate views of the data on which it is based. Integration provides the user context.

The rest of this section will focus on understanding classification models. Specifically, we will examine three models built using Silicon Graphics' MineSet: decision tree, simple Bayesian, and decision table classifiers [3]. Each of these tools provides a unique form of understanding based on representation, interaction, and integration.

The graphical representation should be simple enough to be easily understood, but complete enough to reveal all the information present in the model. This is a difficult balance since simplicity usually trades off against completeness. Three-dimensional visualizations have the potential to show far more information than two-dimensional visualizations while retaining their simplicity. Navigation in such a scene lets one focus on an element of interest while keeping the rest of the structure in context. It is critical, however, that the user be able to navigate through a three-dimensional visualization in real time. An image of a three-dimensional scene is merely a two-dimensional projection and is usually more difficult to understand than a scene built in two dimensions.

Even with three dimensions, many models still contain far too much information to display simply. In these cases the visualization must simplify the representation as it is displayed. The MineSet decision tree and decision table visualizers use the principle of hierarchical simplification to present a large amount of information to the user.

Decision trees are easy to understand but can become overwhelmingly large when automatically induced. The SGI MineSet Tree Visualizer uses a detail-hiding approach to simplify the visualization. In figure 2, only the first few levels of the tree are initially displayed, despite the fact that the tree is extensive. The user can gain a basic understanding of the tree by following the branches of these levels. Additional levels of detail are revealed only when the user navigates to a deeper level, providing more information only as needed.



Figure 2: The MineSet Tree Visualizer shows only the portion of the model close to the viewer.
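The detail-hiding idea can be sketched outside MineSet with scikit-learn's tree plotting, which can likewise draw only the first few levels of a large induced tree (an assumed example; the max_depth argument limits only how much of the tree is displayed, not how the tree was induced):

    # Display only the top levels of a deep decision tree.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier, plot_tree

    X, y = load_breast_cancer(return_X_y=True)
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)   # a fairly deep tree

    plt.figure(figsize=(12, 6))
    # max_depth here truncates the drawing; deeper subtrees are collapsed.
    plot_tree(clf, max_depth=2, filled=True, fontsize=8)
    plt.show()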


Using decision tables as a model representation generates a simple but large model. A full decision table theoretically contains the entire dataset, which may be very large. Therefore simplification is essential. The MineSet decision table arranges the model into levels based on the importance of each feature in the table. The data is automatically aggregated to provide a summary using only the most important features. When the user desires more information, he can drill down as many levels as needed to answer his question. The visualization automatically changes the aggregation of the data to display the desired level of detail. In figure 3, a decision table shows the well-known correlation between head shape and body shape in the monk dataset. It also shows that the classification is ambiguous in cases where head shape does not equal body shape. For these cases, the user can drill down to see that the attribute jacket color determines the class.



Figure 3: The MineSet Decision Table Visualizer shows additional pairs of attributes as the user drills down into the model.
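Outside MineSet, the hierarchical aggregate-then-drill-down behaviour of a decision table can be sketched with pandas group-by summaries. The data below are synthetic, generated in the spirit of the monk-1 concept for illustration only; they are not the actual monk dataset:

    # Decision-table style drill-down: summarize, then add attributes on demand.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 300
    df = pd.DataFrame({
        "head_shape":   rng.choice(["round", "square", "octagon"], size=n),
        "body_shape":   rng.choice(["round", "square", "octagon"], size=n),
        "jacket_color": rng.choice(["red", "yellow", "green", "blue"], size=n),
    })
    # Toy label: head matches body OR jacket is red.
    df["is_monk"] = ((df["head_shape"] == df["body_shape"])
                     | (df["jacket_color"] == "red")).astype(int)

    # Level 1: aggregate over the single most important attribute.
    print(df.groupby("head_shape")["is_monk"].mean())

    # Level 2 (drill-down): add body_shape; matching cells are pure, others ambiguous.
    print(df.groupby(["head_shape", "body_shape"])["is_monk"].mean())

    # Level 3: drill into the ambiguous cells only; jacket_color resolves them.
    ambiguous = df[df["head_shape"] != df["body_shape"]]
    print(ambiguous.groupby("jacket_color")["is_monk"].mean())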


While a good representation can greatly aid the user’s understanding, in many cases the model contains too much information to provide a representation that is both complete and understandable. In these cases we exploit the brain’s ability to reason about cause and effect and let the user interact with the more complex model. Interaction can be thought of as "understanding by doing" as opposed to "understanding by seeing".

Common forms of interaction are interactive classification, interactive model building, drill-up, drill-down, animation, searching, filtering, and level-of-detail manipulation. The fundamental techniques of searching, filtering, drill-up, and drill-down make the task of finding information hidden within a complex model easier. However, they do not help overall understanding much. More extensive techniques (interactive classification, interactive model building) are required to help the user understand a model which is too complicated to show with a static image or table. These advanced methods aid understanding by visually showing the answer to a user query while maintaining a simplified representation of the model for context.

The MineSet Evidence Visualizer allows the user to interact with a simple Bayesian classifier (Figure 4). Even simple Bayesian models are based on multiplying arrays of probabilities that are difficult to understand by themselves. However, by allowing the user to select values for features and see the effects, the visualization provides cause-and-effect insight into the operation of the classifier. The user can play with the model to understand exactly how much each feature affects the classification and ultimately decide to accept or reject the result. In the example in the figure, the user selects the value of "working class" to be "self-employed-incorporated," and the value of "education" to be "professional-school". The pie chart on the right displays the expected distribution of incomes for people with these characteristics.



Figure 4: Specific attribute values are selected in the Evidence Visualizer in order to predict income for people with those characteristics.
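The calculation behind such an evidence display can be sketched directly (an assumed example with fabricated data, not the MineSet Evidence Visualizer itself): a simple Bayesian classifier multiplies a prior by per-attribute likelihoods for the values the user selected, and the normalized result is the class distribution shown in the pie chart.

    # Naive Bayes "evidence" calculation for user-selected attribute values.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "workclass": rng.choice(["self-emp-inc", "private", "government"],
                                size=n, p=[0.2, 0.6, 0.2]),
        "education": rng.choice(["prof-school", "bachelors", "hs-grad"],
                                size=n, p=[0.1, 0.4, 0.5]),
    })
    # Fabricate an income label loosely correlated with the two attributes.
    p_high = (0.15 + 0.3 * (df["workclass"] == "self-emp-inc")
                   + 0.3 * (df["education"] == "prof-school"))
    df["income"] = np.where(rng.random(n) < p_high, ">50K", "<=50K")

    selected = {"workclass": "self-emp-inc", "education": "prof-school"}

    # P(class | evidence) is proportional to P(class) * product of P(attr=value | class).
    posterior = {}
    for cls, group in df.groupby("income"):
        p = len(group) / len(df)                      # prior
        for attr, value in selected.items():
            p *= (group[attr] == value).mean()        # class-conditional likelihood
        posterior[cls] = p
    total = sum(posterior.values())
    for cls, p in posterior.items():
        print(cls, round(p / total, 3))               # the displayed class distribution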


Beyond interactive classification, interactively guiding the model-building process provides additional control and understanding to the user. Angoss [4] provides a decision tree tool that gives the user full control over when and how the tree is built. The user may suggest splits, perform pruning, or manually construct sections of the tree. This facility can boost understanding greatly. Figure 5a shows a decision tree split on a car's brand attribute. While the default behavior of the tree is to form a separate branch for each categorical value, a better approach is often to group similar values together and produce only a few branches. The result shown in figure 5b is easier to understand and can sometimes give better accuracy. Interactive models allow the user to make changes like this as the situation warrants.





Figures 5a and 5b: A decision tree having branches for every value of the brand attribute (top), and a decision tree which groups values of brand to produce a simpler structure (bottom).


Interactive techniques and simplified representations can produce models that can be understood within their own context. However, for a user to truly understand a model, he must understand how the model relates to the data from which it was derived. For this goal, tool integration is essential.

Few tools on the market today use integration techniques. The techniques that are used come in three forms: drill-through, brushing, and coordinated visualizations. Drill-through refers to the ability to select a piece of a model and gain access to the original data from which that piece of the model was derived. For example, the decision tree visualizer allows selection and drill-through on individual branches of the tree. This will provide access to the original data that was used to construct those branches, leaving out the data represented by other parts of the tree. Brushing refers to the ability to select pieces of a model and have the selections appear in an alternate representation. Coordinated visualizations generalize both techniques by showing multiple representations of the same model, combined with representations of the original data. Interactive actions that affect the model also affect the other visualizations. All three of these techniques help the user understand how the model relates to the original data. This provides an external context for the model and helps establish semantic validity.

As data mining becomes more extensive in industry and as the number of automated techniques employed increases, there is a natural tendency for models to become increasingly complex. In order to prevent these models from becoming mysterious oracles, whose dictates must be accepted on faith, it is essential to develop more sophisticated visualization techniques to keep pace with the increasing model complexity. Otherwise there is a danger that we will make decisions without understanding the reasoning behind them.

4. Comparing Different Models using Visualization

Model comparison requires the creation of an appropriate metric for the space of models under consideration. To visualize the model comparison, these metrics must be interpretable by a human observer through his or her visual system. The first step is to create a mapping from input to output of the modeling process. The second step is to map this process to the human visual space.

4.1 Different Meanings of the Word "Model"

It is important to recognize that the word "model" can have several levels of meaning. Common usage often associates the word model with the data modeling process. For example, we might talk of applying a neural network model to a particular problem. In this case, the word model refers to the generic type of model known as a neural network. Another use of the word model is associated with the end result of the modeling process. In the neural network example, the model could be the specific set of weights, topology, and node types that produces an output given a set of inputs. In still another use, the word model refers to the input-output mapping associated with a "black-box." Such a mapping necessarily places emphasis on careful identification of the input and output spaces.

4.2 Comparing Models as Input-Output Mappings

The input-output approach to model comparison simply considers the mapping from a defined input space to a defined output space. For example, we might consider a specific 1-gigabyte database with twenty-five variables (columns). The input space is simply the Cartesian product of the database's twenty-five variables. Any actions inside the model, such as creation of new variables, are hidden in the "black-box" and are not interpreted. At the end of the modeling process, an output is generated. This output could be a number, a prioritized list or even a set of rules about the system. The crucial issue is that we can define the output space in some consistent manner to derive an input to output mapping.

It is the space generated by the mappings that is of primary importance to the model comparison. For most applications the mapping space will be well defined once the input and output spaces are well defined. For example, two classifiers could be described by a set of input/output pairs, such as (obs1, class a), (obs2, class b), etc. The comparison metric could then be defined on these pairs as a count of the number differing, or GINI indices, or classification cost, etc. The resulting set of pairs could be visualized by simple plotting of points on a two-dimensional graph. The two models could be indexed by coloring or symbol codes. Or one could focus on the difference between the models directly and plot this. This approach should prove adequate so long as we restrict attention to a well-defined input-output structure.
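A minimal sketch of such an input-output comparison, assuming scikit-learn and synthetic data (not part of the paper): score the same inputs with two classifiers, count the disagreements, and plot where in the input space the two mappings differ.

    # Compare two classifiers purely through their input-output mappings.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                               n_informative=2, random_state=0)
    m1 = LogisticRegression().fit(X, y)
    m2 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

    p1, p2 = m1.predict(X), m2.predict(X)
    differ = p1 != p2
    print("inputs on which the two models differ:", int(differ.sum()))

    # Plot agreement/disagreement over the two input variables.
    plt.scatter(X[~differ, 0], X[~differ, 1], c="lightgray", label="models agree")
    plt.scatter(X[differ, 0], X[differ, 1], c="crimson", label="models differ")
    plt.legend()
    plt.show()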

4.3 Comparing Models as Algorithms

In the view of a model as a static algorithm, again there seems to be a reasonable way to approach the model comparison problem. For example, a neural network model and an adaptive nonlinear regression model might be compared. These models would be expressed as a series of algorithmic steps. Each model's algorithm could then be analyzed by standard methods for measuring algorithmic performance, such as complexity, finite word length, and the stability of the algorithm. The investigator could also include measures of the physical implementation of the algorithm, such as computation time or computation size. Using these metrics, the visualization could take the form of bar charts across the metrics. Again, different models could be encoded by color or symbol, and a graph of only the difference between the two models on each metric could be provided. Each comparison would be for a static snapshot, but certainly dynamic behavior could be exploited through a series of snapshots, i.e., a motion picture.

4.4 Comparing Models as Processes

The view of the model as a process is the most ill defined and therefore most intractable of the three views, but this should not minimize its importance. Indeed its sheer complexity might make it the most important view for the application of visualization. It is precisely in this arena that we encounter the subject area expert for whom these systems should offer the most benefit (such as confidence and trust).

The modeling process includes everything in and around the modeling activity, such as the methods, the users, the database, the support resources, and constraints such as knowledge, time, and analysis implementation. Clearly this scope is too large for us to consider. Let us narrow our scope by assuming that the model comparison is being applied for one user on one database over a short time period. This implies that user differences, database differences, and knowledge differences can be neglected. We are left with analysis methods and implementation issues. For most subject area experts the implementation and the analysis are not separable, and so we will make the additional assumption that this issue can be ignored as well. With these simplifying assumptions we are essentially defining model comparison to be the comparison of modeling method and implementation simultaneously.

Imagine two models that are available in some concrete implemented form. These could be different general methods such as neural networks versus tree-based classifiers, or they could be different levels of sophistication within a class of models such as CART versus CHAID tree-structures. Remember that we are now focusing only on the modeling process, and not its input/output or algorithmic structure. It seems that reasonable metrics can be defined in this situation. For example, the running time could be such a metric, or the interpretability of instructions, or the number of tuning parameters that must be chosen by the user at run-time. The key here is that these metrics must be tailored to the user who is the target of the application. Thus, whereas the input-output view focused on these spaces, and the algorithmic view focused on the properties of the algorithm independently of the user, now we must focus in great detail on the user's needs and perceptions.

Once a set of metrics is chosen, we appear to be in a similar situation to that described under the algorithmic comparison. We should be able to show the distances between models in each of the defined metrics in a bar chart or other standard display. Color or symbol coding can be used to show the results from each model on the same chart as well.

There will be many possible metrics for the model-building process, at least one per user. Since it is unlikely we can choose a set of "one-size-fits-all" metrics, it is more useful to establish properties of good metrics and create methods to establish them in novel situations. The metrics chosen by an academic researcher would likely be very different from those chosen by a business user. Some properties that good metrics for the modeling process should have are:

That they are expressed in direct risk/benefit to user.
That they evaluate their sensitivity to model input and assumptions.
That they can be audited (open to questioning at any point).
That they are dynamic.
That they can be summarized in the sense of an overall map.
That they allow reference to landmarks and markers.

Some aspects of the visualization process will take on added importance. One such aspect is the sequential behavior of the modeling process. For example, it is common to plot frequently the updated fit between the data and the model predictions as a neural network learns. A human being will probably give more trust to a method which mimics his or her own learning behavior (i.e., a learning curve which starts with a few isolated details, then grows quickly to broad generalizations and then makes only incremental gains after that in the typical "S" shape). Unstable behavior or large swings should count against the modeling process.

Another aspect of importance should be a visual track of the sensitivity of the modeling process to small changes in the data and modeling process parameters. For example, one might make several random starts with different random weights in a neural network model. These should be plotted versus one another showing their convergence patterns, again perhaps against a theoretical S-shaped convergence.

The model must also be auditable, meaning that inquiries may be made at any reasonable point in the modeling process. For a neural network we should be able to interrupt training and examine individual weights at any step; likewise, for a tree-based model we should be able to see subtrees at will. Ideally there would be several scales at which this interruption could occur.
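As a rough illustration of what this auditing might look like, the sketch below interrupts an (assumed scikit-learn style) network every ten training passes and stores copies of its weight matrices so that any snapshot can be examined later; the details are invented and are not part of the paper's method.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)

net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1, warm_start=True, random_state=3)

weight_snapshots = []
for step in range(50):
    net.fit(X, y)  # one more training pass
    if step % 10 == 0:
        # coefs_ is a list of weight matrices, one per layer; copy them for the audit trail
        weight_snapshots.append([w.copy() for w in net.coefs_])

# Any snapshot can now be inspected, e.g. the input-to-hidden weights recorded at step 20:
print(weight_snapshots[2][0])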

Since most humans operate on a system of local and global coordinates, it will be important to supplement the visualizations with markers and a general map structure. For example, even though the direct comparison is between two neural nets with different structures, it would be good to have the same distances plotted for another method with which the user is familiar (such as discriminant analysis), even if that method is inadequate. If the same model could be run on a known input, the user could establish trust in the new results. It might also be useful to display a detailed and a summarized model simultaneously: the full tree-based classifier might have twenty-five branches, while the summarized tree shows only the broad limbs. And if the output is a rule, it might be useful to derive (through logical manipulation) other results, or restatements of the results, as a test of reasonableness.
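The detailed-versus-summarized idea can be sketched with scikit-learn's tree utilities, drawing the same classifier once in full and once cut off at an arbitrary depth so that only the broad limbs are visible; the data and the depth cut-off are illustrative choices only.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = make_classification(n_samples=2000, n_features=20, random_state=4)
tree = DecisionTreeClassifier(random_state=4).fit(X, y)

fig, (detailed, summary) = plt.subplots(1, 2, figsize=(14, 5))
plot_tree(tree, ax=detailed, filled=True)              # every branch
plot_tree(tree, ax=summary, max_depth=2, filled=True)  # broad limbs only
detailed.set_title("Full tree")
summary.set_title("Summarized tree (top two levels)")
plt.show()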

5. Conclusion

In this paper we have discussed a number of methods to visualize data mining models. Because data mining models typically generate results that were previously unknown to the user, it is important that any model visualization provide the user with sufficient levels of understanding and trust.



Thursday, January 28, 2016

Data visualisation

Data visualisation is an integral part of data analysis and business intelligence. Explore the most recommended types of charts and good design tips to help you create powerful and persuasive graphs for decision making.
Data visualisation is the graphical display of abstract information for two purposes: sense-making (also called data analysis) and communication.
Few (2013)

Most modern organisations use numerical data to communicate quantitative information. These numbers are fundamental to understanding organisational performance. The information can be presented in many different ways, for example graphs, maps and, at a more advanced level, dashboards.


Despite the popular wisdom, data and numbers cannot always speak for themselves. Too much time can be spent struggling to understand data presented in lengthy reports and numerical tables, time that could be better spent making evidence-based decisions.

Data visualisation can help with the analysis of that information and present it in a way that allows viewers to discover patterns that might otherwise be hard to uncover. Large amounts of data are hard to wade through, but data visualisation can make that data easily digestible.

Who is this resource for?
This guide will benefit anyone interested in creating well designed, informative and easy to understand charts. Whether you are a student, researcher, lecturer or manager, it is likely that you will often need to include statistical information or analysis in your papers, reports and presentations.

Even if you are already experienced and have progressed to building interactive web-based dashboards, you may still find it beneficial to refresh your understanding of current practice in visualisation design. After all, even the most advanced dashboards contain a collection of individual graphs, maps or other visual displays such as traffic lights and speed dials.

During the past two decades we have seen remarkable progress in technologies for collecting and processing huge amounts of data. This vast data availability has driven interest in data analysis and visualisation, which in turn has led to visualisation methods being constantly updated and developed as new evidence about their effectiveness emerges.

This guide is not intended to be an exhaustive treatment of the subject. There are too many good sources of information (specialist books, blogs and publications dealing with data visualisation) for us to recreate them all, and even among the experts opinions vary on what should be the gold standard and best practice in this area. Instead, our guide aims to distil these opinions and advice, much of which we have tried and tested in practice, and to bring many useful resources together in one place.

The advice contained in this resource is applicable to data visualisation used in the business context, rather than the data art so commonly seen in the media and conference presentations. In the context of this resource, data art is visualisation of data that seeks primarily to entertain or produce an aesthetic experience.

Business intelligence guide
Organisations require access to accurate, timely and meaningful information about their core businesses and the environment in which they operate, if they are to adapt and thrive during times of great uncertainty.

Our guide on business intelligence helps to explore this essential element of decision-making based on accurate data about the state of your organisation and the environment in which it operates.

Benefits of data visualization

1. Data visualization is a broad set of processes, an umbrella that covers both information visualization and scientific visualization. Its benefits are hard to ignore: quantities become precise and easily comparable, and it offers valuable guidance on the use of visualization techniques and tools. Its effectiveness rests on the brain's ability to balance perception and cognition when information is presented visually.
2. As thousands of companies bring new products to market, data visualization takes on ever greater responsibility as an essential component of business intelligence. This is why companies hire designers with strong visualization skills: the significant messages in data appear in its patterns and trends, gaps and outliers.
3. Visuals draw us in and hold our attention, so we comprehend them far more quickly than raw numbers alone. Visualization is powerful enough to change someone's mind in a flash. One of its most important benefits is that it encompasses varied data sets quickly, effectively and efficiently and makes them accessible to interested viewers, inviting deeper insight through quick access.
4. It gives us the opportunity to approach huge data sets and makes them easily comprehensible, whether in entertainment, current affairs, finance or politics. It also builds deep insight, prompting sound decisions and immediate action when needed, on subjects as varied as child education, public health, market research for a product, or rainfall in a specific geographical area.
5. Data visualization has also expanded in the business world lately in the form of geo-spatial visualization.
6. Geo-spatial visualization has grown popular as more websites provide web services and attract visitors. These businesses can take advantage of location-specific information already present in their systems, such as customers' zip codes, to improve day-to-day analysis. This type of visualization adds a new dimension to the figures and helps in better understanding of the matter.
7. A leading benefit of data visualization is that it not only provides a graphical representation of data but also allows changing the form, omitting what is not required, and browsing deeper for further details. It catches the eye, holds attention and communicates better than traditional methods, and visual analytics promises great benefit to businesses.
8. With its help, data can be viewed effectively in multiple ways by segmenting the findings, and patterns give the data additional meaning. Data visualization strikes a balance between visual appeal and practicality, presents information more efficiently, and helps viewers understand data quickly with less confusion and doubt.
9. In conclusion, we can realise the full benefit of data visualization only if we give it the required attention. Too many colors can create visual noise that prevents accurate reading; the main limitation to taking advantage of data visualization is an untrained eye.

Category Archives: Data and Analytics

A slick chart, an interactive data-exploration interface, a KPI-based dashboard: all of these are data visualization products. They garner a lot of attention because they are finished products and look nice as well. However, for many companies engaged in data visualization, those final deliverables aren't the most important benefit. Instead, it's the insights into the quality of their collected data that truly lead to success.

Data visualization provides 3 key insights into data:

Is the data complete?
Is the data valid?
Is the data well-organized?
Without knowing those 3 elements, data collection and business intelligence processes become much more expensive and labor intensive, and may end up abandoned when the data doesn't demonstrate what was intended. Using the insights from data visualization, these projects have a much higher likelihood of completion and success.

Insight into Data #1: Is the data complete?

The most straightforward insight that visualization can give you about your data is its completeness. With a few quick charts, areas where data is missing show up as gaps or blanks on the report (called the “Swiss Cheese” effect).

In addition to learning which specific data elements are missing, visualizations can show trends of missing data. Those trends can tell a story about the data collection process and provide insight into changes necessary in the way data is gathered.

A Data Completeness Example: After creating a visualization of a collection of survey data on movie-going habits, it's clear that there are a significant number of blanks after question 14 of the survey. The visualization helps the survey company recognize not only that those specific records need to be abandoned, but also that the survey should be shortened to account for "respondent fatigue", the likely cause of the incomplete responses.
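A hypothetical sketch of this kind of completeness check is shown below; the survey data, the column names and the fatigue pattern are all invented, but a per-question missingness chart like this is what makes the gap after question 14 obvious at a glance.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# 200 fake respondents answering 20 questions on a 1-5 scale.
responses = pd.DataFrame(
    rng.integers(1, 6, size=(200, 20)).astype(float),
    columns=[f"q{i}" for i in range(1, 21)],
)
# Simulate respondent fatigue: many blanks after question 14.
responses.loc[rng.random(200) < 0.4, [f"q{i}" for i in range(15, 21)]] = np.nan

# Percentage of missing answers per question reveals the "Swiss Cheese" gaps.
responses.isna().mean().mul(100).plot(kind="bar")
plt.ylabel("% missing")
plt.title("Missing responses by survey question")
plt.show()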

Insight into Data #2: Is the data valid?

The importance of visualization among data validation techniques has been discussed before. It's clear, then, that visualization can play a pivotal role in understanding data's validity. A quick, preliminary visualization of the collected data can reveal trends that indicate problems in the full dataset.

A Data Validation Example: A collected dataset is designed to demonstrate the difference in male population statistics between Alaska and Florida. Examination of individual records and outliers suggests the data is valid – there is a significantly higher percentage of males in Alaska than in Florida, as expected. However, a visualization of the entire dataset shows more total males in Alaska than in Florida. This is a red flag because, even with the difference in gender ratios, Florida's much larger population means it should have a higher total number of males.
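A hypothetical sketch of the aggregate check is shown below; the record counts are invented, and only the shape of the comparison (total male records per state against what population figures would lead you to expect) matters.

import pandas as pd
import matplotlib.pyplot as plt

# Invented records: the per-state gender ratios look plausible,
# but the row counts themselves are suspicious.
records = pd.DataFrame({
    "state": ["Alaska"] * 400 + ["Florida"] * 350,
    "sex": ["M"] * 220 + ["F"] * 180 + ["M"] * 170 + ["F"] * 180,
})

totals = records[records["sex"] == "M"].groupby("state").size()
totals.plot(kind="bar")
plt.ylabel("total male records")
plt.title("More males recorded in Alaska than in Florida: a red flag")
plt.show()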

A well-designed, preliminary visualization can give insight into the validity of collected data that is difficult, or even impossible, to gain with traditional methods.

Insight into Data #3: Is the data well-organized?

Poorly organized data can be the bane of the final step of a data collection or business intelligence process. Using data organization tools from the start can help streamline later steps of the process.

During collection, the data is often organized in a way that optimizes the gathering process. However, that same organizational scheme can be a problem when the time comes to act. The data visualization process serves to highlight the organizational challenges of your data and provides insights into how it might be done better.

A Data Organization Example: A client wishes to use their collected customer data to develop a customer profile that defines demographic breakouts of snack-food purchases indexed by time of day. Their data visualization partner asks where that data is stored, and it turns out that the transactional data is stored separately from the customer profile information, and that the two can only be intersected through yet another correlational dataset. While all the data is technically available, it needs to be reorganized before it can usefully support decision making.
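A hypothetical sketch of that reorganization step in pandas is shown below; every table and column name is invented, but it illustrates joining transactional data to customer profiles through a third, correlational table so the combined view can answer the time-of-day question.

import pandas as pd

transactions = pd.DataFrame({
    "txn_id": [1, 2, 3],
    "loyalty_card": ["A1", "B2", "A1"],
    "item": ["chips", "pretzels", "soda"],
    "hour_of_day": [14, 20, 9],
})
profiles = pd.DataFrame({
    "customer_id": [101, 102],
    "age_band": ["25-34", "45-54"],
})
# The correlational dataset that links the other two.
card_to_customer = pd.DataFrame({
    "loyalty_card": ["A1", "B2"],
    "customer_id": [101, 102],
})

# One denormalized table, organized for the decision at hand:
# demographic breakouts of snack purchases by time of day.
combined = (transactions
            .merge(card_to_customer, on="loyalty_card")
            .merge(profiles, on="customer_id"))
print(combined.groupby(["age_band", "hour_of_day"]).size())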

Data visualization isn't just a data organization and analysis tool; it can play a crucial role in the entire data gathering and management process. With a well-executed visualization, one that takes the time to understand what is to be learned from the data and how the information will be gathered, companies can cut costs and eliminate the waste that comes from having to re-gather or re-organize their data.

To find out what your data has to say to you, contact Boost Labs to learn about creating a visualization to give you the insights your project needs to succeed.

Wednesday, January 27, 2016

Functions of the Bureaucracy


America's bureaucracy performs three primary functions to help keep the governmental beehive buzzing along.

1. The bureaucracy implements the laws and policies made by elected officials.

These laws and policies need to be put into practice in specific situations and applied in all the contingencies of daily life. For example, a city council has decided that all dog owners must have their pets licensed and microchipped, but the city council members don't have the time to make sure that their decision is carried out. City workers, members of the city's bureaucracy, are the ones who answer questions and complaints about the law, help dog owners fill out the proper forms, decide when to waive the license fee, refer owners to veterinarians who can insert the microchips, work with the vets to hand out coupons for discounts on microchips, and enforce the law to make sure that all dog owners have their animals licensed and microchipped in a reasonable amount of time.

2. The bureaucracy provides necessary administrative functions, like conducting examinations, issuing permits and licenses, and collecting fees.

Essentially, it handles the paperwork of everyday government operations. Anyone who has a driver's license has come face-to-face with bureaucratic administration through the required written and behind-the-wheel exams, learning permits, fees at all stages, and finally applying for and receiving the driver's license itself.

3. The bureaucracy regulates various government activities.

In other words, it creates the rules and regulations that clarify how various laws work on a daily basis. For instance, the bureaucracy is responsible for writing rules and regulations for public schools, including curriculum standards, examination procedures, discipline methods, teacher training and licensing requirements, and administrative policies. Schoolchildren feel the effects of these regulations when they work on their assignments or take standardized tests. Teachers use them to design class work and assessments. Principals and school boards must follow them when applying for funding or setting policies for their own schools and districts.

The Face of Bureaucracy

The bureaucracy can seem harsh and faceless to many Americans, who often get fed up with its strict rules and time-consuming procedures, but in fact, most bureaucrats, people who work in the bureaucracy, are simply their neighbors and fellow citizens. Who are all these busy bee bureaucrats who implement, administer, and regulate citizens' interaction with the government? A few interesting facts will introduce us to them.