GraphDice: A System for Exploring Multivariate Social Networks

Original Paper:

This paper was well organized and the research was well planned. The authors laid the groundwork with background information, such as the three categories of social network analysis software tools (confirmatory analysis, exploratory analysis, and network visualization) and the tasks and challenges of social network analysis identified in previous papers. They concluded this section of the paper by stating that “visualizing multivariate networks is recognized as an important but difficult challenge,” and “[their] challenge is to provide a simple visualization tool supporting a more complete set of the SNA tasks”. The rest of the paper was just as clear. The preliminary user feedback section was particularly interesting to me. I thought it was good that they acknowledged that their full-day workshop did not suffice as a user study and that they still plan to do one. The way they used the preliminary user feedback was pretty cool.

Force-Directed Edge Bundling for Graph Visualization

Original Paper:

This paper described how edge bundling reduces clutter and reveals edge patterns, and how the authors achieved this without a hierarchy or control mesh. The edges are modeled as flexible springs, and the nodes stay at fixed positions. The self-organizing technique, which is based on physics, is very intuitive. They presented both inverse-linear and inverse-quadratic models in the paper. They also described how bundled edges can differ greatly in length. I liked the section on smoothing using a Gaussian kernel. Figures 7-9 were very clear and effective in expressing the results of their work; without them I might have been a little lost.
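As a rough illustration of the spring-and-attraction idea summarized above, here is a minimal Python sketch of the combined force on one subdivision point of an edge. The function name `combined_force`, the stiffness constant `k`, and the tuple-based 2-D points are assumptions made for illustration; this is not the paper's code and leaves out details such as edge compatibility and the iteration scheme.

```python
import math

def combined_force(prev_pt, pt, next_pt, other_pts, k=0.1, inverse_quadratic=False):
    """Force on one edge subdivision point (hypothetical helper, 2-D tuples).

    prev_pt / next_pt: neighbouring points on the same edge.
    other_pts: corresponding points on other edges that attract this one.
    """
    # Spring force: pulls the point toward its two neighbours on the same edge.
    fx = k * ((prev_pt[0] - pt[0]) + (next_pt[0] - pt[0]))
    fy = k * ((prev_pt[1] - pt[1]) + (next_pt[1] - pt[1]))

    # Attraction force: pulls the point toward points on other edges.
    for q in other_pts:
        dx, dy = q[0] - pt[0], q[1] - pt[1]
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue  # coincident points exert no direction; skip them
        # Magnitude falls off as 1/d (inverse linear) or 1/d^2 (inverse quadratic).
        magnitude = 1.0 / dist**2 if inverse_quadratic else 1.0 / dist
        fx += magnitude * dx / dist
        fy += magnitude * dy / dist
    return fx, fy
```

In this sketch the node positions never move: only the interior subdivision points of each edge would be updated, which matches the "fixed nodes, flexible edges" description.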

Tree visualization with Tree-maps: A 2-d space-filling approach

Original paper:

This paper presented a very clean overview of the space-filling approach to tree maps. I thought it was very clear, and the inclusion of pseudocode was very effective. With all the definitions, it felt like a final project report for CS 245 or something, but that made it seem accessible, which can be very effective in a paper. Midway through the paper I preemptively wrote down that the limits of this tree map algorithm were not discussed. I knew that there is a finite number of nodes it can effectively show on the screen. However, the display resolution section addressed my concern and included the idea of possible zooming, which was cool. Overall, I liked this paper because it felt like something I could easily write about my own work in the future.
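To make the space-filling idea concrete, here is a minimal Python sketch of a slice-and-dice layout, where each level of the tree splits its rectangle in the opposite direction from its parent. It is not a transcription of the paper's pseudocode; the `(name, size, children)` node format and the function name `treemap` are assumptions, and the sketch assumes each internal node's size equals the sum of its children's sizes.

```python
def treemap(node, x, y, w, h, horizontal=True, out=None):
    """Lay out a tree as nested rectangles (hypothetical node format).

    node: (name, size, children), where children is a list of nodes.
    Returns a list of (name, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    name, size, children = node
    out.append((name, x, y, w, h))

    offset = 0.0
    for child in children:
        # Each child gets a share of the parent proportional to its size.
        frac = child[1] / size if size else 0.0
        if horizontal:
            # Split the width; children of this child will split the height.
            treemap(child, x + offset, y, w * frac, h, False, out)
            offset += w * frac
        else:
            # Split the height; children of this child will split the width.
            treemap(child, x, y + offset, w, h * frac, True, out)
            offset += h * frac
    return out


# Made-up tree for illustration only.
tree = ("root", 4, [("a", 1, []), ("b", 3, [("c", 1, []), ("d", 2, [])])])
for rect in treemap(tree, 0, 0, 100, 100):
    print(rect)
```

The resolution limit mentioned above shows up directly here: once `w * frac` or `h * frac` drops below one pixel, a node can no longer be drawn distinctly.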

Assignment 2 – Data Exploration through R

After assembling the data and loading it into R, I used ggplot2 to answer the following questions:

1. Who had the highest number of home runs (HR)? Jesse Barfield.

At first I made a scatter plot with the individual names on the y-axis, which would work well if there were few enough names. I then made a more aesthetically pleasing version (shown second) with the name of each hitter drawn in place of the point. Ideally I would add the label to the point on the graph to the left, but with my beginner skills I found that using the two visualizations together works best in this case (because there was a clear winner).


2. Who had the maximum number of hits in 1986? Don Mattingly.

I used the same (not ideal) technique that I used to answer the first question. Again, I understand that I was only able to do this because there was a clear winner.


3. Name the second most expensive team in the league?

The best way to answer this question would be to sum all the salaries for each team, but I couldn’t find an effective way to do this with my skills. I first used the team data to plot the average salaries for each team (shown in the first two visualizations). I then used the hitter data to plot salary versus team, which gave the third visualization, and created a box and whisker plot per team. I can’t say definitively which team spends the second most in total (on all its players). However, I can answer many other questions. For example, Chicago has the second highest average salary (shown by the team visualizations shown first). What was weird to me is that this did not coincide with the individual box plots created from the hitter data. From those I can conclude that Baltimore spends the most on an individual player and Boston the second most; New York’s middle 50% (interquartile range) extends the highest, but doesn’t start the highest; and the center bar of each box (which a standard box plot draws at the median rather than the mean) shows Boston with the highest typical salary and Toronto the second. I attempted to make a stacked bar graph with the sums per team, but I feel I would be more successful if I computed the sums from the data first and then visualized them, instead of the other way around.
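The per-team sum described above can be computed before any plotting. The original analysis was done in R with ggplot2; the sketch below uses plain Python only to show the logic. The function name `team_salary_totals`, the `(team, salary)` row layout, and the sample rows are hypothetical placeholders, not the real data.

```python
from collections import defaultdict

def team_salary_totals(players):
    """Sum salaries per team and return (team, total) pairs, highest first.

    players: iterable of (team, salary) pairs; salary may be None when missing.
    """
    totals = defaultdict(float)
    for team, salary in players:
        if salary is None:
            continue  # skip players with no recorded salary
        totals[team] += salary
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for illustration only.
rows = [("Team A", 100), ("Team B", 300), ("Team A", 250),
        ("Team C", 200), ("Team B", None)]
ranked = team_salary_totals(rows)
print(ranked[1])  # the second most expensive team in this made-up sample
```

Aggregating first and plotting second also makes the stacked or plain bar chart of team totals straightforward, since each bar is just one precomputed number.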




Specific Goal: Are players paid according to their performance?

To answer this question I created many different visualizations and will describe them one by one. I manipulated the data by creating new variables that represent the percentage of hits, runs, home runs, etc., computed by dividing the number of hits (or runs, etc.) by the number of at bats. Another variable I created was the career percentage, where I divided the current number of hits, runs, etc. by the player’s yearly average of hits, runs, etc. (their career hits, runs, etc. divided by the number of years they have played in the majors). This variable will be about 1 if they are getting about the same amount as they have averaged over their career, greater than 1 if they are doing better in the current year, and less than 1 if they are doing worse in the current year.
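The two derived variables described above can be written as small functions. This is a sketch in plain Python rather than the R used for the analysis; the function names `hit_rate` and `career_ratio` and the numbers in the usage lines are made up for illustration.

```python
def hit_rate(count, at_bats):
    """Share of at bats that produced a hit (or run, home run, etc.)."""
    if not at_bats:
        return None  # avoid dividing by zero for players with no at bats
    return count / at_bats

def career_ratio(current, career_total, years):
    """Current-season count divided by the player's yearly career average."""
    if not years or not career_total:
        return None  # ratio is undefined without a career average
    yearly_average = career_total / years
    return current / yearly_average


# Made-up numbers for illustration only.
print(hit_rate(50, 200))            # 0.25
print(career_ratio(100, 500, 5))    # 1.0: matches the career average
print(career_ratio(150, 500, 5))    # 1.5: better than the career average
```

Note that for a first-year player the career total equals the current total, so the ratio is 1 by construction and carries no information.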


^ First, I plotted a histogram of the salaries to see their distribution, which is as expected: most of the salaries fall in the lower range.


^ I plotted a box and whisker plot of salaries per position to see if any particular position was clearly paid more than the others. However, it looks as though players of different positions are, on average, paid about the same. (If I were continuing the research on this data, I would investigate the positions of only the players making over, say, 1,000 to see if there is a pattern among the players who make the most.)


^ I plotted salary against errors made, using the number of years played in the majors for the color. The color shows that those with higher salaries have played in the majors for a little while. The trend line shows that errors didn’t seem to affect salary.


^ Plotting salary against the number of assists surprisingly shows that assists don’t seem to affect salary either.


^ Plotting run percentage (runs/at bats) against salary. The trend line shows that as the run percentage increases, so does the salary. This would support the idea that players get paid according to their performance. The outliers on top all seem to be a darker blue and thus are newer players, so their pay makes sense. If they are lighter, then they are either not paid for their hitting/run skills or they are paid disproportionately to their skills.


^ Plotting hits percentage (hits/at bats) against salary. The trend line also shows that as the hits percentage increases, so does the salary. This would support the idea that players get paid according to their performance. The outliers on top all seem to be a darker blue and thus are newer players, so their pay makes sense. If they are lighter, then they are either not paid for their hitting skills or they are paid disproportionately to their skills.


^ Plotting career hits against salary. The trend line shows that salary increases with career hits. This would support the idea that players get paid according to their performance. Another thing this visualization shows is that players who’ve been playing longer but have a very small number of career hits do not seem to get paid much, with some exceptions. One outlier is the point on the bottom right, with a high number of career hits and a low salary; this player could have gotten most of those hits earlier in their career and is not playing as well now, so they are no longer paid as much. The other outliers, on the top left, are players with high salaries and a low number of career hits. These players are probably not paid for their hitting (probably a pitcher), or it’s their first year so their career hits equal their current number of hits (all darker blue), or they are not paid accordingly.


^ Plotting career home runs against salary. The trend line shows an increase in salary as career home runs increase. This would support the idea that players get paid according to their performance. The players with more career home runs who fall under the trend line have all not played in the majors for very long, which makes sense: new players who are hitting well in their first few seasons don’t have high-salary contracts yet. This still supports the idea that players get paid according to their performance, when you consider that they are paid based on past seasons’ performance. The outliers are the same as in the previous visualization, but with home runs instead of hits.
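The trend lines in the plots above can be thought of as a fitted line through the points. The write-up does not say which smoothing method the plots used, so the following is only a minimal sketch that assumes a straight-line least-squares fit; the function name `fit_line` and the sample numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points for illustration only (e.g. career home runs vs. salary).
slope, intercept = fit_line([1, 2, 3], [2, 4, 6])
print(slope, intercept)  # 2.0 0.0
```

A positive slope corresponds to the "salary rises with performance" reading, while a slope near zero corresponds to the errors and assists plots, where the statistic did not seem to matter.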

Overall, the data still calls for more investigation; however, at this point I will conclude that players get paid according to their performance. (Players who are shown to have higher performance but lower salaries are assumed to not perform as well in a different category. Example: a newer player whose stats are unexpectedly better than when the salary contract was made, or a player who has great stats but was just injured and is back from the DL. Players who are shown to have lower performance but higher salaries are assumed to perform better in a different category. Example: an amazing pitcher who is not the best hitter will have very low hitting stats but will still make a lot of money.)

Chapter 9: Arrange Networks and Trees

This chapter of the book discusses design choices for arranging network data with node-link diagrams or matrices. Overall, the text provided a solid summary of the different techniques. I found the part about finding cliques and clusters in both matrix and node-link views very helpful. I thought the most interesting section of this reading was the costs and benefits section, which compared the two arrangement techniques very well. To summarize:

Connection Strength:

  • for small networks, they are intuitive for supporting many abstract tasks:
    • tasks that rely on topological structure
    • getting a general overview
    • finding similar substructures

Connection Weakness:

  • beyond a certain size and/or link density, reading becomes impossible (“hairball” effect)

Matrix Strength:

  • great for large and dense networks (with high info densities)
  • eliminates the occlusion found in node-link (connection) views
  • predictable
    • screen space easily predicted (unlike node-link view)
  • stable
    • adding an item causes small visual change (unlike node-link view)
    • supports geometric or semantic zooming
  • easily reorderable
  • ability to quickly estimate the number of nodes in a graph

Matrix Weakness:

  • unfamiliarity (training needed to easily interpret, unlike node-link view)
  • lack of support for investigating topological structure
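To illustrate why the matrix view's screen space is predictable, here is a minimal Python sketch that builds an adjacency matrix from an edge list. The function name `adjacency_matrix` and the tiny example network are made up for illustration, and the sketch assumes an undirected network.

```python
def adjacency_matrix(nodes, edges):
    """Build an n x n 0/1 matrix for an undirected network."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the link across the diagonal
    return matrix


# Made-up network for illustration only.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
for row in adjacency_matrix(nodes, edges):
    print(row)
```

The matrix always has n × n cells no matter how many links exist, so dense networks cost no extra space and no links cross. Reordering the `nodes` list reorders rows and columns, which is what the "easily reorderable" point refers to.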

Visualization Viewpoints – User Studies: Why, How, and When?

Original Paper:

I read this article before creating my user evaluation over the summer and found it really useful in preparing an effective user evaluation. Their inclusion of example studies and what was done in each is helpful. The “Basics of User Study Design” blurb inserted in the paper gave very clear background that I think was necessary for the reader to have. The most interesting part of this paper to me was about what to do when things “go wrong,” or in other words when you get data that fails to reject your null hypothesis. Even though null results aren’t necessarily publishable, they are super informative and can help you further your work so you can eventually get something worth publishing. Overall, it is clear that good user studies can enhance the quality of your research.

The Challenge of Information Visualization Evaluation

Original Paper:

Overall, I felt that this paper, especially compared to the other two user evaluation papers we are reading this week, is pretty disorganized and not as carefully laid out. The paper did emphasize that usability testing and controlled experiments are the basis of evaluation. I thought the part about taking the training of the users into consideration was interesting, because this idea definitely came up when I was working on my user evaluation, and it matters whether you train the user and, if so, how much. I thought the section on learning from the examples of technology transfer was the most interesting.

Empirical Studies in Information Visualization: Seven Scenarios

Original Paper:

The biggest thing to notice about this paper is its thoroughness and the extensive amount of background work that went into it, which is clearly shown in Table 1. They do a good job of presenting a “descriptive rather than prescriptive approach,” which they mentioned as a goal early on in the paper. However, because of this, the paper is kind of a boring read, even though it does a good job of presenting a lot of potentially helpful information. The bulk of the paper describes the goals/outputs, example evaluation questions, and methods and examples for each of their seven evaluation scenarios. These scenarios are: Understanding Environments and Work Practices (UWP), Evaluating Visual Data Analysis and Reasoning (VDAR), Evaluating Communication Through Visualization (CTV), Evaluating Collaborative Data Analysis (CDA), Evaluating User Performance (UP), Evaluating User Experience (UE), and Evaluating Visualization Algorithms (VA). I think the paper does a good job of explaining visualization evaluations and encourages people to reflect on their goals before choosing methods.