Thesis Project: Using Social Network Analysis to Evaluate Objects in Software System
Table of Contents
4.1.1 Degree Centrality
4.1.2 Betweenness Centrality
4.1.3 Closeness Centrality
4.1.4 Eigenvector Centrality
4.2 Software System Centrality Measures
4.2.1 Convert Sequence/Communication Diagram to Graph
4.2.2 Analyzing Interaction Diagram
4.2.3 Degree Centrality
4.2.4 Betweenness Centrality
4.2.5 Closeness Centrality
4.2.6 Eigenvector Centrality
5. Study Cases
5.1 Demo
5.2 Olin Shopping System
5.3 Jasper Report
Analyzing an object-oriented software design is essential for avoiding problems that would otherwise surface in the implementation phase. A design contains many objects that work together to achieve a specific goal, so we need to evaluate these objects and identify how important each one is. Little research has focused on analyzing a software system in its early phases to assess components before implementation. Social Network Analysis (SNA), however, has addressed the same challenge for social networks, and some of its approaches can be applied to software systems. SNA relies on graphs and visualizations, and many software tools and algorithms exist for exploratory analysis of graph data that identify the important nodes in a graph. The general idea of this project is to apply the SNA approach to the UML diagrams of a software system, specifically sequence and communication diagrams. We treat the classes in the diagram as nodes and the associations between classes as edges, and then apply SNA measures to the resulting graph.
List of Figures
FIGURE 2: OVERVIEW OF PROJECT
FIGURE 3: OVERVIEW OF UML DIAGRAM
FIGURE 4: DEMO SEQUENCE DIAGRAM
FIGURE 5: Demo’s XMI File Shows Lifelines Tag
FIGURE 6: Demo’s XMI File Shows Message Tag
FIGURE 7: Demo’s Csv File Shows Nodes
FIGURE 7: Demo’s Csv File Shows Edges
FIGURE 8: Demo’s Graph
FIGURE 9: Olin Shopping System Communication Diagram
FIGURE 10: Olin Shopping System’s Nodes Csv File
FIGURE 11: Olin Shopping System’s Edges Csv File
FIGURE 12: Olin Shopping System Nodes at Gephi Software
FIGURE 13: Olin Shopping System Edges at Gephi Software
FIGURE 14: Olin Shopping System Graph
FIGURE 15: Jasper Report (BatchExportApp) Graph
TABLE 3: Centrality Metrics Measures for Olin Shopping System
TABLE 3: Centrality Metrics Measures for Jasper Report (BatchExportApp) System
TABLE 4: Analysis Result of Centrality Measures For Many Systems
Present-day software systems are larger and more complex than ever before. Not all objects in a software system have the same degree of importance; some play a more important part than others. The general idea of this project is to find the most critical objects in a software system's source code. To do this, we studied how Social Network Analysis identifies the most important players in a network. Those approaches, methods, and measures were then applied to software design architecture for the same purpose: identifying the key players, which in this case are classes.
The project provided knowledge about the graph analysis approach used in social networks. Understanding social network analysis, how it measures the importance of nodes in a graph, and the technology behind it was a critical first step. Altova, Gephi, and Enterprise Architect are the software tools used in this project. With them I learned how to draw UML communication and sequence diagrams, how to export a diagram to an XML/XMI file, and how to convert that file to the CSV files that the social network analysis software (Gephi) accepts. To perform the conversion, an XML parser was written in Java, building on my previous Java classes. In addition, I learned how to use Gephi and how to either grab data from Facebook or enter it manually. I also studied different algorithms for identifying important players in a network, but in the end we did not implement them and used Gephi's built-in measures instead. Many research papers were read to learn more about 1) software system analysis using a social network analysis approach and 2) how to convert a UML communication diagram to a graph, but no clear steps for either exist at this time. The knowledge-based systems and object-oriented modeling courses provided the skills needed to gather the requirements for this project. The resulting approach was to take the output file from Altova (an XML representation of a UML diagram generated from a project's source code), convert it into a format that can be imported into SNA software (the CSV files that Gephi takes as input), and finally import those files into Gephi to obtain all SNA measures for the source code.
FIGURE : Context Diagram
Dr. Frank Xu, professor in the Computer Science Department at Gannon University, is the stakeholder for the project. Several meetings with Dr. Xu were held before the initial scope of the project was identified. The project proposal contains the notes and data collected from the stakeholder along with the requirements, and Dr. Xu confirmed the scope of this project.
The main needs of this project are:
- To provide software engineers and developers with a method that can take software system data directly from CSV files.
- To provide a clear assessment approach for analyzing the importance of different components in a system.
The primary focus of this project is to:
- Evaluate the classes in a communication/sequence diagram.
- Generate a diagram from source code.
- Generate an XML/XMI file from the diagram.
- Convert the diagram to CSV files.
- Generate SNA measures from the CSV files.
Summary of Capabilities
The project provides several benefits, all of which are listed below.
The major benefit of the project is that it provides software engineers and developers with a method to evaluate objects in an architecture. The project goes through several steps to reach this goal:
Table 1: Benefits and Supporting Features
This project went through several stages that led to a good result. First, the requirements were collected from Dr. Xu over several meetings, and his final approval closed this stage. Good requirements lead to the success of a project, so it was important to spend enough time on this stage. One month was enough to complete it, and the stakeholder signed off on the requirements before development started.
In the second stage, after the requirements were gathered, we searched for articles to become more familiar with the topic of the project. There was not enough material on how to evaluate software components using SNA, so we tried to map SNA concepts onto software system concepts in order to apply the SNA approach. This was one of the hardest stages for the development team.
The third stage was about manipulating software system data, and it took five weeks. The first week was spent finding appropriate SNA software. We found several SNA tools that identify important nodes by calculating centrality measures. Eventually we chose Gephi and learned the data format it accepts. To obtain data in that format, we need to generate a sequence diagram from the software system's source code and then export the diagram to an XMI file.
The fourth stage was parsing the XMI file into two CSV files, one for nodes and one for edges. After studying which method would parse this file well and trying many tools, we found that writing our own code was the best solution for this part. The parser reads the file by lifeline, which is a node, and by message, which is an edge. However, this part ran into problems with large code bases: sometimes the output misses some node labels and contains nodes without edges.
The final stage was to find the best way to match SNA to software system analysis. This stage took the longest, because it required interpreting the meaning of each SNA measure for a software system and working out how SNA can be used to assess software architecture quality.
In software systems, it is often challenging to know how important each object is. SNA, on the other hand, provides measures of node importance across a whole network. These measures can be very useful for software systems if we can incorporate them into software analysis. This project is an attempt to do that.
The main challenge in gathering the requirements for this project was knowing where to start and which software to work with, because the project connects aspects of two very different areas.
FIGURE 2: OVERVIEW OF PROJECT
The project needs a UML diagram of the software system we want to evaluate, either generated from its source code or drawn with a UML tool. The UML diagrams the project focuses on are sequence and communication diagrams.
- Altova is used to create the sequence diagram directly from the source code of the given system.
- Enterprise Architect is used to draw a communication or sequence diagram.
- Both Altova and Enterprise Architect can export the XMI file, which is the main reason for using them.
- The XML/XMI file is converted into CSV format.
- Gephi is an SNA tool that imports the CSV files derived from the diagram's XMI file and creates an SNA graph.
Altova / Enterprise Architect
Altova and Enterprise Architect are UML tools. Altova can import source code and generate a sequence diagram directly from it. Enterprise Architect can also be used to draw a sequence or communication diagram.
Altova has one limitation, however: it cannot export the XML file. Instead it exports XMI, which is in turn imported into Enterprise Architect to get the required XML file.
To apply the SNA measures, we need two CSV files, one for nodes (classes) and one for edges (associations). The XML/XMI files already hold this information under the lifeline tag and the message tag, so we need to parse that file into an acceptable CSV format. The parser program is written in Java; it parses the XML/XMI file and outputs the two required CSV files.
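The parsing step described above can be sketched as follows. This is an illustrative Python sketch, not the Java parser written for the project. The XMI fragment is a hypothetical, simplified stand-in: real Altova or Enterprise Architect exports are namespaced, and the `id`, `name`, `source`, and `target` attribute names are assumptions made for the example. The `Id`/`Label` and `Source`/`Target` headers follow the column names that Gephi's spreadsheet import recognizes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified XMI fragment (three lifelines, one repeated message).
# Real exports are namespaced and resolve message ends through event elements.
SAMPLE_XMI = """<interaction>
  <lifeline id="L1" name="Customer"/>
  <lifeline id="L2" name="Cart"/>
  <lifeline id="L3" name="Order"/>
  <message name="addItem" source="L1" target="L2"/>
  <message name="checkout" source="L2" target="L3"/>
  <message name="addItem" source="L1" target="L2"/>
</interaction>"""


def parse_xmi(xmi_text):
    """Return (nodes, edges): lifelines become nodes, messages become edges."""
    root = ET.fromstring(xmi_text)
    nodes = {}
    for lifeline in root.iter("lifeline"):
        nodes[lifeline.get("id")] = lifeline.get("name")
    edges = []
    seen = set()
    for message in root.iter("message"):
        pair = (message.get("source"), message.get("target"))
        # Skip duplicate records and messages that point at unknown lifelines.
        if pair in seen or pair[0] not in nodes or pair[1] not in nodes:
            continue
        seen.add(pair)
        edges.append(pair)
    return nodes, edges


def to_csv(header, rows):
    """Serialize rows as CSV text with the given header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes, edges = parse_xmi(SAMPLE_XMI)
nodes_csv = to_csv(["Id", "Label"], sorted(nodes.items()))
edges_csv = to_csv(["Source", "Target"], edges)
print(nodes_csv)
print(edges_csv)
```

Dropping repeated sender/receiver pairs keeps the edge file free of duplicate records; keeping a count per pair instead would be a reasonable alternative if edge weights are wanted.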
Gephi is an SNA tool that imports the CSV files for edges and nodes and generates the SNA graph and its measures. This is the last step of the project, and it yields the SNA measures for the software source code.
- The project shall be able to get the source code of the software
- The project shall be able to get the XML files
- The project shall be able to convert XML into more manageable CSV files
- The project shall be able to get the SNA graph from the CSV files.
- macOS is the working environment, but it does not support Altova.
- The Windows platform is used to avoid this constraint.
- Altova is used for the sequence diagram. Altova runs only on Windows, so the project shall work at least on Windows.
- The generated CSV files shall not contain duplicate records.
- The CSV files shall contain every needed record from the XML file.
- There should be interoperability between the three stages of the project.
- The system shall not complicate the current process of getting things done from the existing technologies being used.
- Preferences: Windows is the preferred environment, because all components of the project run on Windows.
- Read the research literature to understand SNA concepts and tools.
- Watch tutorials to determine which software is appropriate for this project.
Assumptions and Dependencies
- The user knows how to use Altova, Enterprise Architect, and Gephi
- The user has every component available
- The project depends on the chain of Altova/Enterprise Architect, the parser, and Gephi
Social Network Analysis
What is Social Network Analysis?
Social network analysis is used to measure and map different sorts of relationships between organizations, people, groups, computers, and URLs. The people and groups are the nodes of the network, and the relationships between them are the links. Social network analysis provides a detailed analysis of relationships that can be both mathematical and visual. Consultants who use social network analysis with their clients call it organizational network analysis.
Type of Network Analysis
There are two types of network analysis: ego network analysis and complete network analysis. They rely on two different types of data.
- Ego network analysis
Ego network analysis is mainly carried out with traditional surveys. Each respondent is asked questions such as whom they interact with and what the relationships among those people are. The sample is drawn from a very large population, so there is very little chance that one respondent knows another, and no attempt is made to find such connections. Ego network analysis is simple because it can be carried out with random sampling techniques, after which statistical techniques are used to test hypotheses. The main purpose of analyzing an ego network is to assess the quality of an individual's network.
- Complete network analysis
In complete network analysis, the researcher tries to find all of the relationships that exist within a set of respondents, for example the friendships among the employees of a specific company. Complete network analysis is important because techniques such as subgroup analysis and equivalence analysis, and measures such as centrality, all depend on it.
Complete network analysis was used in this project because the project needs to determine the important players in the whole network.
Software System UML
FIGURE 3: OVERVIEW OF UML DIAGRAM
This project focuses on interaction diagrams, namely communication and sequence diagrams. “On communication diagrams, objects are shown with association connectors between them. Messages are added to the associations and show as short arrows pointing in the direction of the message flow. The sequence of messages is shown through a numbering scheme.” (Ltd, 2015). A sequence diagram shows the same interactions as a communication diagram but arranges the messages in time order. The project starts from the presumption that classes and associations in a communication diagram behave like players and relationships in Facebook. It therefore adopts social network analysis techniques to visualize a communication diagram, representing each object as a node and each association as an edge. These nodes and edges can then be modeled as a social relationship network using social network analysis software.
Social Network Analysis Centrality Measures
Consider an actor in a network. An actor with many direct ties has a high degree centrality, while an actor with a low degree centrality is more peripheral in the network. Degree centrality grows with the number of ties an actor has. For example, to find the most popular student at a university, we look for the student with the highest number of friends. Degree centrality is therefore one of the most appropriate measures for this kind of question.
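As a minimal sketch of the student example, the snippet below counts each actor's direct ties and divides by the number of other actors, which is the usual normalized form of degree centrality. The names and friendship pairs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical friendship pairs; each pair is an undirected tie.
friendships = [("Ann", "Bob"), ("Ann", "Cara"), ("Ann", "Dan"), ("Bob", "Cara")]


def degree_centrality(edges):
    """Normalized degree: share of the other n - 1 actors that are direct neighbors."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


scores = degree_centrality(friendships)
most_connected = max(scores, key=scores.get)
print(most_connected, scores[most_connected])
```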
The betweenness centrality of a node is defined in terms of the shortest paths that pass through it, and it reflects the node's role as an intermediary. Nodes that lie on many shortest paths between pairs of other nodes have higher betweenness centrality than nodes that do not.
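The definition can be made concrete with a short sketch in the style of Brandes' algorithm for unweighted directed graphs. It is an illustration under simple assumptions (every node appears as a key of the adjacency dictionary, scores are not normalized) and is not the code that Gephi runs. The four-node graph is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes-style betweenness for an unweighted directed graph."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                                  # nodes in order of discovery
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}      # shortest paths from source
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:                 # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    return scores


# Hypothetical graph: A -> B, B -> C, B -> D.
graph = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
scores = betweenness_centrality(graph)
print(scores)
```

Here only B lies between other nodes: the shortest paths from A to C and from A to D both pass through it.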
Closeness centrality can also be explained in terms of an actor. It describes how close the actor is to the other actors in the network: actors that are close to each other can interact quickly, so an actor that is close to all others can transfer information and data to them more effectively and rapidly than any other actor in the network. As the geodesics (shortest paths) from an actor to the others increase in length, its closeness centrality decreases.
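A minimal sketch of this idea uses breadth-first search to get the geodesic distances from each actor and takes the number of reachable actors divided by the sum of those distances. This is one common definition of closeness; tools differ in how they normalize it, so the formula here is an assumption rather than a description of Gephi's output. The three-node chain is hypothetical.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness as (reachable actors) / (sum of shortest-path distances to them)."""
    scores = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
        total = sum(distance.values())
        reached = len(distance) - 1
        scores[source] = reached / total if total else 0.0
    return scores


# Hypothetical undirected chain A - B - C, stored with both directions.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = closeness_centrality(graph)
print(scores)
```

B sits one step from everyone and scores highest; A and C need two steps to reach each other, so their longer geodesics lower their closeness.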
Eigenvector centrality measures the importance of every actor in the network by assigning each a score relative to all the others. High scores go to actors who are connected to other high-scoring actors, not merely to actors who have many connections. In other words, the importance of an individual can be measured by that of his friends: if one has more important friends, then he will also be important.
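One way to compute such scores is power iteration: start with equal scores, repeatedly replace each actor's score with its own score plus the sum of its neighbors' scores, and rescale. The sketch below assumes an undirected, connected graph stored as an adjacency dictionary. The iteration limit, tolerance, and example graph are illustrative choices, not values taken from the project.

```python
import math


def eigenvector_centrality(adjacency, iterations=100, tolerance=1e-9):
    """Power iteration for eigenvector centrality on an undirected graph."""
    scores = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' scores;
        # keeping the own score avoids oscillation on bipartite graphs.
        updated = {v: scores[v] + sum(scores[w] for w in adjacency[v])
                   for v in adjacency}
        norm = math.sqrt(sum(x * x for x in updated.values()))
        updated = {v: x / norm for v, x in updated.items()}
        change = sum(abs(updated[v] - scores[v]) for v in adjacency)
        scores = updated
        if change < tolerance:
            break
    return scores


# Hypothetical graph: triangle A-B-C with D attached to C.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
scores = eigenvector_centrality(graph)
print(max(scores, key=scores.get))
```

D has a single tie, but that tie is to the best-connected actor, which is the effect the measure is meant to capture: the score depends on whom an actor is tied to as well as on how many ties it has.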
Software System Centrality Measures
Convert Sequence/Communication Diagram to Graph
Many social network analysis tools offer facilities to analyze a graph and measure the importance of its nodes. After some investigation, Gephi was chosen because it has the measures this project needs and many tutorials that help in learning it. There are two ways to enter data in Gephi: manually, or by importing CSV files to obtain the graph. A CSV file is a plain-text table of rows and columns, and to import data cleanly we need two CSV files: one for nodes (the objects) and one for edges (the associations). Getting a CSV file from a communication diagram was a major challenge, and I tried many tools for this purpose. After a long search, two tools were found that can export a diagram to many file types. However, neither performs this task properly on its own.
Eventually the following steps were established for converting a communication diagram to a graph:
- Obtain the communication diagram in one of two ways:
Draw it manually using Enterprise Architect.
Generate a sequence diagram from the Java source code using Altova, and draw the communication diagram from it.
- Export the communication diagram to an XML file.
- Convert the XML file to two CSV files, an objects file and an associations file, using the parser code.
- Import the two CSV files into Gephi one by one to get the visualized network.
Analyzing Interaction Diagram
The main aim of this project is to analyze an interaction diagram as a graph in order to identify the important objects in the diagram and to use this analysis to evaluate properties of the system. The diagram was analyzed as a graph using the same approach that was used to analyze the Facebook network. However, each measure (degree, closeness, betweenness, and eigenvector) provides a different perspective on a software system's communication diagram. The following paragraphs present these measures from the software system's point of view rather than the social network's.
In an interaction diagram for a software system, an important class is one that affects a large number of other classes: if such a class has an error, the error reaches many classes. We therefore need a measure that counts the classes that are immediately affected when a class fails. Using degree centrality, we can estimate the risk by counting how many classes will be affected. The risk of failure increases when a class depends heavily on other classes, and we can detect that by measuring the in-degree of the class.
On the other hand, the class with the highest out-degree is the most independent, which means it has the lowest risk of failure when some functions change. Moreover, the degree measure reveals the risk to the overall diagram and which functions are used most frequently. Nevertheless, degree centrality has a drawback: it focuses on an individual class instead of the whole network.
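As a small illustration of the in-degree and out-degree reading above, the sketch below counts incoming and outgoing messages per class from a list of (sender, receiver) edges. The class names and messages are hypothetical, and the labels "most dependent" and "most independent" follow this document's interpretation of the two measures.

```python
from collections import Counter

# Hypothetical message edges: (sender class, receiver class).
messages = [
    ("Customer", "Cart"),
    ("Cart", "Catalog"),
    ("Cart", "Order"),
    ("Cart", "Payment"),
    ("Order", "Catalog"),
    ("Order", "Payment"),
    ("Payment", "Catalog"),
]

out_degree = Counter(sender for sender, _ in messages)
in_degree = Counter(receiver for _, receiver in messages)
classes = sorted({name for edge in messages for name in edge})

# One row per class: (in-degree, out-degree, total degree).
table = {c: (in_degree[c], out_degree[c], in_degree[c] + out_degree[c])
         for c in classes}

most_dependent = max(classes, key=lambda c: in_degree[c])     # highest in-degree
most_independent = max(classes, key=lambda c: out_degree[c])  # highest out-degree
for name, row in table.items():
    print(name, row)
```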
Object betweenness counts the number of shortest paths that pass through an object. Objects with high betweenness are therefore important for communication and information diffusion. In some cases the object sits between two important groups of classes and plays a broker role in the network. Such an object plays a powerful role in the network, but it can also be a single point of failure. Based on betweenness centrality, we can determine the risk to the diagram when a class with high betweenness is removed, since it handles most of the communication.
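The single-point-of-failure idea can be checked directly by deleting a class and counting how many disconnected pieces remain. The sketch below treats the diagram as an undirected graph. The class names and associations are hypothetical, with `Order` placed as the broker between two groups.

```python
def components_without(adjacency, removed):
    """Count connected components of an undirected graph after deleting one node."""
    remaining = set(adjacency) - {removed}
    seen = set()
    count = 0
    for start in remaining:
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            for w in adjacency[v]:
                if w in remaining and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


# Hypothetical diagram: {Cart, Catalog} and {Payment, Invoice} linked only via Order.
diagram = {
    "Cart": ["Catalog", "Order"],
    "Catalog": ["Cart", "Order"],
    "Order": ["Cart", "Catalog", "Payment", "Invoice"],
    "Payment": ["Order", "Invoice"],
    "Invoice": ["Order", "Payment"],
}
print(components_without(diagram, "Order"))
print(components_without(diagram, "Cart"))
```

Removing the broker splits the diagram into two groups that can no longer communicate, while removing a non-broker class leaves the rest connected.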
Closeness centrality refers to the class that can reach most other classes in the graph through the shortest paths.
Eigenvector centrality is another measure that identifies important objects in the diagram, and it can be obtained from Gephi. If an object in the diagram is connected to many important objects, it should be important as well.
Table 2: The Interaction Diagram Analysis Measures
|1||Identify the most connected object||Highest degree centrality|
|2||Identify the most independent object||Highest out-degree|
|3||Identify the most dependent object||Highest in-degree|
|5||Identify complexity||High degree|
|7||Identify the broker||Highest betweenness|
|8||Identify risk rate||Degree|
In this case we drew a small sequence diagram with three classes and two associations to keep the parser code easy to write. We then converted it to an XMI file to find the tags used for classes and associations, which we needed when writing the parser code.
FIGURE 4: DEMO SEQUENCE DIAGRAM
FIGURE 5: Demo’s XMI File Shows Lifelines Tag
FIGURE 6: Demo’s XMI File Shows Message Tag
FIGURE 7: Demo’s Csv File Shows Nodes
FIGURE 7: Demo’s Csv File Shows Edges
FIGURE 8: Demo’s Graph
Olin Shopping System
In this case we drew the diagram manually using Enterprise Architect and converted it directly to CSV files.
FIGURE 9: Olin Shopping System Communication Diagram
FIGURE 10: Olin Shopping System’s Nodes Csv File
FIGURE 11: Olin Shopping System’s Edges Csv File
FIGURE 12: Olin Shopping System Nodes at Gephi Software
FIGURE 13: Olin Shopping System Edges at Gephi Software
FIGURE 14: Olin Shopping System Graph
Table 3: Centrality Metrics Measures for Olin Shopping System
|Measure||Degree Centrality||Betweenness Centrality||Closeness Centrality||Eigenvector Centrality|
“JasperReports Server is a stand-alone and embeddable reporting server. It provides reporting and analytics that can be embedded into a web or mobile application as well as operate as a central information hub for the enterprise by delivering mission critical information on a real-time or scheduled basis to the browser, mobile device, printer, or email inbox in a variety of file formats.” (TIBCO Software, 2015)
In this project we chose one application from JasperReports, called BatchExportApp.
FIGURE 15: Jasper Report (BatchExportApp) Graph
Table 3: Centrality Metrics Measures for Jasper Report (BatchExportApp) System
|Measure||Degree Centrality||Betweenness Centrality||Closeness Centrality||Eigenvector Centrality|
Graph Distance Report
Network Interpretation: directed
Average Path length: 1.443265306122449
Number of shortest paths: 1225
Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, in Journal of Mathematical Sociology 25(2):163-177, (2001)
After deleting important nodes
Graph Distance Report
Network Interpretation: directed
Average Path length: 1.2676240208877285
Number of shortest paths: 766
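The two figures in these reports can be approximated with a short sketch that runs a breadth-first search from every node of a directed graph, with and without a set of removed nodes. It assumes that the report counts one shortest path per reachable ordered pair of nodes and averages the distances over those pairs; that reading of Gephi's output is an assumption. The four-node graph is hypothetical and unrelated to the BatchExportApp data.

```python
from collections import deque


def distance_report(adjacency, removed=()):
    """Return (average path length, number of reachable ordered pairs)."""
    removed = set(removed)
    total = 0
    pairs = 0
    for source in adjacency:
        if source in removed:
            continue
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in removed and w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return (total / pairs if pairs else 0.0), pairs


# Hypothetical directed graph: A -> B, B -> C, B -> D, C -> D.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["D"], "D": []}
before = distance_report(graph)
after = distance_report(graph, removed={"B"})
print(before, after)
```

Deleting a central node removes the paths that ran through it, so both the number of shortest paths and the average length over the surviving pairs can drop, which is the pattern visible in the two reports above.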
“Vuze (formerly Azureus) is a P2P file sharing client using the bittorrent protocol. Search and download torrent files. Play, convert and transcode videos and music for playing on many devices such as PSP, TiVo, XBox, PS3, iTunes (iPhone, iPod, Apple TV)”
Table 4: Analysis Result of Centrality Measures For Many Systems
|Code Name||Code Size||Nodes||Life Line||Important Nodes||Accuracy|
|Online Shopping System||Small||12||12||
|Jasper Reports (BatchExportApp)||Big||464||523||
This project presented a process for evaluating the UML classes of a software system. Our objectives were to understand SNA methods and to employ these methods in software system analysis.
The project provided an SNA-based analysis approach for UML communication/sequence diagrams that goes through several phases to obtain the final result. Despite the differences between the concepts of SNA and those of software systems, the project succeeded, after many attempts, in converting a UML diagram to graph data. The main challenge was converting the diagram to the CSV files that the SNA software accepts.
The benefits of this approach include:
- Generating a sequence diagram from source code
- Exporting a sequence/communication diagram to an XML/XMI file
- Parsing the XML/XMI file and converting it to CSV files
- Graphing the software data in an SNA tool (Gephi)
- Analyzing the software system graph to find out which classes are important
The limitations of SNA and SWA analysis that I identified while working on this project are:
- In the SNA approach, players with high closeness or betweenness may not be important players.
- The degree centrality method focuses on an individual node (object) instead of the whole network.
- When generate sequence diagram code to XMI file, there are some lifelines
|Altova UML||Visually design application|
|Enterprise Architect||Tool for managing information|
|Csv file||Comma-separated values file|
|Xml file||Extensible Markup Language file|
|Gephi||Tool for people that have to explore and understand graphs|
|Closeness||Social network analysis measure|
|Betweenness||Social network analysis measure|
|Degree||Social network analysis measure|
|Eigenvector||Social network analysis measure|
|Association||Relationship between two classes|
|Graph||Nodes and edges|
|Social Media||websites and other online means of communication that are used by large groups of people to share information|
|Diagram||Objects and associations|
|UML||Unified Modeling Language|
|Cohesion||Degree to which the responsibilities of a single module/component form a meaningful unit|
|Coupling||Degree of mutual interdependence|
|Complexity||Measure the degree of connectivity between elements|
Example 1: Identifying Important Players in Facebook by using Gephi
- Import data from Facebook to Gephi
- Use the Netvizz Facebook application to get graph data from Facebook. Facebook offers different data types, such as the personal friend network, the network of likes, group data, page data, and page likes.
- The personal friend network data was chosen in this example.
- Netvizz created a GDF file.
- Then import the Facebook graph data into Gephi by selecting Open from the File menu, selecting the .gdf file, and choosing the graph type (directed graph) from Graph Type.
- Lay out the Facebook network
- Apply a layout algorithm to the graph from the Layout section.
- The network graph is then shown. It consists of 235 nodes, representing my friends, and 1,109 edges, representing all the connections between them.
- Analyze the graph
Modularity is one of the statistics in Gephi; it partitions the network into communities and makes subgroups more distinguishable. The modularity report states that my network has 25 communities, but 18 of them have only 1 member. So in practical terms there are only 7 communities, ranging from 2 to 74 members. Those 7 groups are color-coded in the figure.
After that we can find the important nodes in the network by calculating the following nodal metrics and finding the player with the highest value for each.
- Degree Centrality:
To determine the nodes with high degree, we calculate in-degree and out-degree by running Average Degree in the Statistics panel in Gephi.
The graph ranked by degree is shown in the next figure; the player Ghada Zamzami has the highest degree in this network:
- Betweenness Centrality:
To get betweenness, I ran Network Diameter from the Statistics panel. The report states that my network has a diameter of 9 and 8016 shortest paths.
The graph ranked by betweenness centrality is shown in the figure; Sheren Baksh has the highest betweenness in the network.
Seven nodes stand out as central nodes, and removing them would be a problem for the graph, because they act as gates between two otherwise disconnected communities.
- Closeness Centrality:
Closeness is likewise obtained from Network Diameter in the Statistics panel in Gephi. The graph ranked by closeness centrality is shown in the figure; Roa’a Alfadel has the highest closeness in the network.
- Eigenvector Centrality:
To identify the player who has important friends in the network, we run Eigenvector Centrality from the Statistics panel. The graph ranked by eigenvector centrality is shown in the figure; Ahlam Alrashidi has the highest eigenvector centrality in the network.
- Important Nodes:
|Measure||Degree Centrality||Betweenness Centrality||Closeness Centrality||Eigenvector Centrality|
|Player Name||Ghada Zamzami||Sheren Baksh||Roa’a Alfadel||Ahlam Alrashidi|
Appendix F: Screen Captures