Each output file contains a header. If you open one output file with a text editor (we recommend NotePad++ on Windows, emacs on Linux and TextEdit on the Mac), then you will find that the first few lines start with a "#" sign. These are the comment lines of the file and describe the contents of the given file.
Yes. Please contact us at CFinder@hal.elte.hu.
Note that community finding can be unusually slow for several reasons. One of the most frequent is that the "core" (the most densely connected region) of the network contains several strongly overlapping cliques of size 50-100 (or above). For further details about the CPU time and memory used by CFinder, see the sections on CPU and memory requirements and on how to increase the amount of memory CFinder uses. Further hints can be found elsewhere in the FAQ.
The original Clique Percolation Method (CPM) used by CFinder is designed to locate the k-clique communities of unweighted, undirected networks. (Extended versions of this algorithm, included in
An illustration of these communities can be given by k-clique template rolling. A k-clique template can be thought of as an object that is isomorphic to a complete graph of k nodes. Such a template can be placed onto any k-clique of the network, and rolled to an adjacent k-clique by relocating one of its nodes and keeping its other k-1 nodes fixed. Thus, the k-clique-communities of a graph are all those subgraphs that can be fully explored by rolling a k-clique template in them but cannot be left by this template.
The CPM was inspired by the fact that the k-clique communities also correspond to percolation clusters in the k-clique adjacency graph of the system. The nodes of the k-clique adjacency graph represent the k-cliques of the original network, and there is an edge between two nodes if the corresponding two k-cliques are adjacent.
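The percolation picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CFinder's implementation: it enumerates k-cliques by brute force (practical for tiny graphs only), treats two k-cliques as adjacent when they share k-1 nodes, and returns the connected components of the resulting k-clique adjacency graph. The function name and the edge-list input are hypothetical.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return the k-clique communities of an undirected graph as a list of node sets."""
    # Build an adjacency map from the edge list (self-loops are ignored).
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Brute-force enumeration of all k-cliques (illustration only).
    nodes = sorted(neighbors)
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(b in neighbors[a] for a, b in combinations(c, 2))
    ]

    # Union-find over the k-cliques: adjacent cliques share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each connected component of the clique adjacency graph is one community.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=lambda c: sorted(c))

# Two triangles sharing an edge form one community; a third triangle
# that touches them in a single node forms a separate, overlapping one.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
print(k_clique_communities(edges, 3))
```

In this example node 4 belongs to both communities, which illustrates how the definition allows overlaps.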
The advantages of the above community definition are the following:
For details on the algorithm used by CFinder, see supplemental information for the 2005 Nature paper
The original CPM method, as described above, can only handle undirected, unweighted networks.
For details on the criteria and the algorithm, see the 2007 papers in New J. Phys
Note that due to the extra criteria used for deciding which cliques to use, the communities can't be generated from the maximal cliques of the graph, so for CPMw CFinder simply directly creates the communities by growing a single k-clique (the CPMd uses maximal directed cliques).
CFinder can be downloaded from http://CFinder.org. For unpacking the downloaded zip file, we suggest 7zip on Windows, the standard Linux unzip utility, and Aladdin Stuffit Expander on the Mac. (Note: winzip had problems with unpacking some of the earlier releases.)
CFinder uses the Java Runtime Environment version 1.5 or higher by Sun. (CFinder might work with non-Sun java runtimes, but it is not tested.) Type
A note for Solaris users: you will also have to download the -solaris.zip file and replace the libcommfind.so and CFinder_commandline files with the ones in the solaris .zip file.
To launch the GUI version of CFinder, use the start.bat script on Windows and start.sh on Linux, Mac and Solaris.
These are the directories and files in the downloadable package of CFinder.
If automatic updates are not enabled on your Mac, please update your system before using CFinder. Click on the Apple icon in the top left corner and select "Software Update". Make sure to update both Java and Mac OS X to their most recent versions.
On several Linux systems the default Java engine is not the Java Runtime Environment by Sun and/or there are various other Java engines available.
Java is meant to be platform independent. Unfortunately, different Java engines sometimes show unexpected differences in the way they run the same Java program. We suggest that you use the Java Runtime Environment version 1.5 or higher by Sun to run CFinder.
The amount of memory needed for the community finding in your network will be most strongly influenced by the number and the overlaps of cliques / dense regions. Of course, the number of nodes is also important, but the necessary CPU time and memory grow only slowly with the number of nodes.
A short explanation follows here. (Jump to the next paragraph if you want practical advice.) A linear chain of nodes with a few loops on it contains only a small number of small cliques, e.g., triangles, and CFinder will need little memory and CPU time for this network. A complete graph (each node is connected to all other nodes) is a single clique, and the resources needed for this network will again be small. Between these two extremes are the networks that need more memory and CPU time. These contain a high number of large, overlapping cliques: there are large dense regions in the network, but a typical node is not connected to all other nodes. (Remember that a clique is a complete subgraph not contained by any other complete subgraph.)
In practice, the best way to run CFinder on a large network is to use confidence values for the links. The confidence (or “weight”) of each link can be used for filtering the links in the “New Community Finding” dialog. For the first run, set a high value for the lower cutoff in the “New Community Finding” dialog. This will remove many links from the network, and the remaining graph that CFinder receives for analysis will be sparse. Later, you can run CFinder again (File > Open new community) and decrease the lower threshold step by step to leave more and more links in the network.
If you are using a large network and/or a network with many overlapping dense regions, and CFinder stops with an error message explaining that more memory would be necessary, then in the start batch file increase the memory requested by CFinder from your operating system. In the start batch files (start.bat on Windows and start.sh on Linux and Mac) you will find two switches: "-Xmx224m" and "-Xss32m". Increase both numbers (224 and 32) such that their sum remains smaller than the physical memory of your computer.
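As a small illustration of the edit described above, the following Python sketch replaces the two switch values in the text of a start script. The helper name and the sample launch line are hypothetical; the real contents of start.bat and start.sh may differ, and editing the file by hand works just as well.

```python
import re

def set_jvm_memory(script_text, heap_mb, stack_mb):
    """Return script_text with the -Xmx and -Xss values replaced (sizes in megabytes)."""
    text = re.sub(r"-Xmx\d+m", f"-Xmx{heap_mb}m", script_text)
    text = re.sub(r"-Xss\d+m", f"-Xss{stack_mb}m", text)
    return text

# Hypothetical fragment of a launch line; only the two switches matter here.
example = "java -Xmx224m -Xss32m"
print(set_jvm_memory(example, 1024, 64))
```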
Open new network
Opens the “New Community Finding” dialog. In this dialog click on
The main CFinder window with the “New Community Finding” and the “Open file” dialogs.
Once you have selected the input file, you can set the parameters to use. Most of these parameters are optional: leave the associated checkbox empty if you do not want to use the given setting.
This feature can be used to run CFinder on a filtered version of your network. Use the fields to the left and right from the word weight to select a lower and upper cutoff for link weights. For the community finding only those links of the network will be used whose weights (as listed in the third column of the input file) are between the lower and the upper cutoff values. All other links of the network will be discarded before the community finding starts. If the input network file provides no weights for the links, then CFinder will ignore the cutoff values that you enter in the “New Community Finding” dialog.
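The effect of the two cutoffs can be sketched as follows. This is an illustration only, under several assumptions that the text above does not confirm: the input lines are whitespace-separated, the bounds are inclusive, and a line without a third column is kept regardless of the cutoffs. The function name is hypothetical.

```python
def filter_links(lines, lower=None, upper=None):
    """Keep links whose weight lies within [lower, upper]; None means no bound."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if len(fields) < 3:
            # No weight given: the cutoffs are not applied (assumption).
            kept.append((fields[0], fields[1], None))
            continue
        weight = float(fields[2])
        if lower is not None and weight < lower:
            continue
        if upper is not None and weight > upper:
            continue
        kept.append((fields[0], fields[1], weight))
    return kept

links = ["a b 0.9", "b c 0.4", "c d 0.1", "d e"]
print(filter_links(links, lower=0.3, upper=0.95))
```

Raising the lower cutoff removes more links, which is why a high first value yields a sparse graph that is cheap to analyze.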
For the weighted algorithm (CPMw), also specify an intensity threshold (see the paper on CPMw for how this is used). Optionally, you can also limit the algorithm to a given k value; in that case only the results for this k will be calculated.
Output directory (for the computed cliques, communities, etc.)
To save the results, CFinder creates a new directory. If the directory already exists, CFinder will ask whether to overwrite it, since overwriting could result in loss of data. This directory will be created at the same location as the analyzed input network file, and the name of the output directory will be the name of the input network file plus a “_files” suffix. If you select a lower and upper threshold for the link weights, then these two weights will be appended to the name of the output directory after “_files”, e.g., if the input file
Open communities (choose directory with previously computed data)
If you have already run CFinder for a network and created an output directory, you can view the results again by selecting
The main CFinder window with the dialog for opening already computed communities.
Select this option at the bottom of the “New Community Finding” dialog.
Approximate (fast) clique finding. Select "Approximation" and select the time limit.
Networks (both small and large) may contain many large overlapping cliques. CFinder allows setting an optional time limit (in seconds) for the time to spend on each node of the network. If exploring the neighborhood of the given node would take longer than this, CFinder will (temporarily) give up and proceed to the next node. Once all nodes have been tried, it will try those where it gave up previously (since processing the other nodes has simplified the neighborhood of those nodes, as well). If even this fails, (i.e. each remaining node takes longer than the time limit to process), it will assume that all these nodes are members of the same, huge clique (this is a reasonably safe assumption, since this is indeed the case for several networks).
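The retry strategy described above can be sketched as a simple scheduling loop. This is not CFinder's code: the callback `try_node` is a hypothetical stand-in for "explore the neighborhood of this node within the time limit", and repeating the passes until one of them makes no progress is an assumption about how the retries are organized.

```python
def schedule_with_time_limit(nodes, try_node):
    """Process nodes with retries.

    try_node(node, done) is a hypothetical callback that returns True if the
    node could be processed within the time limit, given the set of nodes
    that are already done. Returns (done, assumed_clique).
    """
    done = set()
    pending = list(nodes)
    while pending:
        postponed = []
        for node in pending:
            if try_node(node, done):
                done.add(node)
            else:
                postponed.append(node)  # give up for now, retry later
        if len(postponed) == len(pending):
            # No node could be finished in this pass: treat the
            # remaining nodes as members of one large clique.
            return done, set(postponed)
        pending = postponed
    return done, set()

# Toy callback: "hard" succeeds only after two other nodes are done,
# while the two "stuck" nodes never finish within the limit.
def toy_try(node, done):
    return node in ("a", "b") or (node == "hard" and len(done) >= 2)

print(schedule_with_time_limit(["hard", "a", "b", "stuck1", "stuck2"], toy_try))
```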
After computing the communities of a network (or after loading the results of a previous community finding), CFinder can show you the results in various formats. The default view is “Communities”.
|Zoom in / out||Use the scrollbar on the bottom|
|Move entire graph||Left-click on the background and drag|
|Select one node||Right-click on the node|
|Select nodes||With the right button, drag a rectangle around the nodes|
|Move selected items||Select the items, then left-click and drag|
|Grab and move a node||Left-click on the node and move it|
Note that for large communities (above 1000 vertices) the computation of the graph’s layout may take longer. If you decide not to wait until the layout routine finishes, you can select a different vertex and k (clique size) parameter value: the previous layout calculation will be cancelled and the newly selected communities will be displayed in the graph visualization panel.
If you would like to browse through the communities of selected nodes, but you do not want to visualize them as a graph, select Settings, and uncheck the Display graph item.
CFinder contains two widgets for visualizing graphs. The default, newer one (based on prefuse) can show an animated layout and can draw community borders. The older one is the same as the one in CFinder 1.21. To choose between the two, use the 'use prefuse widget' checkbox in the Tools > Settings dialog.
The main new features of the new widget are the community borders and the continuous, animated layout. The community borders are convex 'rubber bands' drawn around the nodes of the communities. All nodes of a community will be inside its border, so the overlap of these borders shows the overlap of communities visually. One note of warning: unfortunately, it cannot be guaranteed that other nodes, not belonging to the community, will stay outside this area. The display of the community borders can be toggled with the 'show community borders' checkbox on the prefuse settings panel.
The layout algorithm used in the new widget can run interactively (this is the default setting), which means that the graph view is updated as the nodes are moved by the algorithm. The layout method is based on a physical model: edges are modeled as springs, with a given length and spring coefficient. There is also a global repulsion between the nodes and friction, to slow them down. The parameters that control the strength of these forces can be adjusted on the prefuse settings panel (Tools -> Prefuse settings...).
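To make the physical model concrete, here is a minimal toy version of one layout step in Python. It is a sketch under stated assumptions and not the prefuse implementation: the inverse-square repulsion, the way friction damps the velocity, and all parameter names and default values are chosen for illustration only.

```python
import math

def layout_step(pos, vel, edges, rest_length=1.0, spring=0.1,
                repulsion=0.05, friction=0.5):
    """Advance a toy spring layout by one step; pos and vel map node -> [x, y]."""
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Global repulsion between every pair of nodes (inverse-square, an assumption).
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion / dist ** 2
            force[a][0] += push * dx / dist
            force[a][1] += push * dy / dist
            force[b][0] -= push * dx / dist
            force[b][1] -= push * dy / dist
    # Edges act as springs that pull or push towards their rest length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        pull = spring * (dist - rest_length)
        force[a][0] += pull * dx / dist
        force[a][1] += pull * dy / dist
        force[b][0] -= pull * dx / dist
        force[b][1] -= pull * dy / dist
    # Friction damps the velocity before the positions are updated.
    for n in nodes:
        for axis in (0, 1):
            vel[n][axis] = (vel[n][axis] + force[n][axis]) * (1.0 - friction)
            pos[n][axis] += vel[n][axis]

# Two linked nodes that start too far apart relax towards each other.
pos = {"a": [0.0, 0.0], "b": [3.0, 0.0]}
vel = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
for _ in range(500):
    layout_step(pos, vel, [("a", "b")])
print(math.hypot(pos["b"][0] - pos["a"][0], pos["b"][1] - pos["a"][1]))
```

In this toy model the pair settles slightly beyond the spring's rest length, because the repulsion keeps pushing the two nodes apart.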
Opening a community will first place all nodes near the center of the screen, and then allow the graph to relax according to these forces. The nodes and communities can be moved by grabbing them with the mouse. The neighbourhood of the moved object will react accordingly: for example, moving a node will move its neighbors as well, following the 'edges are springs' model. Similarly, adjusting the layout parameters on the prefuse settings panel takes effect immediately, as if the spring coefficients, etc. had been changed on the fly.
The positions of the nodes will be continuously updated according to the physical model as long as the layout algorithm is running. This might occasionally be distracting (for example, when most of the graph is laid out nicely, but small adjustments are still needed to avoid false overlaps between the communities). In such cases the layout algorithm can be turned off either with the 'stop layout' button on the toolbar or with the 'run continuous layout' checkbox on the prefuse settings panel. With the algorithm turned off all nodes will be stationary, and moving a node will move only that node and none of its neighbors.
The new widget uses mostly the same mouse actions as the old one: nodes can be selected by clicking on them, and moved by grabbing and dragging them with the mouse. Grabbing and dragging the background will move the whole graph, while grabbing and dragging a community will move only the nodes it contains.
In addition, the whole graph can be zoomed either with the mouse wheel or by clicking with the right mouse button on the background and dragging. Double-clicking with the right mouse button on the background will zoom the graph to fill the display. Several nodes can be selected at once with a rectangle selection using the middle mouse button (click and drag). Note that many operations, like 'communities of selected vertex', require exactly one node to be selected.
In this view you can have a look at the communities separately. However, if you would like to explore the connections of a community to other communities, then you can select a vertex in the graph and click on the walk button. This will bring you to the “Vertices” view where all communities of the selected node are displayed. Remember that at different k (community finding stringency) values the communities of a node can be different. If you start from a community with k=4, then pressing the walk button will bring you to the communities of the selected node at k=4.
k=4 in the DIP yeast core protein-protein interaction network.
In the left panel communities are listed by their k values (community finding stringency values). Click on one of the listed communities to view it as a graph in the lower right panel. To highlight a vertex or an edge in the graph, click on its name in the upper right panel. The cliques contained by the selected community are also listed in this panel.
The Zoom and Walk buttons help you to navigate between the various views. To learn what actions they will perform in your current view, just read their text labels.
You can start by looking at the communities of a selected vertex, then highlight one community (use the tabs in the upper right panel) and press the zoom button to view the graph of communities around this community. In the network of communities you can select a different community and click on the walk button to see the neighbors of that community: you are “walking” on the graph of communities and CFinder shows you the neighborhood of the selected community. After your walk on the community graph, you can select some of the communities and press zoom to view their nodes.
This option lists the cliques detected by CFinder. Each clique has an index. To view the list of nodes in a clique, open the folder of the clique.
A short reminder: a clique is a complete subgraph (each node is connected to all other nodes) not contained by any other complete subgraph.
Stats: Statistics of the communities.
With this option you can view the community statistics of your network for a selected k-clique size. The “log-log plot” button changes both scales to logarithmic (and back to linear), while the “cumulative distribution” button switches between the original histogram of values and the plot of 1 – P, where P is the cumulative distribution function.
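As an illustration of the 1 – P plot, the following Python sketch computes the complementary cumulative distribution of a list of values. It assumes that P at a value x is the fraction of observations less than or equal to x; whether CFinder uses this convention or a strict inequality is not stated here. The function name is hypothetical.

```python
from collections import Counter

def complementary_cumulative(values):
    """Return sorted (x, 1 - P(x)) pairs, with P(x) the fraction of values <= x."""
    counts = Counter(values)
    total = len(values)
    result = []
    seen = 0
    for x in sorted(counts):
        seen += counts[x]
        result.append((x, 1 - seen / total))
    return result

# Example: community sizes 3, 3, 4 and 5.
print(complementary_cumulative([3, 3, 4, 5]))
```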
Four types of statistics are available for each k-clique size:
|Community size||Number of nodes in a community|
|Community degree||Number of other communities overlapping with a selected community|
|Community-community overlap||Number of nodes contained by two overlapping communities|
|Node membership number||Number of communities containing the selected node|
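The four quantities in the table can be computed directly from a list of communities. The sketch below is an illustration only; it assumes that each community is given as a Python set of node identifiers, and the function name and return layout are hypothetical.

```python
from collections import Counter
from itertools import combinations

def community_statistics(communities):
    """Return (sizes, degrees, overlaps, membership) for a list of node sets."""
    # Community size: number of nodes in each community.
    sizes = [len(c) for c in communities]
    # Community-community overlap: shared nodes of each overlapping pair.
    overlaps = {}
    for (i, a), (j, b) in combinations(enumerate(communities), 2):
        shared = len(a & b)
        if shared:
            overlaps[(i, j)] = shared
    # Community degree: number of other communities each one overlaps with.
    degrees = [0] * len(communities)
    for i, j in overlaps:
        degrees[i] += 1
        degrees[j] += 1
    # Node membership number: number of communities containing each node.
    membership = Counter(node for c in communities for node in c)
    return sizes, degrees, overlaps, dict(membership)

sizes, degrees, overlaps, membership = community_statistics(
    [{1, 2, 3, 4}, {4, 5, 6}, {7, 8, 9}]
)
print(sizes, degrees, overlaps, membership[4])
```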
For the visualization of the statistics, CFinder uses the Java package JFreeChart 1.0.0. To zoom in, left-click inside the chart and hold down the left mouse button while moving the mouse. This way you can select the rectangular area that you would like to enlarge. To zoom out completely and view the entire distribution, right-click inside the chart to open a menu of options, and then select Autorange > Both Axes from this menu.
To export a distribution, right-click on the distribution, select Export from the pop-up menu and enter a file name in the “Save” dialog. Currently, PNG is the only available file format for exporting the distribution, so please enter a file name ending with .png. Further information about JFreeChart can be found at its website: http://www.jfree.org/jfreechart.
Graph of communities
In the graph window only a part of the community graph is displayed: the community selected in the left panel and its neighbors within a fixed distance. This fixed distance is 2 by default, and it can be changed in the Tools > Settings menu: set the “Community graph depth” variable to your preferred value.
To view the original nodes of a group of communities, select these communities and press zoom.
In the graph (network) of communities (i) nodes represent the original network’s communities and (ii) two nodes are connected, if the corresponding communities in the original graph overlap (i.e., they share at least one node).
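The definition above, together with the depth-limited view, can be sketched in Python as follows. This is a minimal illustration, not CFinder's code: communities are assumed to be given as sets of node identifiers, and both function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def community_graph(communities):
    """Map each community index to the indices of communities it overlaps with."""
    graph = {i: set() for i in range(len(communities))}
    for i, j in combinations(range(len(communities)), 2):
        if communities[i] & communities[j]:  # at least one shared node
            graph[i].add(j)
            graph[j].add(i)
    return graph

def neighborhood(graph, start, depth=2):
    """Return the communities within `depth` steps of `start` (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if dist[current] == depth:
            continue  # do not expand beyond the requested depth
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return set(dist)

# A chain of five communities, each overlapping only with its neighbors.
chain = [{1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}]
graph = community_graph(chain)
print(neighborhood(graph, 0, depth=2))
```

With the default depth of 2, only the selected community, its neighbors and their neighbors are returned, which keeps the displayed part of a large community graph small.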
Exports the graph currently displayed in the graph visualization panel (the lower right panel) into a file. Many formats are supported, for example BMP, EPS (Encapsulated PostScript) and PNG (Portable Network Graphics).
Use this option to stop the calculation of the graph layout.
This dialog contains general settings, options for the communities and for the graph of communities.
In the Settings dialog you can modify the display properties of the graph visualization panel of CFinder.
On some Linux platforms the Export Graph function does not work. If you encounter this problem, please contact us. We can help you with this.