**Student projects -- Python and networks -- Spring 2016**

The author of each project question (problem) is the student.

# 1. Neighbor numbers of drones (student: SZT)

## 1.1 Problem

Generate a dataset using a flying robots simulation with positions and timestamps in it. Define two units whose distance is less than the predetermined communication range as neighbours. Calculate the node degree of each unit in every time step as the number of their neighbours. Plot the average and standard deviation of the node degrees as a function of time.

## 1.2 Solution

Input data sets are not shown here.

### Python script

# Python programming and networks - Exam project
# Szorenyi Tamas
# Task description:
# Generate a dataset using a flying robots simulation with positions and timestamps in it.
# Define two units whose distance is less than the predetermined communication range as neighbours.
# Calculate the node degree of each unit in every time step as the number of their neighbours.
# Plot the average and standard deviation of the node degrees as a function of time.
# Importing necessary libraries
from sys import argv
import math
import re # May not be needed
# Program arguments
script, NumberOfUnits, CommRange, InputFile = argv
NumberOfUnits = int(NumberOfUnits)
CommRange = float(CommRange)
# Read data file into dictionary
Data_Dict = {}
Rows = 0 # Total row count
with open(InputFile, 'r') as f:
i = 0 # Primary key in dictionary (row number)
for line in f:
if not line.startswith('#'):
splitLine = line.split()
Data_Dict[i] = {}
for j in range (2*NumberOfUnits+1): # Secondary key in dictionary (0: time, 1-: unit ID)
Data_Dict[i][j] = float(splitLine[j])
i += 1
Rows = i
# print Data_Dict[3][5] # Test
# print Rows # Test
# Calculate distances and count neighbours
Neighbours_Dict = {}
for k in range(Rows): # Row number
Neighbours_Dict[k] = {}
for p in range(NumberOfUnits): # Units
Neighbours_Dict[k][p] = 0
for q in range (NumberOfUnits): # Neighbours of current unit
if not q == p:
NeighbourDistance = math.sqrt( (Data_Dict[k][2*q+1]-Data_Dict[k][2*p+1])**2 + (Data_Dict[k][2*q+2]-Data_Dict[k][2*p+2])**2 )
if NeighbourDistance <= CommRange*100.0: # m --> cm conversion
Neighbours_Dict[k][p] += 1
# print Neighbours_Dict[400] # Test
# Calculating average and standard deviation of the node degrees
Average = 0.0
StDev = 0.0
print "# Time(s)\t# Average\t# Standard_deviation"
for i in range(Rows):
for j in range(NumberOfUnits):
Average += Neighbours_Dict[i][j]
Average = Average / NumberOfUnits
for h in range(NumberOfUnits):
StDev += math.pow((Neighbours_Dict[i][h] - Average), 2)
StDev = math.sqrt(StDev / NumberOfUnits)
print "%.3f\t%.3f\t%.3f" % (Data_Dict[i][0], Average, StDev)

### Output: Average (example)

### Output: Standard deviation (example)

# 2. Imbalance of phylogenetic trees (student: KZS)

## 2.1 Problem

Investigate whether gene transfer and duplication affect phylogenetic tree imbalance.
Compute imbalance on real gene trees.
Analyze results of simulations.

# 3. Epidemic spreading on Erdos-Renyi networks (student: DG)

## 3.1 Problem

Run the SIS (Susceptible - Infected - Susceptible) model on an ER network and compare the final ratio of infected nodes to the analytical result. In one time step an infected node can infect each of its neighbors with probability \(\ell\) and an infected node can recover, i.e., become susceptible again with probability \(d\).

## 3.2 Solution

### Python code

Reads ER graph, writes how many are infected at the end.

#! /usr/bin/env python
# Simulate an epidemics.
import sys
import random
script, l, d, t, m = sys.argv
# l is the probability that a node is infecting the neighbor.
l = float(l)
# d is the probability a node is cured.
d = float(d)
# t is the number of steps for the thermalization, to reach equilibrium.
t = int(t)
# m is the number of iterations from the average is calculated after the equilibrium.
# If set to 0, the program will print out the number of infected nodes in time.
m = int(m)
# Associative array of the edges.
# neighbors[node1] = { ... }
neighbors = {}
# Read in all the edges, line by line
for line in sys.stdin:
linedata = line[:-1].split(' ')
n1 = linedata[0]
n2 = linedata[1]
if not neighbors.has_key(n1):
neighbors[n1] = set()
if not neighbors.has_key(n2):
neighbors[n2] = set()
neighbors[n1].add(n2)
neighbors[n2].add(n1)
# Simulate the epidemics.
# Set of infected nodes.
infected = set()
# First, infect some random nodes.
infected = set(random.sample(neighbors, 20))
# Averaging out for m iterations after t.
avg = 0
t_curr = 0
m_curr = 0
while (t_curr != t or m_curr != m):
if len(infected) == 0:
break
# In every timestep every infected node infects it's neighbors with l probability.
next_infected = set()
for i in infected:
for neighbor in neighbors[i]:
if random.random() < l:
next_infected.add(neighbor)
# In every timestep every infected node gets cured with d probability.
next_cured = set()
for i in infected:
if random.random() < d:
next_cured.add(i)
infected = infected.union(next_infected)
infected = infected.difference(next_cured)
if t_curr != t:
t_curr += 1
else:
avg += len(infected)
m_curr += 1
if m == 0:
print(len(infected))
# Print out how many are infected at the end.
if m != 0:
print(float(avg) / float(m))

### Image: convergence to final value

### Image: final values

# 4. Attack and error in ER and SF networks (student: BB)

## 4.1 Problem

Take an ER and an SF graph, and simulate error (random node removal) and attack (removal of nodes in the descending order of their original degrees). For each of the four cases plot the relative size of the largest graph component as a function of the removed fraction of nodes. The two graphs should have the same number of nodes. For the ER graph use \(\langle k\rangle=4\), and for the SF graph use \(m=2\).

# 5. Degree distributions of French word groups (student: PD)

## 5.1 Problem

Visit http://autourdumot.fr/fr.N.famille to download subgraphs of "modern" words (computer, mobile phone, etc.) and subgraphs of "old" words (good, bad, fish, etc.). The human eye is an excellent device for comparing graphs, therefore, first, compare the two types of subgraphs visually (look at them to find structural differences). Based on your visual observations, compare also the CCDFs of the node degrees of the two types of subgraphs.

# 6. CCDF of the public transportation graph of Budapest (student: JM)

## 6.1 Problem

Download public transportation schedules from bkkinfo.hu. Merge stops that are at the same location (for example, Váli utca and Móricz Zsigmond körtér are at the same location). Compute and plot the CCDF of the obtained public transportation graph.

## 6.2 Solution

For a subset of the entire timetable

### Python code

#!usr/bin/python
# -*- coding: utf-8 -*-
import os, sys, math, re, string, operator
reload(sys)
sys.setdefaultencoding('utf-8')
### Parse trip data into an array, return values are the tripIDs and stopIDs
def parse_data(filename):
header = []
data = []
# parse as rows
for line in open(filename, 'r'):
line = line.strip()
if not line or line.startswith('#'): continue
linesplit = line.split(',')
if linesplit[0] == 'trip_id':
header = linesplit
continue
data.append([p for p in linesplit])
# reorganize for columns
data2 = [[] for x in xrange(len(header))]
for line in data:
for i,x in enumerate(line):
data2[i].append(x)
tripID = data2[header.index("trip_id")]
stopID = data2[header.index("stop_id")]
# tripID and stopID have the same length, so this is like (so not technically) a list of tripID-stopID touples
# in the format if [(tripA, stopA1), (tripA, stopA2), (tripA, stopA3), (tripB, stopB1), (tripB, stopB2)...]
# therefore the task will be to somehow concatenate é üűőóáéű
return (tripID, stopID)
### Parse stopIDs and names
def parse_stops(filename):
header = []
data = []
# parse as rows
for line in open(filename, 'r'):
line = line.strip()
if not line or line.startswith('#'): continue
linesplit = line.split(',')
if linesplit[0] == 'stop_id':
header = linesplit
continue
data.append([p for p in linesplit])
# reorganize for columns
stopIDsToNames = dict()
for line in data:
line[1] = line[1].translate(string.maketrans('', ''), '!@#$') # removing "-s
stopIDsToNames[line[0]] = line[1].encode('utf-8')
return stopIDsToNames
### Create a list of routes from the raw data. Every route is a path of the network, so they will be used to build the network
def process_data(tripID, stopID):
# we cerate a list which only contains the trip IDs (uniquely)
tripIDUnique = list(set(tripID))
# trips is list of lists of the stopIDs belonging to the certain trips
trips = [None] * len(tripIDUnique)
for j in range (len(tripIDUnique)):
trips[j] = list()
for i in range(len(stopID)):
if tripIDUnique[j] == tripID[i]:
trips[j].append(stopID[i])
# trips contains many routes multiple times, therefore we have make a unique list from trips, which is routes
trips.sort()
routes = list()
for i in range(len(trips)):
if routes.count(trips[i]) == 0:
routes.append(trips[i])
return routes
### Create nodelist
def create_network(routes):
nodeList = list()
for i in range(len(routes)):
for j in range(len(routes[i])):
nodeList.append(routes[i][j])
return nodeList
### Map stopIDs to locations
def map_to_locations(nodeList, stopIDsToNames):
nodeListLocations = list()
for i in nodeList:
nodeListLocations.append(stopIDsToNames.get(i))
return nodeListLocations
### Create the network's degree distribution based on the edgelist
def get_degreedistr(nodeList):
degreeDistribution = dict()
for node in nodeList:
degreeDistribution[node] = nodeList.count(node)
degreeDistributionSorted = sorted(degreeDistribution.items(), key=operator.itemgetter(1))
return degreeDistributionSorted
### Get the CCDF
def print_ccdf(degreeDistribution):
for i in range(len(degreeDistribution)):
print 1 - sum(1 for j in degreeDistribution if j < i) / float(len(degreeDistribution)+1)
### Main
# read the data
inputfile1 = 'stop_times_part.txt'
inputfile2 = 'stops.txt'
tripID, stopID = parse_data(inputfile1)
stopIDsToNames = parse_stops(inputfile2)
# process data
routes = process_data(tripID, stopID)
# create network
nodeList = create_network(routes)
# convert into a location network
nodeListLocations = map_to_locations(nodeList, stopIDsToNames)
# create degree distribution
degreeDistribution = get_degreedistr(nodeListLocations)
# get the CCDF
degreeDist = list()
for (key, value) in degreeDistribution:
print key, value
degreeDist.append(value)
print_ccdf(degreeDist)

### Image (CCDF)

# 7. Infection on a scale-free graph (student: BGS)

## 7.1 Problem

Infect the largest (degree) node of a BA scale-free graph. After this in each update every infected node will infect each of its neighbors with probability p. Plot the relative number of infected nodes as a function of time, for several values of p.