[GISA] IRB application

When & Where?

Inside The Application

– Protocol

  • Protocol Type:
    • Exempt: surveys or interviews that do not collect sensitive data (names, pictures, other identifiable information)
    • Expedited: when collecting more sensitive data
  • Principal investigator & lead unit
    • The advisor (a faculty member) and his or her affiliation
  • Participant type
    • Total: general population in the US
    • Transnational: when open to people worldwide

– Questionnaire

  • Benefit #23301: not monetary, but rather qualitative potential benefits for participants

– Attachment

  • Surveys, pamphlets, posters, interview guides, sample questions, and so on.

Other Helpful Resources


[2015 Spring, Complex System Seminar] Game theory

Definition of “Game Theory”

  • “… the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.” ([1])
  • originated as a sub-field of microeconomics and applied mathematics

Definition of “Game”

  • “In the language of game theory, a game refers to any social situation involving two or more individuals. The individuals involved in a game may be called the players.”([1])
  • Assumptions on players
  • rational: A player is rational if he/she makes decisions consistently in pursuit of his/her own objectives (frequently, the maximization of his/her utility).
  • intelligent: A player is intelligent if he/she knows everything that we know about the game and can make any inferences about the situation that we can make.

Applications of Game Theory

  • Industrial organization: analyzing cooperation (e.g. cartels) and competition between firms
  • Auction theory: the interaction between an auctioneer and auction participants, e.g. Google ad auctions, Yahoo auctions, Sotheby's, eBay, and so on.
  • Contract theory: employer vs. employee / consumer vs. producer
  • Evolutionary biology
  • Political science: international relations, political parties
  • Public policy: the tragedy of the commons, welfare policy design

List of Games

Why Do People Cooperate?

1. Kinship selection

  • When an agent's sacrificing behavior contributes more to the spreading of its genes than it costs the agent itself, the agent chooses to sacrifice. ([2], [3])

2. Indirect reciprocity

  • If each player decides whether to help someone based on the recipient’s image (reputation) accumulated through previous altruistic behaviors, altruistic behavior becomes dominant. ([4])

3. Direct reciprocity

  • Repeated PD game
  • Tit-For-Tat: play your partner’s previous move ([5])
  • Win-stay, lose-shift: if your previous move did well against your partner’s, keep it; otherwise, switch. ([6])
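The two strategies above can be sketched as a repeated Prisoner's Dilemma in a few lines of Python. The payoff matrix (T=5, R=3, P=1, S=0) and the round count are illustrative assumptions, not the exact setups of [5] or [6]:

```python
# Standard PD payoffs (assumed): (my_move, opp_move) -> (my_payoff, opp_payoff)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_history, opp_history):
    # Cooperate first, then copy the partner's previous move.
    return 'C' if not opp_history else opp_history[-1]

def win_stay_lose_shift(my_history, opp_history):
    # Keep the previous move if it earned a good payoff (R or T); otherwise switch.
    if not my_history:
        return 'C'
    if PAYOFF[(my_history[-1], opp_history[-1])][0] >= 3:
        return my_history[-1]
    return 'D' if my_history[-1] == 'C' else 'C'

def play(strategy_a, strategy_b, rounds=10):
    # Run the repeated game and return each player's total payoff.
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, win_stay_lose_shift))  # two nice strategies cooperate: (30, 30)
```

Both strategies start by cooperating and never defect first, so a match between them yields mutual cooperation every round.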

4. Costly signaling([7])

  • Group members have a personal characteristic, which we will call quality, that can either be high or low.
  • Each individual has occasion to enter into a profitable alliance (e.g. mating or political coalition) with any one of the other group members.

5. Altruistic punishment ([8])

  • If individuals can punish free riders in their group, cooperation flourishes, even though the punishment is costly and yields no material gain to the punisher.

6. Evolution of Social Network ([9])

– If a cooperator pays the required cost, all of his neighbors in the network receive a benefit.
– In every turn, one randomly chosen player dies.
– The strategy of the new player filling that position is determined by the neighbors’ accumulated benefits (fitter neighbors are more likely to spread their strategy).
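These three rules can be sketched as a death-birth update on a small hand-built graph. The cycle graph, the benefit b, and the cost c below are illustrative assumptions, not the parameters of [9]:

```python
import random

def accumulated_benefit(node, strategies, neighbors, b=3.0, c=1.0):
    # Each cooperating neighbor contributes benefit b; a cooperator
    # pays cost c per neighbor.
    payoff = sum(b for nb in neighbors[node] if strategies[nb] == 'C')
    if strategies[node] == 'C':
        payoff -= c * len(neighbors[node])
    return payoff

def death_birth_step(strategies, neighbors, rng=random):
    # One randomly chosen player dies; its neighbors compete for the empty
    # position in proportion to their accumulated benefits.
    dead = rng.choice(list(strategies))
    nbs = neighbors[dead]
    weights = [max(accumulated_benefit(nb, strategies, neighbors), 0.0) + 1e-9
               for nb in nbs]
    r = rng.random() * sum(weights)
    for nb, w in zip(nbs, weights):
        r -= w
        if r <= 0:
            strategies[dead] = strategies[nb]
            break
    return strategies

# Example: a 6-node cycle, half cooperators, one update step.
neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
strategies = {i: 'C' if i < 3 else 'D' for i in range(6)}
death_birth_step(strategies, neighbors)
```

Iterating this step many times shows which strategy tends to take over the network under a given benefit-to-cost ratio.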

7. Static Network ([10])

– If a social network is static, the cooperative strategy becomes more stable.
– “We find that people cooperate at high stable levels, as long as the benefits created by cooperation are larger than the number of neighbors in the network.”


[1] Myerson, Roger B. Game theory. Harvard university press, 2013.
[2] http://en.wikipedia.org/wiki/Kin_selection
[3] Hamilton, William D. “The genetical evolution of social behaviour. II.” Journal of theoretical biology 7.1 (1964): 17-52.
[4] Nowak, Martin A., and Karl Sigmund. “Evolution of indirect reciprocity by image scoring.” Nature 393.6685 (1998): 573-577.
[5] Axelrod, Robert, and William D. Hamilton. “The evolution of cooperation.” Science 211.4489 (1981): 1390-1396.
[6] Nowak, Martin, and Karl Sigmund. “A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game.” Nature 364.6432 (1993): 56-58.
[7] Gintis, Herbert, Eric Alden Smith, and Samuel Bowles. “Costly signaling and cooperation.” Journal of theoretical biology 213.1 (2001): 103-119.
[8] Fehr, Ernst, and Simon Gächter. “Altruistic punishment in humans.” Nature 415.6868 (2002): 137-140.
[9] Ohtsuki, Hisashi, et al. “A simple rule for the evolution of cooperation on graphs and social networks.” Nature 441.7092 (2006): 502-505.
[10] Rand, David G., et al. “Static network structure can stabilize human cooperation.” Proceedings of the National Academy of Sciences 111.48 (2014): 17093-17098.


[2015 Spring AI: Mining the Social Web] TF-IDF (2) & Sentiment Analysis


– Open the Data File, then make Words Corpus and Tweet Dictionary

In [2]:
# Mostly similar to Example 4-9. Querying Google+ data with TF-IDF 
# in our textbook "Mining the Social Web" 4.4.2 Applying TF-IDF to Human Languages

data = "oscar_tweets.txt"
tweet_dictionary = {}
words_corpus = []
i = 0
for line in open(data):
    if len(line.strip().split()) != 0:
        tweet_dictionary[i] = line.lower()
        words_corpus.append(line.lower().split())
        i += 1
print tweet_dictionary[1]
print words_corpus[1]
rt @dory: when you're washing the dishes at 7:15 but you remember you gotta be at the oscars by 7:30 http://t.co/27faqodhpm

['rt', '@dory:', 'when', "you're", 'washing', 'the', 'dishes', 'at', '7:15', 'but', 'you', 'remember', 'you', 'gotta', 'be', 'at', 'the', 'oscars', 'by', '7:30', 'http://t.co/27faqodhpm']

– Set Your Query Terms and Scoring Each Document (Tweet)

In [3]:
# Set your query terms; each tweet will be scored against them with TF-IDF
QUERY_TERMS = ['lego']

# TextCollection provides tf, idf, and tf_idf abstractions so
# that we don't have to maintain/compute them ourselves
import nltk
tc = nltk.TextCollection(words_corpus)

relevant_tweets = []

for idx in range(len(words_corpus)):
    score = 0
    for term in [t.lower() for t in QUERY_TERMS]:
        score += tc.tf_idf(term, words_corpus[idx])
    if score > 0:
        relevant_tweets.append({'score':score, 'tweet':tweet_dictionary[idx]})

– Sort by Score and Display Results

In [5]:
relevant_tweets = sorted(relevant_tweets, key=lambda p: p['score'], reverse=True)
for tweet in relevant_tweets[:5]:
    print tweet['tweet']
    print '\tScore: %s' % (tweet['score'],)
how the lego oscars were built http://t.co/glbdphfyn9

    Score: 0.867215250635

http://t.co/lghymlygns - is getting a lego oscar bet

    Score: 0.758813344306

see how the awesome lego oscars were made https://t.co/lheategesj

    Score: 0.674500750494

how the lego oscars were built - gif on imgur

    Score: 0.607050675445

rt @thingswork: this is how the lego oscars were built http://t.co/kzuabkuy1u

    Score: 0.551864250404

2. Sentiment Analysis

– Scoring Positivity (or Negativity) of Tweets

In [7]:
# source: http://textblob.readthedocs.org/en/dev/quickstart.html#sentiment-analysis
# The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). 
# The polarity score is a float within the range [-1.0, 1.0]. 
# The subjectivity is a float within the range [0.0, 1.0] 
# where 0.0 is very objective and 1.0 is very subjective.

from textblob import TextBlob

# Collect strongly negative tweets (polarity <= -0.9)
negative_tweets = []
for idx in range(len(words_corpus)):
    polarity = TextBlob(tweet_dictionary[idx]).sentiment.polarity
    if polarity <= -0.9:
        negative_tweets.append({'polarity':polarity, 'tweet':tweet_dictionary[idx]})

negative_tweets = sorted(negative_tweets, key=lambda p: p['polarity'])
for tweet in negative_tweets[:5]:
    print tweet['tweet']
    print '\tScore: %s' % (tweet['polarity'],)
zendaya defends oscars dreadlocks after 'outrageously offensive' remark via @abc7ny http://t.co/jirc40gy8p

    Score: -1.0

@mrbradgoreski travolta was the worst dressed wax figure at the oscars.

    Score: -1.0

the amount of pics of scarlett johansson &amp; john travolta at the oscars people texted me is obscene. i hate u all! (and u know me so well.)

    Score: -1.0

rt @mygeektime: just getting over an awful stomach virus...

    Score: -1.0

behati's style at the oscars was the worst ive ever seen omg

    Score: -1.0

– Scoring Subjectivity (or Objectivity) of Tweets

In [8]:
subjective_tweets = []
for idx in range(len(words_corpus)):
    subjectivity = TextBlob(tweet_dictionary[idx]).sentiment.subjectivity
    if subjectivity >= 1:
        subjective_tweets.append({'subjectivity':subjectivity, 'tweet':tweet_dictionary[idx]})

subjective_tweets = sorted(subjective_tweets, key=lambda p: p['subjectivity'], reverse=True)
for tweet in subjective_tweets[:5]:
    print tweet['tweet']
    print '\tScore: %s' % (tweet['subjectivity'],)
rt @9gag: remember the greatest oscars ever? http://t.co/qw3xdbmne9

    Score: 1.0

rt @logotv: confirmed. @actuallynph's bulge at the #oscars was indeed padded. watch: http://t.co/a8iaxitxcu

    Score: 1.0

rt @girlposts: remember the greatest oscars ever? http://t.co/ij9fm4cdhm

    Score: 1.0

rt @ryanabe: another oscars, another sad leo.

    Score: 1.0

rt @9gag: remember the greatest oscars ever? http://t.co/qw3xdbmne9

    Score: 1.0


[2015 Spring AI: Mining the Social Web] TF-IDF (1)

  • Class material for February 26, 2015

After TF-IDF scoring of 100 documents…

  1. All words from the 100 docs are in the corpus.
  2. Each word in each doc has its own TF-IDF score; that is, each doc is represented as a vector of the TF-IDF scores of all words in the corpus.
  • e.g.) It was awesome! -> [0, .2345, 0, 0, …, 1.23, 3.4] (if the corpus is ordered as [“you”, “it”, “sucks”, “cold”, …, “was”, “awesome”])
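The vector representation above can be reproduced with a hand-rolled TF and IDF. The toy corpus below and the exact weighting (length-normalized TF, log IDF) are simplifying assumptions; library implementations such as NLTK's differ in details:

```python
import math

# Toy corpus of tokenized "documents" (made up for illustration).
docs = [["it", "was", "awesome"],
        ["it", "sucks"],
        ["you", "was", "cold"]]
vocab = sorted({w for doc in docs for w in doc})

def tf(term, doc):
    # Term frequency, normalized by document length.
    return doc.count(term) / float(len(doc))

def idf(term, docs):
    # Log inverse document frequency; rare terms get higher weight.
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / float(containing)) if containing else 0.0

def tfidf_vector(doc):
    # One score per vocabulary word, in vocabulary order.
    return [tf(w, doc) * idf(w, docs) for w in vocab]

print(vocab)
print(tfidf_vector(["it", "sucks"]))
```

Every document maps to a vector of the same length, so documents can be compared (e.g. by cosine similarity) regardless of their original lengths.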


What is this TF-IDF for?

We’ve learned a lot about the TF-IDF method: how to calculate the TF and IDF scores, the conceptual assumption behind the method (bag of words), and so on. So what is it for? How can we use it, and for what?

  1. Have a seat with your group members.
  2. Discuss how to use this score generally or for your project. (10 min)

Is TF-IDF better than just counting hits?

One of the easiest ways to find documents relevant to a specific query is to find the documents that contain the query words many times. In what situations does TF-IDF work better than this?
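One common answer: raw hit counts favor long, chatty documents in which a query word appears often simply because everything appears often. A hypothetical toy example (documents made up for illustration):

```python
import math

docs = {
    "long_doc": ["the"] * 50 + ["oscars"] * 3 + ["and"] * 40,
    "short_doc": ["lego", "oscars", "built"],
    "other_doc": ["the", "weather", "and", "the", "news"],
}

def count_score(term, doc):
    # Naive relevance: the raw number of hits.
    return doc.count(term)

def tfidf_score(term, doc):
    # Length-normalized frequency times log inverse document frequency.
    tf = doc.count(term) / float(len(doc))
    containing = sum(1 for d in docs.values() if term in d)
    idf = math.log(len(docs) / float(containing))
    return tf * idf

print(count_score("oscars", docs["long_doc"]))   # 3 hits: counting picks the long doc
print(tfidf_score("oscars", docs["short_doc"]) > tfidf_score("oscars", docs["long_doc"]))  # True
```

Counting hits ranks the long document first, while TF-IDF ranks the short, focused document first, because TF is normalized by document length and IDF discounts terms that occur in many documents.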
