Cool data mining stuff:

 

From: jdm-society-bounces@mail.sjdm.org [mailto:jdm-society-bounces@mail.sjdm.org] On Behalf Of Kesten Green

Sent: Wednesday, June 22, 2005 8:13 PM

To: JDM Society List

Subject: [Jdm-society] Roles and decisions in conflict situations

It is often useful to predict the decisions that people will make when they are involved in conflict such as wars, takeover battles, and union-management disputes. In research on how best to make such predictions, Scott Armstrong and I obtained forecasts for eight real and diverse conflicts. We provided the participants in our research with between three and six plausible decisions for each conflict. By choosing at random from among the decisions, one would expect to be 28% accurate.

Here are our findings on the percentage of correct forecasts from four different methods:

Role-play simulations using novices          62%
Experts' structured analysis of analogies    56%
Experts' unaided judgments                   32%
Game theorists' judgments                    31%
(Chance                                      28%)

People involved in conflicts are often advised to "Stand in the other person's shoes", in other words, to think hard about the roles of the other parties. Are list members aware of any evidence that this approach improves the accuracy of predictions?

Suppose I asked experts to indicate, for each party in a conflict, which decision the party would prefer, why it would be preferred, how the party would try to achieve the preferred decision, and to assess the chances that they might achieve it. Having done that exercise, how accurate would you expect the experts' forecasts of the actual outcomes of the eight conflicts used in the above research to be on average? [____%]

If I asked novices (undergraduate students) to do this, how accurate would you expect their forecasts to be? [____%]

As well as answers to the previous questions, I'd appreciate any suggestions on how best to get people to "stand in the other person's shoes" so as to obtain the most accurate forecasts of their decisions.

Kesten Green

----------------------------------------------------

Dr Kesten C Green, Business and Economic Forecasting Unit, Monash University

www.conflictforecasting.com

Contact at home: P: +64-4-976-3243; M: +64-21-456-516

Jdm-society mailing list

Jdm-society@mail.sjdm.org http://www.sjdm.org/mailman/listinfo/jdm-society
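
A quick way to read the accuracy figures in the message above is to ask how much of the gap between chance (28%) and perfect accuracy each method closes. The sketch below uses only the percentages quoted in the email; the `skill_over_chance` helper and its formula (a standard chance-corrected score) are my own illustration, not something from Green and Armstrong's research.

```python
# Percent-correct figures quoted in Kesten Green's message.
CHANCE = 0.28
ACCURACY = {
    "Role-play simulations using novices": 0.62,
    "Experts' structured analysis of analogies": 0.56,
    "Experts' unaided judgments": 0.32,
    "Game theorists' judgments": 0.31,
}


def skill_over_chance(accuracy, chance):
    """Fraction of the chance-to-perfect gap that a method closes.

    0.0 means no better than guessing, 1.0 means always right.
    (Hypothetical helper for illustration.)
    """
    return (accuracy - chance) / (1.0 - chance)


if __name__ == "__main__":
    for method, acc in ACCURACY.items():
        points = (acc - CHANCE) * 100
        skill = skill_over_chance(acc, CHANCE)
        print(f"{method:45s} +{points:4.1f} points over chance, skill {skill:.3f}")
```

On this measure unaided experts and game theorists are barely distinguishable from guessing, while role-play closes almost half of the gap.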

 

"Do not believe in anything simply because you have heard it. Do not believe simply because it has been handed down for many generations. Do not believe in anything simply because it is spoken and rumored by many. Do not believe in anything simply because it is written in Holy Scriptures. Do not believe in anything merely on the authority of Teachers, elders or wise men. Believe only after careful observation and analysis, when you find that it agrees with reason and is conducive to the good and benefit of one and all. Then accept it and live up to it." The Buddha on Belief, from the Kalama Sutta

Confidence is the percentage of times the rule is right out of the number of times the rule applies. (In financial applications confidence might instead be the excess profit, which may or may not be proportional to the number of times the rule is correct.) Support is the number of times the rule applies. Statistical hypothesis testing involves both: as the difference between the sample statistic and the population statistic gets larger, the p-value gets smaller (confidence), and for a given difference, as the test sample gets larger, the p-value also gets smaller (support). A small p-value therefore indicates a good combination of confidence and support. 100% confidence with a support of 1 case is not very convincing. 60% confidence with a support of a million cases might be very good. If we are choosing one alternative from 20 that are equally likely when selected at random (so chance is 5%), then even 10% confidence with a support of a million cases might be very good.
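
To make the interplay concrete, here is a minimal sketch that computes support and confidence for a rule and then a one-sided binomial p-value against a chance baseline. The function names and the choice of an exact binomial test are my own assumptions for illustration; the sample sizes are kept small (100 and 1,000 rather than a million) so the exact sum stays fast.

```python
from math import comb


def rule_stats(applies, correct):
    """Return (support, confidence) for a rule.

    applies[i] is True when the rule fires on case i;
    correct[i] is True when its prediction for case i was right.
    """
    support = sum(1 for a in applies if a)
    hits = sum(1 for a, c in zip(applies, correct) if a and c)
    confidence = hits / support if support else float("nan")
    return support, confidence


def binomial_p_value(hits, support, chance):
    """One-sided P(X >= hits) for X ~ Binomial(support, chance)."""
    return sum(
        comb(support, k) * chance**k * (1.0 - chance) ** (support - k)
        for k in range(hits, support + 1)
    )


if __name__ == "__main__":
    # 100% confidence, support of 1, two equally likely outcomes.
    print(binomial_p_value(1, 1, 0.5))
    # 60% confidence with growing support, same 50% baseline.
    print(binomial_p_value(60, 100, 0.5))
    print(binomial_p_value(600, 1000, 0.5))
    # 10% confidence when chance is 1 in 20.
    print(binomial_p_value(100, 1000, 0.05))
```

With a support of 1 the p-value is 0.5, no better than a coin flip, while the same 60% confidence becomes far more convincing as support grows from 100 to 1,000 cases.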

AZMY data mining tools http://www.azmy.com/

http://www.datamininglab.com/toolcomp.html

http://www.xore.com/prodtable.html

http://www.cs.uvm.edu/~xwu/PPT/SEKE-02.ppt

glossary  www.twocrows.com/glossary.htm

Do the observations come from a particular distribution? http://www.itl.nist.gov/div898/handbook/prc/section2/prc21.htm

NIST statistics handbook  http://www.itl.nist.gov/div898/handbook/

stat course  http://www.mnstate.edu/wasson/ed602.htm

stat package    http://www.statistixl.com/

mining blog data http://www.pewinternet.org/ppt/BUZZ_BLOGS__BEYOND_Final05-16-05.pdf

regression tutorial 

http://mtsu32.mtsu.edu:11308/regression/level3/multicorrel/useexcel.htm

http://www.statsoft.com/

http://www.hs.ttu.edu/hdfs3390/hothand.htm

http://portal.acm.org/citation.cfm?id=956849

 

Transformation of data

Transformations are widely used in statistics to reduce data to standard forms. Some common methods of re-expressing data are as follows:

Centering -- The sample mean (column mean) is subtracted from the data values in order to obtain centered ``anomalies'' having zero mean. All information about mean location is lost.

Standardizing -- The data values are centered and then divided by their standard deviations to obtain ``normalized anomalies'' (meteorological notation) having zero mean and unit variance. All knowledge of location and scale is lost and so statistics based on standardized anomalies are unaffected by any shifts or rescaling of the original data. Standardizing makes the data dimensionless and so is useful for defining standard indices. Also note that correlation coefficients are unaffected by any linear transformations such as standardization.

Normalizing -- Normalizing transformations are non-linear transformations often used by statisticians to make data more normal (Gaussian). This can reduce bias caused by outliers, and can also transform data to satisfy normality assumptions that are assumed by many statistical techniques.
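
The three re-expressions above can be sketched in a few lines of standard-library Python. The sample data are made up, and the log transform is only one common choice of normalizing transformation (an assumption on my part; the text does not name one). The last lines check the claim that correlation is unaffected by standardization.

```python
import math
import statistics


def center(xs):
    """Subtract the sample mean, giving zero-mean anomalies."""
    m = statistics.fmean(xs)
    return [x - m for x in xs]


def standardize(xs):
    """Center, then divide by the sample standard deviation."""
    s = statistics.stdev(xs)
    return [c / s for c in center(xs)]


def log_normalize(xs):
    """One example of a non-linear normalizing transform (positive data only)."""
    return [math.log(x) for x in xs]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    cx, cy = center(xs), center(ys)
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return num / den


if __name__ == "__main__":
    # Hypothetical, right-skewed data with one large outlier.
    xs = [1.0, 2.0, 4.0, 8.0, 100.0]
    ys = [2.0, 1.0, 5.0, 9.0, 40.0]

    print("centered mean:", statistics.fmean(center(xs)))
    print("standardized stdev:", statistics.stdev(standardize(xs)))
    print("logged values:", log_normalize(xs))
    print("r (raw):", pearson(xs, ys))
    print("r (standardized):", pearson(standardize(xs), standardize(ys)))
```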

download economic data
http://research.stlouisfed.org/fred2/
http://research.stlouisfed.org/fred2/categories/22/downloaddata
http://www.hsh.com/mtghst.html
http://biz.swcp.com/stocks/
http://www.ozgrid.com/Services/excel-download-quotes.htm
http://finance.yahoo.com/
http://currencies.thefinancials.com/
http://www.bankofengland.co.uk/mfsd/
http://www2.boj.or.jp/en/dlong/stat/stat2.htm
http://www.ecb.int/stats/mb/eastats.htm
http://www.princeton.edu/~econlib/ds2/interest.htm
http://www.fsu.edu/~finance/findata/intrt.html
http://fisher.osu.edu/fin/fdf/osudata.htm
http://www.gold-eagle.com/editorials_03/bolser121703.html
http://www.lib.uchicago.edu/e/busecon/db/stats/
http://www.realrates.com/
http://www.bankofengland.co.uk/targettwopointzero/data/
http://fisher.osu.edu/fin/osudown.htm
http://www.analyzerxl.com/?3
 

FLORIDA JUDGES THROW OUT DUI CASES FOR LACK OF SOURCE CODE FOR BREATH TESTERS http://tampatrib.com/floridametronews/MGBUBJ5QK9E.html

[The interesting thing is that if the same principle were applied to computerized voting, we'd have to rerun two presidential elections. Anyone for a class action suit?]

TAMPA TRIBUNE - Hundreds of cases involving breath-alcohol tests have been thrown out by Seminole County judges in the past five months because the test's manufacturer will not disclose how the machines work. All four of Seminole County's criminal judges have been using a standard that if a DUI defendant asks for a key piece of information about how the machine works - its software source code, for instance - and the state cannot provide it, the breath test is rejected, the Orlando Sentinel reported Wednesday.

Prosecutors have said they do not know how many drunken drivers have been acquitted as a result. But Gino Feliciani, the misdemeanor division chief in the Seminole County State Attorney's Office, said the conviction rate has dropped to 50 percent or less.

 

 

New findings about "junk DNA" may bring some surprises (QUITE AMAZING!)
http://www.gewo.applet.cz/health/DNA_1e.htm
A group of researchers working at the Human Genome Project will be announcing soon that they made an astonishing scientific discovery: They believe so-called non-coding sequences (97%) in human DNA are no less than the genetic code of an unknown extraterrestrial life form. The non-coding sequences are common to all living organisms on Earth, from molds to fish to humans. In human DNA, they constitute the larger part of the total genome, says Prof. Sam Chang, the group leader. Non-coding sequences, also known as "junk DNA", were discovered years ago, and their function remains a mystery. Unlike normal genes, which carry the information that intracellular machinery uses to synthesize proteins, enzymes and other chemicals produced by our bodies, non-coding sequences are never used for any purpose. They are never expressed, meaning that the information they carry is never read, no substance is synthesized and they have no function at all. We exist on only 3% of our DNA. The junk genes merely enjoy the ride with hard working active genes, passed from generation to generation. What are they? How come these idle genes are in our genome? Those were the questions many scientists posed and failed to answer - until the breakthrough discovery by Prof. Sam Chang and his group.
Trying to understand the origins and meaning of junk DNA, Prof. Chang realized that he first needed a definition of "junk". Is junk DNA really junk (useless and meaningless), or does it contain some information not claimed by the rest of DNA for whatever reason? He once mentioned the question to an acquaintance, Dr. Lipshutz, a young theoretical physicist turned Wall Street derivative securities specialist. "Easy," replied Lipshutz. "We'll run your sequence through the software I use to analyze market data, and it will show if your sequences are total garbage, "white noise", or there is a message in there."
This new breed of analysts with strong background in math, physics and statistics are getting more and more popular with Wall Street firms. They sift through gigabytes of market statistics, trying to uncover useful correlation between the various market indexes, and individual stocks. (...) "However, from the programmer's point of view, there is also positive outlook in it. What we see in our DNA is a program consisting of two versions, a big code and basic code. First fact is, the complete program was positively not written on Earth; that is now a verified fact. The second fact is, that genes by themselves are not enough to explain evolution; there must be something more in the game. What it is or where it is, we don't know. The third fact is, no creator of a new work, be it a composer, engineer or programmer, from Mars or Microsoft, will ever leave his work without the option for improvement or upgrade. Ingenious here is, that the upgrade is already enclosed - the "junk DNA" is nothing more than hidden and dormant upgrade of our basic code! We know for some time that certain cosmic rays have power to modify DNA. With this in mind, plausible solution is available. The extraterrestrial programmers may use just one flash of the right energy from somewhere in the Universe to instruct the basic code to remove all the /*Š*/ symbols, fuse itself with the big code ("junk DNA") and jumpstart working of our whole DNA. That would change us forever, some of us within months, some of us within generations. The change would be not too much physical, (except no more cancers, diseases and short life), but it will catapult us intellectually. Suddenly, we will be in time comparable to coexistence of Neanderthals with Cromagnons. The old will be replaced giving birth to a new cycle.
The complete program is elegant, very clever self-organizing, auto-executing, auto-developing and auto-correcting software for a highly advanced biological computer with built-in connection to the ageless energy and wisdom of the Universe. Software wise, within us is either short and diseased life, or potential for a super-intelligent super-being with a long and healthy life. This triggers puzzling questions - was the reduction to the basic code done by sloppy programmers in a rush (as it appears to us), or was the disabling of the big code purposeful act which can be cancelled by a "remote control" whenever desired?"
Sooner or later, we have to come to grips with the unbelievable notion that every life on Earth carries genetic code for his extraterrestrial cousin and that evolution is not what we think it is. This discovery may well shake the very roots of humanity - our beliefs in our concept of God and in our own power over our destiny. With the right paradigm, we may discover one day that all forms of life and the whole Universe is just one huge intellectual exercise in thoughts expressed mathematically, by Design, by Creator.
- Recommended by Kalama Hawkrider <kalama52@yahoo.com> Omega-News Collection 28. May 2005   http://omega.twoday.net/stories/724602/

 

 

 

-----Original Message-----
From: jdm-society-bounces@mail.sjdm.org [mailto:jdm-society-bounces@mail.sjdm.org] On Behalf Of Reifman, Alan
Sent: Wednesday, May 25, 2005 7:08 PM
To: spsp-discuss@stolaf.edu; jdm-society@mail.sjdm.org
Subject: [Jdm-society] new book "freakonomics"

 

Many of you have probably already heard of the new book "Freakonomics" (or perhaps have even already read it).  It is written by University of Chicago economist Steven Levitt and journalist Stephen Dubner.  I, like many Americans, first heard of Levitt in an August 3, 2003 New York Times magazine profile of him, written by Dubner.  Here's a message I sent to the SPSP list in 2003 shortly after seeing the Levitt profile:

http://www.stolaf.edu/cgi-bin/mailarchivesearch.pl?directory=/home/www/people/huff/SPSP&listname=archive03&location=8731614

What has made Levitt stand out is his unconventional scholarly portfolio.  He, by his own admission, has limited grasp of many traditional economic ideas, is not associated with any theories in economics, and does research on what seem to be unusual topics.  In fact, Levitt recently characterized himself as being "adisciplinary," in his blog (see below).

The talent he has, though (in addition to an amazing work ethic -- he can sure crank out the papers), is in figuring out clever ways to design analyses of archival data to address interesting questions.

http://www.src.uchicago.edu/users/levit/recentpublications.htm 

Do Sumo wrestlers throw matches?  Do real estate agents go the extra mile to help their clients get the best possible deal in selling their homes?  Might the legalization of abortion (starting in certain states in the late 1960s and culminating in the 1973 Roe v. Wade decision nationally) be a major factor in the early 1990s crime drop?  Were contestants of certain demographic groups discriminated against on "The Weakest Link"?

In all of these cases (and others), Levitt and his collaborators came up with ingenious comparisons to test within the relevant datasets that would go a long way toward answering the questions.  Some of the comparisons I could anticipate while reading the scenarios, but most I could not.

Beyond the empirical analyses, however, the storytelling is also spellbinding in places.  Two examples, in particular, are the story of how one individual's strategic use of the Superman radio show helped eviscerate the Ku Klux Klan, and of how one of Levitt's colleagues ended up embedding himself in a Chicago crack-dealing gang as part of his research studies.

At about 200 pages, Freakonomics is a pretty quick read, one that I largely found exciting (there were a couple of studies that I thought were less compelling than others, however).

The aforementioned study linking abortion to later reductions in crime has been especially controversial.  I've seen Levitt and his collaborator on that study, John Donohue, discuss this matter in various venues.  The point they make, as I see it, is that the central concept of the study is not abortion, per se, but rather reducing the incidence of unwanted pregnancies and poorly cared for children, which can be accomplished by many non-abortion means, such as sex education, abstinence, contraception, parent education, adoption, etc.

The authors also maintain a blog related to the book (including reader comments), so there's opportunity for extended discussion on many of the issues raised in the book:  http://www.freakonomics.com/blog.php .  Levitt has even introduced new topics on the blog that were not in the book, including one that is near and dear to my heart -- the "Moneyball" approach to using statistics in sports decision-making (Levitt does not dispute that Oakland A's GM Billy Beane was able to compile play-off-caliber teams for many years with a much lower payroll than other teams; Levitt's contention is that the A's did NOT employ game-management strategies that were appreciably different from those of other teams).

One last thing:  I want to state for the record that on May 11, 2001, more than two years before ever hearing of Levitt, I sent a message to the SPSP list suggesting the use of "The Weakest Link" to study discrimination:

http://www.stolaf.edu/cgi-bin/mailarchivesearch.pl?directory=/home/www/people/huff/SPSP&listname=archive01&location=2224330

As I've noted, however, the difference between Levitt and me is that he actually goes ahead and DOES these studies!

*********************************************************
Alan Reifman, Ph. D.,  Associate Professor
Dept of Human Dev't and Family Studies
College of Human Sciences
Texas Tech University
Lubbock, TX 79409-1162
(806) 742-3000
http://www.hs.ttu.edu/hdfs/Faculty/reifman.htm

 

career possibility: master GOOGLE programming and the contents of this book -- http://www.amazon.com/gp/reader/0470849061/ref=sib_dp_bod_toc/104-4588979-9945519?%5Fencoding=UTF8&p=S00B#reader-page

Modeling the Internet and the Web: Probabilistic Methods and Algorithms by Pierre Baldi, Paolo Frasconi, Padhraic Smyth

Hot Hand in Sports  http://www.hs.ttu.edu/hdfs3390/hothand.htm
Rising doctors' premiums not due to lawsuit awards: Study suggests insurers raise rates to make up for investment declines http://snipurl.com/fax1
dynamic yield curve site  http://stockcharts.com/charts/yieldcurve.html