1. Using SQL commands in R

    03-31-2009 by dan

    Suppose you are at the zoo, and you want to know how many different kinds of animals are at the zoo.  If you have a database that contains this information, it’s easy enough:

    In SQL:

    select count(distinct animal) from zoo;

    In R:

    length(unique(zoo$animal))

    Easy enough! The tough part for R comes when you want to start computing things by group.

    Now suppose you are at the zoo, and you want to know how many types of animals are in each cage.

    In SQL:

    select cage,count(distinct animal) from zoo group by cage;

    In R:

    tapply(zoo$animal,list(zoo$cage),function(x) length(unique(x)))
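
    If you don't have a zoo table handy, here is a minimal sketch for trying the two R calls above. The data frame and its values are made up for illustration (one row per individual animal); only the column names cage, animal and age come from the queries in this post.

```r
# Hypothetical toy data, not from the original post: one row per animal
zoo <- data.frame(
  cage   = c("A", "A", "A", "B", "B", "C"),
  animal = c("lion", "lion", "tiger", "zebra", "zebra", "emu"),
  age    = c(4, 6, 3, 7, 9, 2),
  stringsAsFactors = FALSE
)

# Distinct animal types in the whole zoo
# (same idea as: select count(distinct animal) from zoo)
n_types <- length(unique(zoo$animal))

# Distinct animal types per cage
# (same idea as: select cage, count(distinct animal) from zoo group by cage)
types_per_cage <- tapply(zoo$animal, list(zoo$cage),
                         function(x) length(unique(x)))

print(n_types)
print(types_per_cage)
```

    The grouped result comes back as a named array with one entry per cage, rather than as rows like a SQL result set.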

    But this is just the start of the trouble for R. What if you really wanted to know some characteristic of the animals in each of the cages? Suppose you want to know the average age of each type of animal in each cage:

    In SQL:

    select cage,animal,avg(age) from zoo group by cage,animal;

    In R:

    tapply(zoo$age,list(zoo$cage,zoo$animal),mean)

    You can see how the R code gets more complicated depending on exactly what kind of function you’d like to apply. Even worse, you might still have to transform the result you get out of tapply: the two-factor call above returns a cage-by-animal matrix, and sometimes you want a list of tuples (one row per cage and animal) instead.
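
    As a rough sketch of that transformation, here are two ways to get from the tapply matrix to one row per (cage, animal) pair using only base R. The toy zoo data frame and the column name avg_age are assumptions for illustration, not part of the original example.

```r
# Hypothetical toy data, not from the original post: one row per animal
zoo <- data.frame(
  cage   = c("A", "A", "A", "B", "B", "C"),
  animal = c("lion", "lion", "tiger", "zebra", "zebra", "emu"),
  age    = c(4, 6, 3, 7, 9, 2),
  stringsAsFactors = FALSE
)

# tapply with two grouping factors returns a cage x animal matrix,
# with NA for combinations that never occur (e.g. zebras in cage A)
avg_matrix <- tapply(zoo$age, list(zoo$cage, zoo$animal), mean)

# Option 1: flatten the matrix into long form and drop the empty cells
avg_long <- as.data.frame(as.table(avg_matrix), stringsAsFactors = FALSE)
names(avg_long) <- c("cage", "animal", "avg_age")
avg_long <- avg_long[!is.na(avg_long$avg_age), ]

# Option 2: skip the matrix and let aggregate() return rows directly
avg_long2 <- aggregate(age ~ cage + animal, data = zoo, FUN = mean)

print(avg_matrix)
print(avg_long)
print(avg_long2)
```

    Both long-form results have the same shape as the output of the SQL query: one row per cage and animal that actually occurs.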

    A simple solution is to use the SQL code from within R. You will need to install the sqldf package. With this package installed and loaded, you can easily replicate the SQL queries above like so:

    In R:

    library(sqldf)
    sqldf("select cage,count(distinct animal) from zoo group by cage")
    sqldf("select cage,animal,avg(age) from zoo group by cage,animal")

    The sqldf package is a very simple way to think about data in terms of SQL instead of tapply. Don’t dismiss tapply entirely, though! As you can see in my examples above, you can apply ANY function to a grouping of data, so R is far more flexible than SQL in that sense.
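
    To make the flexibility point concrete, here is a small sketch that passes both a built-in function and a home-made one to tapply. The toy zoo data and the name age_spread are made up for illustration.

```r
# Hypothetical toy data, not from the original post: one row per animal
zoo <- data.frame(
  cage   = c("A", "A", "A", "B", "B", "C"),
  animal = c("lion", "lion", "tiger", "zebra", "zebra", "emu"),
  age    = c(4, 6, 3, 7, 9, 2),
  stringsAsFactors = FALSE
)

# Any R function that takes a vector can be applied per group:
# a built-in one...
median_age <- tapply(zoo$age, zoo$cage, median)

# ...or one written on the spot (here: oldest minus youngest in each cage)
age_spread <- tapply(zoo$age, zoo$cage, function(x) max(x) - min(x))

print(median_age)
print(age_spread)
```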


  2. what's with the "below the fol…

    03-23-2009 by dan

    what’s with the “below the fold” line in websites these days? there is no fold, print media is dead, get over it.


  3. wow – FDIC closed 3 more banks…

    03-20-2009 by dan

    wow – FDIC closed 3 more banks today — now up to 20 for the year. and 2 corporate credit unions were seized by NCUA


  4. gaussian copula — bad because…

    by dan

    gaussian copula — bad because distribution functions were modeled as ECDFs or because of bad gamma parameter? http://tinyurl.com/cw3zm5


  5. reading the flex data viz deve…

    03-19-2009 by dan

    reading the flex data viz developer’s guide: http://tinyurl.com/dy67vx


  6. wishing i worked here: http://…

    by dan

    wishing i worked here: http://www.cloudera.com/


  7. reading about integrating R an…

    by dan

    reading about integrating R and Hadoop: http://www.stat.purdue.edu/~sguha/rhipe/


  8. Build Flare Project Using Flex SDK

    03-13-2009 by dan

    When I went to the Flare website and looked at the instructions for getting Flare set up, I thought they left quite a bit to be desired, particularly if you are not going to use the “pay to play” Flex Builder.  There must be a few people out there like me who want to use ANT to build their project in the Flex SDK.  I don’t think I could ever switch to Eclipse as my editor because I have become inextricably tied to Vim’s key commands.

    Here is a summary of the steps that I took to get the Flare library compiled, and then how to actually use that library in your own Flex project.  All from command line and without any Flex Builder involved.

    NOTE: You should have ANT installed and on your path. You should also know how to get to a command line interface. Typing “cmd” into the Start->Run box (in Windows) will work, but cygwin is probably a better choice over the long run.  These instructions are for Windows, but I don’t see why this wouldn’t work in Linux, too.

    The directories and pathnames in parentheses are examples; feel free to use your own directory structure.

    • Download and unzip the Flex SDK. (c:/flex)
    • Download and unzip Flare. (c:/work/flare)
    • Open up the Flare build file (c:/work/flare/build.xml)
    • Find the line that defines the property “FLEX_HOME”, and change this line to point to your Flex SDK installation:

    <property name="FLEX_HOME" value="c:/flex" />

    • Change to the c:/work/flare directory and build flare:  type “ant clean all” on the command line
    • This should build with no errors. If the ant command is not found, you have not installed ANT properly. If ant can’t load “flexTasks.jar”, then your build.xml file is not pointing at the Flex directory properly.
    • Now, if you go to (c:/work/flare/build), you’ll see a number of files have been built. The flare library is “flare.swc”.
    • Now that you have built the flare library, you can use it in your own project.
    • First, create this directory structure for your project (called “testflare”):

    c:/work/testflare
    c:/work/testflare/lib

    • Move the file c:/work/flare/build/flare.swc that you just created into c:/work/testflare/lib
    • Let’s build the example app from the flare website
    • In part 3, scroll to the “constructing a visualization” section.
    • Copy this code into a file named (c:/work/testflare/testflare.as).
    • Open up testflare.as and rename the class and its constructor so that they match the file name. Change this:

    public class Tutorial extends Sprite

    to this:

    public class testflare extends Sprite

    and this:

    public function Tutorial()

    to this:

    public function testflare()

    • Now, create a new file called (c:/work/testflare/build.xml).
    • Paste this into build.xml:
    <project name="testflare" default="compile" basedir=".">
        <!-- don't forget that you need to tell ANT where Flex is! -->
        <property name="LOCALE" value="en_US"/>
        <property name="FLEX_HOME" value="C:/flex/"/>
        <taskdef resource="flexTasks.tasks"
            classpath="${FLEX_HOME}ant/lib/flexTasks.jar" />

        <!-- these are paths defined for your testflare project -->
        <property name="build" location="build/"/>
        <property name="lib" location="lib/"/>
        <target name="init">
            <tstamp/>
            <mkdir dir="${build}"/>
        </target>

        <!-- this tells the compiler where to find your source file -->
        <!-- and where to find the flare library -->
        <target name="compile" depends="init">
            <mxmlc file="testflare.as" output="${build}/test.swf">
                <compiler.library-path dir="${lib}">
                    <include name="flare.swc"/>
                </compiler.library-path>
            </mxmlc>
        </target>

        <target name="all" depends="compile"/>
        <target name="clean">
            <delete dir="${build}"/>
        </target>
    </project>
    • Change to the directory (c:/work/testflare) and build the project: ant clean all
    • You should now have a working project that imports the flare library!

  9. reading an introductory guide …

    by dan

    reading an introductory guide to R for psychological research: http://www.personality-project.org/r/


  10. plugged twitter into facebook …

    03-12-2009 by dan

    plugged twitter into facebook and wordpress — talk about syndication!
