1. Internet Attention Span

    08-14-2009 by dan

    I was reading this article in Slate about how the internet encourages our innate obsessive “scanning” behavior, and it made me a bit nervous.

    I know for a fact that my attention span has withered to nothing over the past few years.  Even now, as I write this post, my attention is wandering from one thing to the next.  In fact, just starting this post was an exercise in attention wanderlust.

    Everyone has heard of the experiment where rats had electrodes in their brain that stimulated the “pleasure center”.  The rats kept pushing the button to stimulate themselves no matter what, forsaking food, sleep, etc.  This article claims that new research suggests the rats were not stimulating the “pleasure” part of the brain, but rather the “obsessive scanning” part.  It became more important to focus on the behavior of stimulating themselves than the stimulation itself.  And the analogy is that jumping around on the search engine is doing exactly the same thing to our brains — and permanently rewiring us in the process.

    The funniest part about this article was that it was LONG.  I had to fight really hard with myself to stay tuned into reading the article and actually finish it.  I kept having to remind myself how ironic it would be to stop paying attention to an article that was trying to explain to me why the internet is killing my attention span.  It actually was well worth the read, though, and if you have the self-discipline I’d recommend making it the whole way through.

    I’m bothered a bit by the hypothesis that we only collect information on the internet because it appeals to some obsessive behavior.  Having ease of access to so much information certainly forces me to obsessively sift through it, but part of what is happening right now with social networking is that all this obsessing is actually beginning to organize some of that info, and making it less necessary that I obsessively scan through EVERYTHING.  I can now spend a bit less time scanning through stuff that my friends, family and colleagues have organized for me.  Of course, without all this convenience, we’d be scanning through a lot less information, but I think it’s very exciting to have all this stuff at our fingertips.

    Just because the availability of huge amounts of data is brand new doesn’t mean we won’t figure out how to process it better.  I imagine we’ll look back to this period of time and say, “wow, look how hard it was to aggregate information efficiently back then!  It’s amazing we got as far as we did given all the constraints that used to exist!”


  2. Web Sharing Questions

    07-31-2009 by dan

    Why develop anything for the web if it is locked into your proprietary framework?  I’m not talking about your intranet, I’m talking about your publicly available (or maybe even subscription based) internet presence.  In this day and age, anything you create that is not nimble enough for “mashups” will actually give YOU trouble as well as your customers.  For example?  What if you wanted to do summary statistics and unit tests of your database right now?  Would your developers have to start coding like mad?  I know the ones on my database would (mostly because that would be me).

    Why would a nimble architecture help here?  Because you could consume the data in some ready made statistical software (R anyone?) and do some quick analysis without new custom queries or development work.  Same thing with unit tests — pull out specific sets of data with your agile framework, and test them with whatever software you like.
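    As a rough sketch of the kind of quick analysis I mean, here is what it could look like once the data comes out in a plain format.  I'm using Python's standard library as a stand-in for R, and the CSV text, the column names, and the helper names (load_rows, summarize, check_positive) are all made up for illustration; nothing here reflects my real database.

```python
import csv
import io
import statistics

# Hypothetical CSV export; the columns and values are invented for illustration.
SAMPLE_CSV = """item,price
widget,5.00
gadget,0.05
gizmo,12.50
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, column):
    """Return basic summary statistics for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


def check_positive(rows, column):
    """Unit-test-style check: return the rows whose value is not positive."""
    return [row for row in rows if float(row[column]) <= 0]


rows = load_rows(SAMPLE_CSV)
summary = summarize(rows, "price")
bad_rows = check_positive(rows, "price")
print(summary)
print(bad_rows)
```

    The point is only that neither function knows anything about the database; they just consume whatever block of data the framework hands over.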

    Here are my questions about developing this sort of framework:

    • How to get your database browse-ready in a RESTful way?  What is the software involved?  Do you need special Apache plugins?
    • How to get applications you do choose to develop to be able to consume your data properly — Do they consume a feed?  Browse the RESTful setup?  Both?
    • If you build an app, how do you make it ultra-portable?  YouTube videos, Google gadgets, site searches, etc. are all easily plugged into a website, and if you provide the proper XML attributes, you can even reskin them so your clients can easily keep their own look and feel with your technology.  What is the technology involved here?  Is it simply a matter of bundling attributes in HTML/XML and source JavaScript, with access to your web API?  Or is there more?
    • Scalability!  Do any of the above questions impose a ridiculous amount of overhead on your database or webserver?

    These are questions I don’t yet have answers to.  I’ll write here as I find answers.


  3. Web Services for Data

    07-30-2009 by dan

    The problem:

    I have a large data warehouse stored in an Oracle database.  The existing framework for “consuming” this information is very old and static.

    The question:

    Can I take this rich dataset and construct methods for extracting the data that will be extremely flexible, and run parallel to the existing static framework (so as not to break what is already there)?  What I am looking for is something like a web service, which accepts a basic set of parameters and returns some kind of data “object”.  It should be generalized so that I can ask for one data element, or a long list of data elements.

    Why do I want to do this?  To put it simply, it is too difficult to extract information from this database in the current form.   All custom queries must be constructed off-line, and require a large effort to get into the existing framework.  What I’d like is to supply a set of data “building blocks” that can be “mashed” together to create reports, summaries, unit tests, and new datafeeds that I hadn’t explicitly defined at the outset.  Even better would be the ability to also pull blocks from outside sources — like salesforce.com or google finance.

    So, I’m now researching the best way to provide the data “building blocks”.  I can see some of these individual blocks becoming quite large — someone might ask for a time series of daily transactions over the past 10 years — and so a “pull” architecture probably makes the most sense.  If I am pulling a big block of data out of the system, what format should I transmit it in?  Currently, most of the data is viewed in the tried-and-true table format of a database or spreadsheet.  Keeping it as CSV, then, would be logical, but I want this system to be flexible — it might provide a CSV by default, but it should also provide XML or JSON on request.  (Can I transfer in a compressed format, and decompress on the other end?)
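    To make the format question concrete, here is a minimal sketch of one block of data being rendered as CSV by default, or as JSON or XML on request, and then compressed for transfer.  It uses only Python's standard library; the function names (serialize, pack, unpack) and the sample rows are hypothetical, and it assumes the block is a non-empty list of flat records.

```python
import csv
import gzip
import io
import json
import xml.etree.ElementTree as ET


def serialize(rows, fmt="csv"):
    """Render a non-empty list of flat dicts as CSV (default), JSON, or XML text."""
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "xml":
        root = ET.Element("rows")
        for row in rows:
            elem = ET.SubElement(root, "row")
            for key, value in row.items():
                ET.SubElement(elem, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError("unsupported format: " + fmt)


def pack(text):
    """Compress text for transfer."""
    return gzip.compress(text.encode("utf-8"))


def unpack(blob):
    """Decompress on the receiving end."""
    return gzip.decompress(blob).decode("utf-8")


# Invented sample block: two days of transaction counts.
block = [
    {"day": "2009-07-29", "transactions": 120},
    {"day": "2009-07-30", "transactions": 95},
]
csv_text = serialize(block)
json_text = serialize(block, "json")
xml_text = serialize(block, "xml")
round_trip = unpack(pack(csv_text))
```

    So the answer to the parenthetical question is at least a tentative yes: the compressed bytes decompress back to exactly the text that was sent.  Whether gzip is the right choice for a ten-year time series is something I'd still have to measure.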

    It should be obvious where I am going with all of this — if I can pull building blocks in whatever configuration I want, then I can insert them into whatever system I want — including the old static system.  Obviously, I’d like to completely supplant the old system, but I will have to build toward that.

    The other advantage to this approach will be more transparency about what is actually in the data catalog.  A side effect of a well designed pull framework will be a somewhat self-documenting catalog of what is actually available and how it was constructed.  This should also allow me to design natural unit tests for verifying the integrity of the data (and the framework).

    Is a web service the correct solution here?  It should be universally available to any device, permission-able, and as fast as possible.  The output from the framework should be completely separate from any service that consumes the data.  Do web services fit this mold?


  4. Customer Service Problems

    07-28-2009 by dan

    Yesterday, I was trying to buy a plane ticket on jetblue.com using my browser of choice, Firefox 3.5.  I ran into a security problem just as I was about to buy the ticket — Firefox stopped the transaction and told me that “despite being connected via an encrypted channel, the information you are about to transmit will be sent unencrypted and will be easily seen by onlookers”.  I am paraphrasing Firefox, but that was the essence of the message.  So I stopped.  Hell no!  I’m not submitting my credit card information if my browser is about to send something unencrypted.

    I called customer service and had to fight through 3 representatives to get to a supervisor.  The supervisor was very reluctant to help me, but I eventually talked my way to an IT person.  The IT person wasn’t the right guy, either, but he at least suggested an alternative (and told me that jetblue only supports IE — really?  Your site is public!  You need to support ALL browsers!).  Here’s a summary that I sent to jetblue’s complaint form about what happened:

    Website Security Problem

    Hello.  I have booked flights on jetblue.com several times using the Firefox web browser.  Today, I tried to book a flight and got a security message — a very concerning one.

    I am using Firefox 3.5.

    I chose a flight, chose my seats, and entered all my information.  I got to the payment screen, and put in my Mastercard number and related info.  I clicked Submit.

    My browser stopped the transfer somewhere between your site and the site it was trying to contact, though, because some information was about to be sent on an UN-encrypted channel, despite the fact that I was connected to your site securely.

    I talked with customer service, and eventually even to a very helpful IT person at jetblue.com, who advised me to fill out this form.  No one was able to say anything about this problem, and no one had any information about WHAT was being transmitted from jetblue.com in an UN-encrypted fashion.  I was told to try Internet Explorer to fix my problem.

    I used Internet Explorer, and the problem did indeed disappear.  HOWEVER, the larger issue is this: what is your site transmitting that is not encrypted?  Did Internet Explorer not check for this, and Firefox caught it?  Or is it simply a bug with viewing the site with Firefox?

    I think this is a pretty urgent matter, because if you are transmitting credit card numbers on un-encrypted channels, you could be providing phishers with personal data.

    I really think that the application was making a mistake here.  Something that shouldn’t have been sent unencrypted was being sent unencrypted (I fully concede that it may NOT have been my credit card information, but that’s not a risk I’m willing to take!).

    The challenge is that customer service was NOT the right set of people to hear about this problem; I should have talked to someone on the website team.  Nowhere on the jetblue site is there any “technical problem” hotline, though, so the customer service people all told me to ignore it, since “our site is very secure”.  The site may use very secure protocols, but you can still have a bug in the system that allows communication via an unsecured connection, which renders your “very secure site” very un-secure.  I hope that someone is on the other end of the complaint form and will forward my message along to the web team; it could be bad news for jetblue customers otherwise!


  5. More on Missing Pipes

    07-22-2009 by dan

    The following is a comment I left on a post at Jon Udell’s blog about “rewiring the web”.  It outlines some of the ideas I have mentioned here, and one of the commenters mentioned that Microsoft, of all places, actually had done something similar to what I suggested.

    I love this wiring the web idea, but I’m getting concerned about where the wires themselves are stored.

    What if your dopplr or tripit or yahoo pipes or whatever you are using goes away unexpectedly?  This could break a lot of stuff you have built.

    It might take forever to restore things to a working state, particularly if you relied heavily on one service.

    Here’s what I think might solve this problem:
    You log in to yahoo pipes and design a filter that takes feed A and creates feed B.

    Rather than the filter being stored at yahoo, the pipes create a small object that you can add to your website (or wherever) that performs that functionality.  All that is needed now is for feed A to continue to exist.

    Now, yahoo pipes suddenly disappears, to be replaced by google hoses.  You still have that one piece of functionality you created with the pipes, which still works since you didn’t store it at yahoo.

    This would give you time to switch to google hoses for new filters, while old filters continued to work (and might even be able to be imported into google hoses for future editing)

    So the filters get stored with your data — after all, you spent time creating them, so they are sort of a type of meta-data, right?

    I think this is something that should be seriously considered.  Pipelines (or wiring, or street networks, etc.  Choose your metaphor) are laid with permanence in mind.  I don’t think the metaphor should break just because we are talking about digital connections rather than physical ones.
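    Here is a toy sketch of what such a "small object" might be: the filter is just a few lines of JSON that I keep with my own data, plus a tiny function that applies it to feed A to get feed B.  The JSON layout, the apply_filter name, and the sample feed items are all invented for illustration; this is not how Yahoo Pipes actually stores anything.

```python
import json

# Hypothetical filter definition, saved next to my own data rather than at the service.
FILTER_JSON = '{"field": "title", "contains": "python"}'


def apply_filter(filter_def, items):
    """Keep the feed items whose chosen field contains the given text (case-insensitive)."""
    field = filter_def["field"]
    needle = filter_def["contains"].lower()
    return [item for item in items if needle in item.get(field, "").lower()]


# Feed A, reduced to a list of dicts; a real feed would be fetched and parsed first.
feed_a = [
    {"title": "Python tips", "link": "http://example.com/1"},
    {"title": "Gardening", "link": "http://example.com/2"},
]
feed_b = apply_filter(json.loads(FILTER_JSON), feed_a)
```

    If the service that helped me design the filter disappears, the JSON and the function are still sitting on my server, and only feed A has to keep existing.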


  6. Brokers Change Circumstances

    07-20-2009 by dan

    I follow a bunch of blogs with the Google Reader, and a bunch of handcrafted feeds using the Yahoo Pipes.  One of the most interesting things that happens with this setup is when one blog frames an idea, and another blog/news source reveals that idea in action.

    This morning, I read a great post by Seth Godin about folks “doing their best” given the circumstances (great Godfather reference, by the way!).  Not a minute later, an article from the NYT came in about Mortgage Brokers in southern California re-purposing themselves as loan modification experts, and the parallel between Mr. Godin’s post and the real world was immediately obvious.   From the NYT article:

    Mr. Soussana’s partners at FedMod, as the company is known, were also products of the formerly lucrative world of high-risk lending.

    The immediate reaction is to be horrified that people who could sell such obviously flawed products as Subprime Mortgages in southern California could still be in business, and in fact making MORE money from the very people they sold the products to.

    But, applying Mr. Godin’s ideas to this article, I start to see that the circumstances have indeed changed, and these brokers are doing “the best they can” given these new circumstances.  It’s definitely an unfair world that presents them such money making opportunities twice in a row, especially when they appear to take advantage of customers whenever possible:

    Despite making promises of relief to homeowners desperate to keep their homes, FedMod and other profit making loan modification firms often fail to deliver, according to a New York Times investigation based on interviews with scores of former employees and customers, more than 650 complaints filed with the Better Business Bureau, and documents filed by the Federal Trade Commission in a lawsuit against the company.

    So, you be the judge:  cheaters continuing to cheat, or folks adjusting their strategy given changing circumstances.  (or, I suppose, both?)


  7. The Tangled Chain Someone Else Weaves (and then YOU have to undo)

    07-16-2009 by dan

    Wow, now that I have written this, it’s a lot more ranting and angry than I had intended…  I guess that says something about how I feel about it!  Prepare yourself!

    Eric just responded to my Crap Filing Cabinet post over at his blog.  As he was painstakingly dissecting my post, I got to thinking about one of the points he made about attaching files to emails:

    …the immediacy of ‘attaching’ just makes it too appealing. Someone is ‘dumping’ the task off on you with the minimum effort necessary…

    I actually think that his point here is broader than he lets on.  Email is a fantastic way to dump work off on other people.  But not in that “forward customer service request to my co-worker” way that immediately jumps to mind.

    Consider an email like this one:

    From: Person5
    To: Dan Dube
    Subject: FW: RE: FW: FW: Question

    Dan,

    Check this out and let me know what you think.

    P5

    From: Person10
    To: Person5
    Subject: RE: FW: FW: Question

    P5,

    Hey can you get Dan to look at this?  I’m sort of stumped.  I added a bunch of stuff to the script, though, so that should help.

    P4

    Office: (555) 666-6666
    P4@company.com
    www.mystuff.com — my blog!

    From: Person3
    To: Person4, Person10, Person11
    Subject: FW: FW: Question

    P4,

    I just looked through the database, it looks like an inner join isn’t working properly, so we are getting a bad match here.  Can you figure out where the source data was coming from, and why the join failed?

    Thanks!
    P3


    ANYTHING SENT TO THIS EMAIL ADDRESS IS CONFIDENTIAL!  IF YOU AREN’T THE INTENDED RECIPIENT OF THIS EMAIL, DELETE IT IMMEDIATELY AND GO WASH YOUR EYES OUT WITH SOAP!  OH MY GOD STOP READING HERE!  I WILL SUE YOU!  YOU KNOW I WILL

    From: Person2
    To: Person3
    Subject: FW: Question

    P3:
    Just got this in, can you take a look?

    Thanks!
    P2


    Without Love there is no War
    Person 2
    (555) 555-5555 (office)
    (555) 555-5556 (cell)
    www.p2.com (website)

    From: Person1
    To: Person2
    Subject: Question

    Hello, I was looking at your website and noticed that Item X shows a price of 5 dollars, but it seems like it should really only be 5 cents.

    This actually wasn’t as difficult to type up as you might expect, since I literally receive this email 20 times a day and have to go through this process 20 times a day.

    How is the work being offloaded here?  Let’s look at the completely misguided ways:

    • First, I have to look back through this entire chain to figure out what the hell it is about.  The email has conveniently been ordered so that the most relevant stuff is at the bottom, where I have to waste as much time as possible getting to it.
    • Second, Person2 did the classical “give it to someone else but don’t help at all” approach.  This guy is probably a manager.  Notice all the crap text in his footer that is now mucking up the email chain.
    • Person3 figured some stuff out, but any logic that he used is lost in the sands of email mess.  To make matters worse, he has a big privacy notice that adds garble to the mix.
    • Person3 is a bit too thorough, though, and forwarded to too many people.   Look at the subject of this email grow!  (to be fair, most systems don’t do this anymore, but I’m just trying to emphasize how lousy the subject is)
    • Person10 makes some more progress, but again, his logic is lost, and now when this finally gets back to me I will have to contend with the extra crap this guy added in.  Now the subject contains a helpful RE in addition to several FWs.
    • Now, my boss has a hold of this thing and he forwards it along again.  Instead of giving me a quick overview of what I should be looking for, I only get a “have a look” message.

    Wow.  There are all kinds of problems here.  Some smart people were doing work here, and their work is lost to the un-collaborative mess of this email.  The only way I’d be able to use what they did is to go find them and talk to them face to face, which is hard to do when you work from home.

    There isn’t any revision control of the email itself, so I have to go with my gut that the lowest thing on the chain was indeed the original email and hasn’t been cut by someone’s truncating email client or spastic copy-n-paste hand movements.

    Each of the people on the chain had to waste time navigating the chain merely because everyone else was too lazy to summarize what had happened up to that point.

    Finally, what if the first person had sent a file?  Imagine how many forked versions of it would have been passed around as people downloaded their own copy, made their own edits and then forwarded it along?  Especially when it starts going to multiple people, you can see how bad the problem can become.  And, to make matters worse, the email server is storing multiple copies of this crap, instead of one version controlled archive of the edits.

    This is awful.  Email and the filing cabinet metaphor must be discarded now.


  8. Two Dimensional Data is Crap

    by dan

    How’s this for a broken process:

    • Many data tables, stored in Oracle.  Very rich set of data, infinitely combinable as a relational database.
    • Skilled data analyst, able to create whatever report you need.  (that’s me)
    • Reasonable process for automating aforementioned reports

    But here’s the broken part — I can get this data to be any format I want (Excel, XML, Access, CSV, whatever) out of this with some work, but why would I want to?  The data is most usable in the raw format! And, it isn’t restricted to being 2 dimensional.

    The broken part is less the dimension of the data, though, and more how I am supposed to give it to a client.  Why do I have to extract it from my system (breaking any and all linkages within my system) to give it to them?  Why do I have to choose a new format for perfectly workable data?  I understand that there may be some intermediate form to TRANSMIT the data to a client, but they should be able to look at it in whatever tool they like, without any loss of generality from the original format.  If they like two dimensional data, then they should be able to look at it two-dimensionally, with some arbitrary rule applied when they encounter a field that has many matches.
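    To show what I mean by "some arbitrary rule", here is a small sketch that takes records where one field has many matches and flattens them into a two dimensional view.  The flatten function, the rule parameter, and the sample records are hypothetical; the only idea is that the client picks the rule, and the underlying data stays as it is.

```python
def flatten(records, rule=", ".join):
    """Collapse list-valued fields into a single cell using `rule`."""
    table = []
    for record in records:
        row = {}
        for key, value in record.items():
            if isinstance(value, list):
                # A field with many matches: let the rule decide what the cell shows.
                row[key] = rule([str(v) for v in value])
            else:
                row[key] = value
        table.append(row)
    return table


# Invented sample: each client has one or more contacts.
records = [
    {"client": "Acme", "contacts": ["Ann", "Bob"]},
    {"client": "Initech", "contacts": ["Cy"]},
]
joined = flatten(records)  # default rule: join all matches into one cell
first_only = flatten(records, rule=lambda values: values[0])  # keep the first match
```

    The original records are never modified, so a different tool (or a different rule) can produce a different view of the same data later.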

    Most of my clients will ask for the report in an Excel readable format, because Excel is installed on everyone’s computer, everyone has some experience with Excel, and Excel is easy.  This is where the filing cabinet metaphor (with a lot of help from Microsoft) has really screwed up progress.  Everyone wants a “spreadsheet” with the data in it.  A spreadsheet — you know, like the graph paper that I used to use before computers were invented.  I don’t think that there is anything wrong with spreadsheets, but they are 2 dimensional, add all the problems with the filing cabinet metaphor I hate so much, and add more time-wasting work for me to produce.

    Let’s get this straight, though — the customer is not wrong here.  They have been trained to use software in a particular way, and I don’t want to change that.  I do want to change the underlying model of the data, though — I want to store the thing online (like Google Docs, Twitter, my Blog data, etc), and I want them to access a VIEW of it through Excel.  Then, when other, better (meaning, able to handle the multidimensional fields), tools come along, they can view it with that instead.


  9. Twitter = Napster?

    07-14-2009 by dan

    I have a Twitter account (which you probably have noticed if you came here, I cross post all kinds of stuff to it).  I’ve had a Twitter account for a while, but only recently have I actually started to see some of the benefits of using it.  So far, though, the benefits seem very one-sided — to my benefit.

    Twitter’s main use (in my mind, thus far) is for real time search.  You can try the Yahoo Pipes approach I talked about before, or you can use an even closer to real time approach with the TweetDeck, if you don’t mind another app open on your desktop/phone/whatever.

    But, if all you use Twitter for is to get information, you won’t ever add any value back to the thing, will you?  I remember back in 1999 I used to log on to Napster and download a few songs.  I never shared my songs back with the network, though, because the common thought was that sharing songs was illegal, but you could probably get away with downloading them.

    No special interest group is being beaten to death with Twitter, though, so I don’t think they will go away in a puff of smoke like Napster did.  However, what would have happened to Napster if it hadn’t been killed, but the one-sided use of it continued?  Would we have just ended up with a few big sharers and a million leeches, like FM radio?  Those few sharers would have had quite an influence over what was available for download, and what was “new” for download.

    Which brings me back to Twitter.  If there are a whole bunch of people doing search-only on Twitter, and not contributing back to it, will there eventually be just a few Twitterers that really matter?  Look how many people follow Ashton Kutcher.


  10. Careful With Web 2.0 Tools!

    07-09-2009 by dan

    I was ruminating on my last post about the Yahoo pipes, and I got to thinking about an article that Jon Udell wrote a few days ago about FuseCal.  This is something to think about carefully, especially when you build something that will be used by others or for commercial interests.

    In Mr. Udell’s example, FuseCal took webpages that were not in calendar-feed format and created a feed from them.  This is really cool, but when FuseCal ceases to exist, what do you do then?  Imagine if you had built a business on converting web calendars to calendar-feeds using FuseCal?  That’s an extreme example, but the same thing could easily happen to Yahoo Pipes, Google Maps or other services.  I’d lose a few interesting blogs if the Pipes went down, and a lot of real estate sites would be SOL if Google Maps died.

    It always comes back to the idea that you have to build it yourself (or at least use an open source version) on your own server if you want to guarantee it will be available to you.  I think this is somewhat unfortunate, because it encourages us to rebuild wheels.  Even if you use an open source version of some software, you’ll still have to get someone who understands how to keep it up to date and what the implications of updates are.

    Incidentally, there is a library called WireIt, built on the YUI library from Yahoo, that gives you an implementation of the Yahoo Pipes interface.  What happens when YUI goes to version 3.0?  Does WireIt break?  Do you need to upgrade?  These are the kinds of decisions you have to keep in mind, even if you are using new Web 2.0 “mashable” stuff.  I suspect that we will start to see a lot of Web 2.0 stuff breaking as the technology evolves because no one had a contingency plan in place.

    This is probably a problem that has already been visited before with regular software, though, as I’m sure Microsoft or Google can attest.
