DATA for A Twenty-Third

May 15th, 2012

This post relates to the class Data, taught by Mark Hansen in Spring 2012. It uses code and advice received from him.

A Twenty-Third is a data-based installation dedicated to the memory of cyclists who have died riding the streets of NYC. It recreates the tension of riding and the disparity experienced in an accident.
More information about the installation here

A Twenty-Third used as its source a massive, not very organized dataset called CrashStat. It was published by Transportation Alternatives, who obtained the data via multiple FOIL (New York State Freedom of Information Law) requests to the New York State Department of Transportation (NYS DOT), and then organized, unified and published it as a single set covering 1995 to 2009.
I managed to get from them a not-yet-processed dataset containing the raw data from 2010, so I also had to unify both datasets at some point.

dim(bikes_19952009)
53399 obs. of 61 variables
dim(bikes_2010)
3347 obs. of 59 variables

My installation was only related to cyclist accidents, so the first step was to filter out the accidents that did not have a cyclist involved.

bikes = bikes[bikes$ACCD_TYP==3,]

When I started analysing the first dataset, I ran into many problems: the dates had to be converted, the codifications weren’t numerical, and there were a lot of missing cells.
I converted the time and date data to time objects:

bikes$time = strptime(bikes$ACCD_TME,"%m/%d/%Y %H:%M:%S %p")
bikes$date = strptime(bikes$ACCD_DTE,"%m/%d/%Y")
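
As a rough sketch of what this conversion does, here is the same idea on made-up rows rather than the CrashStat data. The toy data frame, the 24-hour time format and the UTC time zone are all assumptions; the real columns may be formatted differently. Rows that carry no clock time come back as NA.

```r
# Toy rows (hypothetical values, not taken from CrashStat).
toy <- data.frame(
  ACCD_DTE = c("06/01/2005", "06/02/2005"),
  ACCD_TME = c("06/01/2005 14:30:00", NA),
  stringsAsFactors = FALSE
)

# as.POSIXct() stores each timestamp as a single number,
# which sits comfortably in a data frame column.
toy$date <- as.POSIXct(strptime(toy$ACCD_DTE, "%m/%d/%Y", tz = "UTC"))
toy$time <- as.POSIXct(strptime(toy$ACCD_TME, "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Rows without a usable time are NA and need separate handling.
missing_time <- is.na(toy$time)
print(toy[, c("date", "time")])
```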

Then, because many rows included only a date and no time, I randomized the missing times so that all the time data sits in a single column.

a <- which(is.na(bikes$time))
bikes$time[is.na(bikes$time)] = bikes$date[is.na(bikes$time)] + sample(86400, size=length(a))

Secondly, I needed to classify the accidents according to their seriousness. One of the columns included information about the type and extent of the injuries, and I used the following code to convert it into a simpler classification: 1 being a non-serious accident, 2 a serious accident and 3 a fatality.

bikes$seriousness = rep(1,nrow(bikes))
bikes$seriousness[grepl("[A2345]+",bikes$EXT_OF_INJURIES)] = 2
bikes$seriousness[grepl("[1K]+",bikes$EXT_OF_INJURIES)] = 3
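
To see how the two patterns interact, here is a small sketch on invented codes. The values in `codes` are hypothetical and only stand in for EXT_OF_INJURIES. Because the fatality pattern is applied last, a record that matches both patterns ends up as 3.

```r
# Hypothetical injury codes; the real EXT_OF_INJURIES values may differ.
codes <- c("", "A", "3", "K", "1", "AK")

seriousness <- rep(1, length(codes))        # default: non-serious
seriousness[grepl("[A2345]", codes)] <- 2   # any serious-injury code
seriousness[grepl("[1K]", codes)] <- 3      # fatality codes override the rest

print(data.frame(codes, seriousness))
```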

Going through the data I was able to make some preliminary observations:

table(bikes$seriousness)
2     3
22139   279

There are no records of non-serious accidents. They were either filtered out from the original database, not recorded, or just not reported.
The overall number of serious accidents decreased from 1995 to 2007 and has slightly increased since then. This could be related to the growth of the biking population and the addition of bike lanes to the system. It would be interesting to match data on the evolution of the biking population with this dataset.
There is a clear increase in accidents every summer, and the count drops to near zero in the winter.
The number of fatalities doesn’t follow a visible pattern. One can see peaks in 1999, 2005 and 2007, but these could be interpreted as noise given the small number of fatalities (~13-28 per year).
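
Observations like these can be checked by tabulating the records per year and per month. The sketch below uses a handful of invented timestamps in place of the real `bikes` data frame, so the names and values are assumptions, not results.

```r
# Hypothetical accident records standing in for the bikes data frame.
toy <- data.frame(
  time = as.POSIXct(c("2005-01-10 08:00:00", "2005-07-04 17:30:00",
                      "2005-07-20 12:00:00", "2006-08-01 09:15:00"),
                    tz = "UTC"),
  seriousness = c(2, 2, 3, 2)
)

# Serious accidents (2) and fatalities (3) per year.
by_year <- table(format(toy$time, "%Y"), toy$seriousness)

# Accidents per calendar month, to expose any seasonal peak.
by_month <- table(format(toy$time, "%m"))

print(by_year)
print(by_month)
```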

I tested the correlation between every column I could in the dataset, but I couldn’t find anything very surprising:

names(bikes_19952009)
[1] "OBJECTID"           "CASE_NUM"           "CASE_YR"
[4] "REF_MRKR"           "ACCD_DTE"           "ROAD_SYS"
[7] "NUM_OF_FATALITIES"  "NUM_OF_INJURIES"    "REPORTABLE"
[10] "POLICE_DEPT"        "INTERSECT_NUM"      "MUNI"
[13] "PRECINCT"           "NUM_OF_VEH"         "ACCD_TYP"
[16] "LOCN"               "TRAF_CNTL"          "LIGHT_COND"
[19] "WEATHER"            "ROAD_CHAR"          "ROAD_SURF_COND"
[22] "COLLISION_TYP"      "PED_LOC"            "PED_ACTN"
[25] "EXT_OF_INJURIES"    "REGN_CNTY_CDE"      "LOW_NODE"
[28] "HIGH_NODE"          "ACCD_TME"           "RPT_AGCY"
[31] "DMV_ACCD_CLSF"      "ERR_CDE"            "COMM_VEH_ACC_IND"
[34] "INTERSECT_IND"      "UTM_NORTHING"       "UTM_EASTING"
[37] "GEO_SEGMENT_ID"     "GEO_NODE_ID"        "GEO_NODE_DISTANCE"
[40] "GEO_NODE_DIRECTION" "GEO_LCODE"          "HIGHWAY_IND"
[43] "CASE_NUM_YR"        "X_COORD"            "Y_COORD"
[46] "BoroName"           "BoroCD"             "StSenDist"
[49] "SchoolDist"         "CounDist"           "CongDist"
[52] "ElectDist"          "PjAreaName"         "Precinct_1"
[55] "GEOID10"            "NAME10"             "DPHO"
[58] "AssemDist"          "time"               "date"
[61] "seriousness"

Most of the accidents happen in biking-favorable conditions: sunny days, daylight, well-lit streets. Some subsetting would be needed to find patterns inside the less common scenarios.

Finally, as the installation was going to run at a sped-up rate, I had to study the time differences between each accident and, more importantly, between each fatality, in order to determine the actual speed and the considerations I’d have to take for it.

plot(diff(bikes$time))

plot(diff(bikes$time[bikes$seriousness == 3]))

quantile(diff(bikes$time[bikes$seriousness == 3]), seq(0,1,.05))
Time differences in hours
0%       5%        10%       15%       20%       25%
1.183333   18.145750   37.251667   68.850000   93.000000  113.000000
30%        35%        40%        45%        50%        55%
149.000000  194.571667  233.093333  282.300000  314.450000  378.915833
60%        65%        70%        75%        80%        85%
416.900000  488.195833  559.200000  642.145833  760.080000  922.090000
90%         95%         100%
1140.920000 1515.385000 4096.050000

In the case of the fatalities, about 15% of them occur within 3 days of each other. The installation ran at a rate of one day per minute, so in that many cases the reenactment had to be delayed. Next time, I would like to explore a different approach that resolves this conflict better while maintaining an interesting experience for the viewer. The data could be replayed at a slower rate, or these cases could be reenacted together.
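
The arithmetic behind that figure can be sketched as follows, using invented fatality timestamps instead of the real records. The variable names and the `days_per_minute` setting are illustrative assumptions.

```r
# Hypothetical fatality timestamps (not real CrashStat records).
fatal <- as.POSIXct(c("2005-06-01 08:00:00", "2005-06-02 20:00:00",
                      "2005-06-20 10:00:00", "2005-07-15 18:00:00"),
                    tz = "UTC")

# Gap between consecutive fatalities, in days.
gap_days <- as.numeric(diff(fatal), units = "days")

# Share of gaps shorter than three days.
share_close <- mean(gap_days < 3)

# Playback rate: one day of data per minute of installation time.
days_per_minute <- 1
gap_minutes <- gap_days / days_per_minute

print(round(gap_minutes, 1))
print(share_close)
```

Changing `days_per_minute` shows how a slower replay stretches the short gaps.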

Finally, I used R to export a subset of the data, including only the information I needed for the installation. This operation was fairly easy using subset, rbind and write.table.

bikes.a <- subset(bikes_19952009, seriousness >=2, select = c(seriousness, date,time))
bikes.b <- subset(bikes_2010, seriousness >=2, select = c(seriousness, date,time))
bikes <- rbind(bikes.a, bikes.b)
bikes$timestring = as.numeric(bikes$time)
write.table(subset(bikes, select = c(seriousness,timestring)),'~/bikes.csv', quote=FALSE, sep=',', row.names=FALSE)
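
As a small round-trip sketch of this export, the block below uses two invented rows and a temporary file in place of `~/bikes.csv`. The values are hypothetical. The `timestring` column holds seconds since 1970-01-01, which is what `as.numeric()` returns for a timestamp.

```r
# Hypothetical rows in the shape of the exported table.
toy <- data.frame(
  seriousness = c(2, 3),
  time = as.POSIXct(c("2005-06-01 08:00:00", "2005-06-02 20:00:00"),
                    tz = "UTC")
)
toy$timestring <- as.numeric(toy$time)   # seconds since 1970-01-01 UTC

# Write to a temporary file instead of the home directory.
path <- tempfile(fileext = ".csv")
write.table(subset(toy, select = c(seriousness, timestring)), path,
            quote = FALSE, sep = ",", row.names = FALSE)

# Read the file back and rebuild the timestamps from the epoch seconds.
back <- read.csv(path)
back$time <- as.POSIXct(back$timestring, origin = "1970-01-01", tz = "UTC")
print(back)
```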

There are lots of other things I’d love to do with the data, but my lack of R expertise made it very difficult. I wanted to apply more of the concepts we learned in the Data class, and check them against the many other columns I just overlooked.
R has proven very useful. As with any other language, one has to go through a learning curve to become fluent, something I underestimated and didn’t have enough time for. I hope I can dedicate more time to it in the near future.
Something else I hope to do soon is to use some of the concepts we learned to improve another project, BKME, which is basically a platform that aggregates user-generated data about cars illegally parked in bikelanes. In order to make that data useful and create change, we need to be able to analyse it to find clusters and patterns.

#BKME App take 3: geolocation, better app flow, flags.

March 1st, 2012

In the last few days I’ve been working on polishing the interaction and finding mechanisms to check statuses and inform the users.
The idea here is for the app to do as much as possible by itself, but still keeping the user informed of the actions that are being taken.

#BKME App take 2: buttons, feedback, camera fails

February 27th, 2012

For the 0.0.2 version of the app, I worked on organizing the code, using the camera, and user feedback. I got many things working but I got stuck on trying to solve a bug with the camera (visible on the demo).


#BKME App first über basic demo

February 16th, 2012

So this is the very first approach to the app.
I spent most of the time organizing the file structure and the basic layouts.
It also took me some time to get familiar with both the PhoneGap and the jQuery Mobile APIs.
For now, the app only triggers the camera, and loads some status messages, but something is something.

#BKME App UI flow.

February 9th, 2012

So this is a first approach at the interface for the new bkme app.

Overall, the goal is to make the time from unlocking the phone to locking it again as short as possible, and to ensure the photo will be properly added to the database even after the phone has been locked.

I created a simple interface that shows the camera filling the whole screen, with banners that come down from the top to notify the user of the status of the different parts.

Constraints:
The app should start getting the geolocation on load or wake-up. Messages appear from the top.
The camera should load as soon as it’s ready, while the geolocation request continues until done.
After the photo is taken, there are 3 options: send (primary), cancel or retake.
Compression and upload should happen in the background, after the window is released for a new photo.
This should also apply when the phone locks.

#BKME App, first approach

February 1st, 2012

Last semester, my friends Alex Kozovski and Fred Truman and I designed a platform for defending the bikelanes from cars parked on them. The platform sits on top of Twitter, allowing anyone with an account to send a picture with the hashtag #bkme; a robot catches it and stores the geolocation and the photo in a database, organizing all the isolated cases in a structured dataset that can be used for changing urban design, policies or enforcement in the areas that need it.

The system is great, and most of the people we have told about it find it brilliant, but that enthusiasm hasn’t translated into actual usage as we expected. One of the biggest obstacles to regular use is the amount of time it takes a person to send a photo through a Twitter client. 30 seconds may not seem like a lot of time, but for a biker, having to stop for that long is too much to ask.

So I decided I’ll use this class to come up with an app that makes this interaction as fast as possible, leaving little to no options for the user to take, and letting the whole process of uploading happen in the background.

It’s probably not going to be a final app, but it’s a good way of starting and prototyping the interaction.

Hello, Phone Gap!

February 1st, 2012

This is the first of a series of posts related to the short class Mobile Web, where I’ll try to develop a mobile app for #BKME, the project I’ve been working on with Alex Kozovski and Fred Truman since last year.

We just started exploring the cross-platform mobile development framework Phone Gap, which will be the core of the class, and it’s looking very powerful. It is basically a two-sided package: a JavaScript library with functions to talk to the native platform on one side, and a native library talking back to JavaScript on the other. That way one can develop an app using just HTML and JavaScript, languages that are easier to learn and more broadly known, which makes the learning curve much gentler.

Installing Phone Gap and Eclipse (for Android apps) is not obvious, but with a little work you can get it running.

Also, because we’ll work with an older version of Android, it is necessary to modify the manifest, because that version doesn’t recognize the android:xlargeScreens parameter (fixed by just deleting that line).

Once that is done, it’s just HTML, which lives in assets.

Type of violence in pirate attacks around the world

October 4th, 2011

Source code here

Using data from Guardian Data Blog

People Standing in a Corner

September 29th, 2011


I chose the corner of Bedford and North 5 St. in Williamsburg because it has good traffic coming from the Bedford L subway stop and also has a high portion of the urban front occupied by shops. Additionally, it has a nice cafe on the corner that allowed me to sit and have a coffee while taking notes and pictures. I stood there approximately between 6:45pm and 8:00pm on a Tuesday.

Field trip

September 22nd, 2011

Graham Ave.

I chose the first block of Graham Ave, a predominantly commercial street in Brooklyn, as the target of my networked street observation trip. It is a place I walk through frequently, and it seemed dense enough to find interesting artifacts. After the first walk around the block I was very disappointed because I could only find a couple of surveillance cameras and a few other sensors, which made me think I should move to a different block or neighborhood. But as I had already printed the map of the area, I decided to give it a second shot. To my surprise, I started finding more and more, so I decided to stay.