Introduction to Computer Vision with Go I: Obnoxious Webcams

Earlier this week I spent a morning listening to Ron Evans's introduction to computer vision with Go course through Safari. Following my belief that every course should have at least one project, I hacked up an application to deface images and webcam streams with hats: Lotsohats.

The course walked through several GoCV example applications to apply filters to images and use deep-learning packages with pre-trained models for image classification. To me, it seemed like an obvious, easy assignment was to mimic a certain alcohol-advertising crowd-cam app used at the local hockey arena and drop hats on people in images.

Example image defaced by Lotsohats.

It was my first experience with Go, but it took only a morning to hack together a proof of concept, learning Go unit testing and exploring the GoCV and Go standard library APIs along the way.  This morning was spent polishing, tweaking, and documenting. The Donovan and Kernighan Go book is also a good resource.

What Dat?

Scraping the web for cheap transfer of database content to web pages, I got stuck on the number of products with the name WebDB, many essentially running something like SQLite backed by web local storage.... one of them drives Beaker Browser, a peer-to-peer web browser using DAT built upon Electron.  The rabbit hole along my search to find a corporate IT alternative surfaced into a hacker community.  So, what are DAT and peer-to-peer browsing?

Share.... FTP or BitTorrent ish

Dat is a peer-to-peer file-sharing protocol that builds a distributed directory to publicly readable filesystems. There is a Node command-line implementation to share and clone directories and an npm package for programmatic access.  Dat is similar to BitTorrent, but it's lighter-weight, since Dat clients can choose to download and share portions of filesystems.  The requirements are so minimal that Dat can even run from my Chromebook after a minor configuration twiddle and installation of python2, gcc, et al.... via pkg from Termux.

The Dat Project has a tutorial to install Dat and install files.  The TLDR; is

  • use npm to install dat (repeatedly until you satisfy all the gyp requirements),

  • run dat create to share a directory,

  • populate the directory with your stuff,

  • and run dat clone to access the directory elsewhere.

If you want your files to persist beyond your terminal session, HashBase will host your data; the first 100MB is free.  There's a Dat podcast, DAT;Cast, with Dat enthusiasm and events from January 2018 onward.

Interface.... _Browser_

Dat ships with a web server, just run "dat sync --http".  More generally, running dat-gateway will bridge content to your browser, but now we're back to server land.  Beaker Browser merges a Dat node with a web-browser.

What makes this a happy marriage? Along with providing basic editing tools, Beaker provides Dat to its rendered web applications.  Without Beaker, a web app's I/O is limited to HTTP or Websockets, which require a third-party adaptor (e.g. dat-gateway).  I've failed at using a couple of Websocket-based approaches, and much of the browser support is out of date, so I'd advise starting with Beaker or dat-node.

Sadly, Beaker won't run on most Chromebooks anytime soon.  Beaker is built on Electron, which runs poorly on Google's high-end PixelBook (only in development mode?), but not on lesser Chromebooks, and the Electron team turns away requests to accommodate Chrome OS.  So this next step requires a lap or desktop computer.....

How Dat.... A Sample P2P App

Here's a simple peer-to-peer application to support consensus: Cahoots!  The basic idea is that each voting-bloc member controls and publishes her own ballot containing her opinion, notes, and optionally references to other bloc members she has invited.  Each bloc member can traverse the web of references starting from a master or config ballot to listen for ballot updates and count the common opinions.

{ "displayName":
  "participants": [
  "notes": [
    [ 1531859696122, "I don't like cold soup"]

Entrance to the election is controlled through the write permission of the config ballot.  Other ballots can contain additional invites, but we (each member, even) can choose to limit the entrance by the distance from the config.
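To make the traversal concrete, here is a minimal sketch in Python (not Cahoots's actual code) of walking invitations outward from the config ballot, stopping at a chosen distance, and counting opinions.  The field names displayName, participants, and notes come from the ballot excerpt; the "opinion" field, the addresses, and the in-memory dictionary standing in for published Dat archives are all assumptions for illustration.

```python
from collections import Counter, deque

# Hypothetical stand-in for published ballots, keyed by address.
# The "opinion" field and every address here are made up for the example.
BALLOTS = {
    "dat://config": {"displayName": "config", "opinion": None,
                     "participants": ["dat://ann", "dat://bob"], "notes": []},
    "dat://ann": {"displayName": "Ann", "opinion": "chowder",
                  "participants": ["dat://cat"],
                  "notes": [[1531859696122, "I don't like cold soup"]]},
    "dat://bob": {"displayName": "Bob", "opinion": "chowder",
                  "participants": ["dat://ann"], "notes": []},
    "dat://cat": {"displayName": "Cat", "opinion": "gazpacho",
                  "participants": ["dat://dan"], "notes": []},
    "dat://dan": {"displayName": "Dan", "opinion": "gazpacho",
                  "participants": [], "notes": []},
}


def reachable_ballots(ballots, start, max_distance):
    """Breadth-first walk over invitations, at most max_distance hops from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        if distance[address] == max_distance:
            continue  # do not follow invitations past the limit
        for invited in ballots.get(address, {}).get("participants", []):
            if invited not in distance:  # also guards against invitation cycles
                distance[invited] = distance[address] + 1
                queue.append(invited)
    return distance


def tally(ballots, start, max_distance):
    """Count the opinions on every ballot within max_distance of start."""
    counts = Counter()
    for address in reachable_ballots(ballots, start, max_distance):
        opinion = ballots.get(address, {}).get("opinion")
        if opinion is not None:
            counts[opinion] += 1
    return counts


print(tally(BALLOTS, "dat://config", 1))
print(tally(BALLOTS, "dat://config", 2))
```

Raising the distance from 1 to 2 lets Cat's ballot in through Ann's invitation, which is the knob each member can turn for herself.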

Cahoots in action.

The Maritimes

The lighthouse at Peggy's Cove.

On the theme of traveling with M&J, upon return from my Magdeburg trip, S and I travelled to the Canadian Maritimes with another M&J couple for a vacation.  TLDR/W: Nova Scotia is oddly like Minnesota, with better accents and a bigger lake, and Prince Edward Island was idyllic, if cold.


I'm back from a quick trip to Germany.  M&J had to visit an ophthalmologist in Magdeburg, and the destination seemed like a stretch for them.  Worse yet, there's no commercial airport in Magdeburg, so we sequenced planes, trains, and automobiles from Arizona to erstwhile East Germany.  The response I've learned for the question of "Where is Magdeburg?" is that it is the birthplace of Georg Telemann, about a 90-minute train ride west of Berlin.

M&J await their train at Berlin Hauptbahnhof station.

Adding to the level of travel difficulty, M booked our stay with Airbnb in Neustadt.  In the flurry of emails and confusion preceding departure, we misplaced the check-in instructions. Adding insult to injury, we struggled with German telephone numbers, an unfamiliar American iPhone, spotty internet, forgotten passwords, unseasonal heat, and cranky taxi drivers, so after 18 hours of travel and another 2 waiting outside our apartment, we gave up. Luckily, S saved the day in a pinch, booking our evening at the Maritim downtown, so that we could gather ourselves and leverage reliable internet to message our host.

Magdeburg Cathedral, near our first few nights' lodging.

Magdeburg is a lovely town.  There are lots of restaurants downtown and a wide variety of architectural styles.  There are walking and biking paths up the river providing views of the city's parks and cathedral.

One restaurant worth mentioning is Hyaku Mizu in the Pink Palace.  To put it mildly, M&J can be difficult customers, frequently ordering off the menu, always making special requests.  Our inability to speak German and the scarcity of English-language proficiency in Saxony-Anhalt added hurdles.  The Hyaku Mizu staff, however, rolled with the punches.  For example, when M asked if the curried tofu could be softer, our waitress accommodated, directing the chef to substitute tofu from the miso for wok-fried....

We spent the next few days settling in, learning how to use the Magdeburg tram system, exploring grocery stores, and shopping for transit services.  The latter stemmed from my worries that M&J could not manage to get their bags to the train station, to the bus, to the airport.... so I spent a few hours scouring the Internet for reviews and visiting travel agencies before settling on a booking from Sachsen-Anhalt Tours, where Pfau graciously arranged for someone to take M&J to Tegel Airport.


The Berlin Cathedral and I crossed paths during my wandering of the city prior to my flight out.

I left to overnight in Berlin the afternoon before my flight home.  Being my father's son, I'm always a bit nervous getting to the airport and would rather have a short morning trip than a trek to the airport....

I stayed in Alexanderplatz, which looked like the Times Square equivalent to East Berlin.  Everything there is big -- the buildings, restaurants, even the radio tower which is still the second tallest structure in the EU.  I headed over to Checkpoint Charlie to see the murals and wall remains.  On the way back I passed by lovely cathedrals and concert halls in the dusk, ending the night at Hofbrau Wirtshaus for some brew and the grand bierhaus experience.

File Editing, Chromebooks, and Development

Continuing on a tilt from my previous post on developing from a Chromebook, there are still lots of rough edges, even for developing toy projects, primarily stemming from file-system access.  A simple rule has kept me out of the weeds: for Node.js projects develop only from Android apps using Android app-private storage.

The File System

The Termux documentation partitions storage into three categories: private-app, shared-internal, and external.  You won't see the shared-internal storage from Termux until you run termux-setup-storage.

It appears that under ChromeOS, all Android apps have the same file permissions, which will allow you to edit source files using vim to be read by Node.js, for example.  There are, however, lots of operations from Android that fail with shared-internal storage.  Creating symbolic links (ln) and updating permissions (chmod) both fail.  Running npm from the shared-internal storage results in lots of errors of the form "Error: EPERM: operation not permitted..."
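If you want to check a directory before pointing npm at it, a small probe can try the two operations that failed for me.  This is only a sketch using the Python standard library (Termux can install Python with pkg); the function name is made up, and it tests just symlink creation and chmod, not everything npm does.

```python
import os
import stat
import tempfile


def probe_directory(path):
    """Report whether symlink creation and chmod take effect inside path."""
    results = {"symlink": False, "chmod": False}
    # Work in a scratch subdirectory that is removed afterwards.
    with tempfile.TemporaryDirectory(dir=path) as scratch:
        target = os.path.join(scratch, "target.txt")
        with open(target, "w") as handle:
            handle.write("probe\n")
        try:
            os.symlink(target, os.path.join(scratch, "link.txt"))
            results["symlink"] = True
        except (OSError, NotImplementedError):
            pass  # e.g. EPERM on storage that does not support links
        try:
            os.chmod(target, stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR)
            # Some filesystems accept chmod silently but ignore it,
            # so confirm that the execute bit actually stuck.
            results["chmod"] = bool(os.stat(target).st_mode & stat.S_IXUSR)
        except OSError:
            pass
    return results
```

Calling probe_directory on the Termux home directory and again on the shared storage that termux-setup-storage exposes should show the difference; I'd expect the shared side to report False for at least one of the two, but that is an expectation, not something this snippet guarantees.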

The default ChromeOS document root used for file selection runs on top of the Storage Access Framework, but the default root only provides access to video, music, and SD-card, not to the private Android emulator storage.  Here lies the obstacle between using Chrome apps to build executables for Termux or Crouton.


Given the rift between Android app-private and shared storage, I find it's easiest to edit files using Android apps.  There are several editors available within a few keystrokes of Termux, e.g. vim, though I wish I knew how to have multiple Termux windows open at once.

A workaround for multiple windows is to use another Android app as an editor.  Turbo Editor can access Termux private data once you enable the storage-access-framework option under settings.  It provides syntax highlighting for HTML and Javascript files.  Under my installation, it periodically pauses, sometimes forgetting to remove the dialog box announcing the pause.

Other Directions

I've been scurrying about today looking for alternative methods of editing source code and general file-system access from Chromebooks:

Developing and Serving React Apps from Chromebooks

Yesterday I started to collect some notes on setting up React, Typescript, Babel, Webpack, Karma.... Holy cow! How many tools do you need just to get hello-react running?  Fortunately, a web search short circuited most of that when the Interweb powers showed me Facebook's create-react-app script "bundled" with recent versions of npm.

A couple of command-line entries later, node was scanning a directory, transpiling Typescript, running tslint and unit tests, and serving up the results.  It was so easy, you must be able to do this on a Chromebook....

There aren't many steps here.  First install Termux, nominally an Android terminal emulator, but green-circle gateway to Linux from ChromeOS.  We'll briefly use its package manager to bootstrap into the npm world.

pkg install git    # you have to save your work
pkg install nodejs
npx create-react-app lookout-unicorns
cd lookout-unicorns
npm start

Now point your browser to http://localhost:3000.  Chrome, surprisingly, doesn't recognize the URL, but Firefox for Android running on ChromeOS does.

If you want to use Chrome to view your fruits....  ChromeOS assigns the Android emulator a private IP address which you can get from ifconfig in Termux or Crouton.  This won't be the same IP address assigned to your Chromebook by the network, but oddly ChromeOS will expose port 3000 on your Chromebook's IP address to the rest of the network, just not to its own Chrome browser....
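When it's unclear which address actually serves the app, a quick TCP probe saves some guessing.  The sketch below uses only Python's standard socket module; the function name is made up, and whatever addresses you pass in (localhost, the emulator's private address from ifconfig, the Chromebook's network address) are yours to supply.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Run it from Termux against each candidate address with port 3000 while npm start is running.  A True only says that something accepted the connection, not that it was the React development server.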

Vacation Pictures from Italy

Well we're back from Italy, and if you want to see where S and I took her parents, head on over to this Facebook album, or to this other one to complete the in-law travel extravaganza, depending upon which branch of the family you want to see.

Being B&L's first trip to Italy, we focused on the sites perhaps considered boring by seasoned travelers, but expected by anyone claiming to have visited Italy: Rome, Venice, and Florence.  We braved the throngs of tourists we've always tried to avoid, but as a result a lot of it was new, even to S.

Piazza Navona

We started with 3 days in Rome, staying near Piazza Navona at Hotel Raphael.  The location is a great launch pad for walking trips, and there's a lovely rooftop bar to return to.  In April, the rooms didn't suffer from the usual Italian urban din.  We loved the breakfasts, served with a myriad of buffet-style options including lox, omelettes cooked to order, mechanically fresh-squeezed juice (blood orange one day), breads, standard English fare.....

The crowds were incredible, but we lucked out in getting guided tours of the Colosseum, Forum, and Palatine Hill one day and the Vatican Museum, Saint Peter's Basilica, and the Sistine Chapel another, both from City Wonders.  Jack-ass tip?: Though there are still crowds in April, the tours don't fill up.  Your tour guide will give you a discount code to book future tours.  Book one ahead of time, and wait until the evening after to book subsequent tours.  Really, the tours weren't that expensive, and the benefits of skipping the line and having a guide's interpretation are invaluable.  Sites usually have text interpretation, but the plaques are usually obscured by several dozen oblivious tourists listening to their own guide.

Away from the tours, we saw the Trevi Fountain, the Spanish Steps, and the Pantheon.  The crowds were horrendous, even in April, so have a back-up plan to drop by during the evening or early morning.  One nice surprise from taking a detour to avoid a crowd was Fontanella Borghese Market, where we bought prints of Roman sites.  Make sure to avoid your hotel restaurant and ask for restaurant suggestions, avoid the shills, and kill your jet lag at the bar (with espresso! :)....

The sight that surprised me the most was Galleria Borghese.  I love baroque music, but never paid much mind to sculpture.  Who knew naked marble bodies could be so beautiful?  In my backwoods opinion, there are several Borghese items that put Michelangelo to shame... I guess that's what a couple hundred years of study will do.  The cherry on top has to be the ceiling frescos above the sculptures.  Jack-ass tip?: You have to book your visit for a 2-hour block ahead of time.  There will be a big crowd rushing to go in.  Avoid the crowd, spend 45 minutes upstairs to look at the paintings, and then return downstairs to the now relatively empty sculpture galleries.

A fiercer David than Florence's

We took the train next to Venice.  The rap on Italian trains is not well-deserved.  Our train was fast, modern, smoke-free, and more-or-less on-time.  It did help to have S have her attack-dog Italian schpiel ready to assert our seat reservations, though.

Canal-side in Murano

I had only been to Venice for a day trip and it was like another world to be able to explore away from the cruise-ship crowds, get lost at night, and take longer trips on the vaporetto to other clusters like Murano, San Michele, and Giudecca, where there are burnt-out glass furnaces, Igor Stravinsky's grave, and wonderful pizza, respectively.  Honestly, there's amazing pizza all over Italy, even in the Venice train station, but Giudecca after dark was deserted and the pizzeria staff were delighted to share their knowledge of American pop music....

We stayed at the Locanda Ca San Marcuola, which was simple, centrally-located close to San Marco and the train station, and spitting distance from a vaporetto stop.  Detracting from the beauty was the ceaseless canal noise.

My favorite sights were the Museo della Musica, where there were Calace mandolins; seeing Lame de Barba perform in the campo; and the Rick-Steves-endorsed bar tour with Alessandro.  Swallow your ego with your wine and enjoy the personal stand-up routine/roast; you'll learn what sfuso is.

We had to backtrack to get to Florence, where we stayed around the corner from the Duomo at Rodo Fashion Hotel, which is great if you really cannot walk far to the Duomo or want to spend your time watching the crowd ooze by from their terrace, but the constant noise and uncomfortable beds wore me down.

Speaking of crowds, we never made it into the Duomo or to the Accademia to see the real David.... so take my opinion lightly on the younger David above.  Jack-ass tip?: Book your trip to the Uffizi on-line and early in the day.  The line will be 20 minutes instead of hours.  S told me that she cried when she first saw the Birth of Venus, and from the crowd around it, I believe her -- that painting commands respect.  It's the only high-profile work of art I've seen where tourists almost-universally keep a several-pace distance from it.  I say "almost" because a group of high-school students crouched underneath it for a jubilant photo op.  It might be the most beautiful painting in all of Italy, and it's still underrated.

Beyond the periphery of the crowd, I most enjoyed our trip through Santa Croce just before closing, and the Opera del Duomo, where you get up close to designs of the Duomo, art that just didn't fit into it, and copies of sculpture too high to otherwise observe closely.  Cathedrals aren't my motivating reason for travel, but perusing the monuments and grave markers in Santa Croce was a pleasant stroll through renaissance history.  The Opera del Duomo provided a relatively intimate look at the tourist-inflicted church around the corner.

Big men at the Opera del Duomo in Florence

Following our guided-tour theme, we took a walking night tour (not highly recommended) and a day trip to a winery near Castellina in Chianti, San Gimignano, and Siena.  The Siena Duomo interior was certainly the highlight for me.  Again, I'm not a cathedral person, but the paintings, high striped columns, and tiled floors could give me days of ogling.

Inside the Siena Duomo

We staged our departure from the Hilton Rome Airport, relatively reasonably priced and it spared us from the long morning commute from Rome to Fiumicino.  Jack-ass tip?: print a copy of your air-travel itinerary, otherwise the airport ushers won't let you get to the ticket counter....

Chromebooks for Developers?

Earlier this year, my mother-in-law-J's hand-me-down MacBook kicked the bucket.  To reduce complexity and "support" phone time, my wife convinced me to buy her a Chromebook.  I still worry about not being able to help out (and I'm a sucker for gadgets), so I blew the net savings from the Chromebook gift in comparison to the potential MacBook purchase on a second Chromebook.

There are two stories here, one short and one developing....  The short story is that since for most people a Chromebook only runs a browser, there's really no support beyond figuring out web services.  We've yet to receive a cry for help from J.  To increase n to 2, we gave mother-in-law-L a Chromebook for Christmas 5ish years ago.... and I wouldn't know she used it ever, except that she says that she does and I occasionally see it on the kitchen island.

The longer story is what I should do with the extra Chromebook....

The Hardware

I dragged my feet on buying Chromebooks until BestBuy had a sale.  Google had a sales rep at BestBuy, and from his enthusiasm, I think I might have been his only customer ever.  I don't think he believed that I was going to buy a pair, even, as he took my photo and walked me to the cashier.  In the end, I bought two Samsung Chromebook Pros.

The display is beautiful, generating guilt never to use the touch screen; there's a built in stylus that I've only used to demo Adobe Sketch; the keyboard has all the programming bracket keys, but oddly spaced; the case is a nice compromise between cheap, sturdy, and thin; the battery life lets me forget about the machine for a few days.

Messing Around

Google says that you can run Android apps on ChromeOS, but that might be stretching the truth a little.  ChromeOS emulates Android for each app.  Apps appear on a phone-sized rectangle, with a buggy option to resize to the entire screen.  The file access seems to be limited to files under the app's install directory, so text-editor output might not be useful and there's no access to the SD-card from Android, presenting a hurdle for playing music stored locally.

I tried without success to get several Android music players to work.  To make matters more confusing, VLC has two versions that appear identical in the application tray -- one for Android and one for ChromeOS.  The former is buggy and freezes.  The latter is a Chrome app.... which Google won't support outside ChromeOS, so I worry about VLC's future.  Google Play Music no longer lets you play music from an SD card.  In the end, I found that Remo works reliably, but exposes a track ordering and song selection identical to the filesystem.

I'm not so in love with my favorite Android apps, kWS and Jota, that I've unlocked ChromeOS beyond installing local Chrome Extensions. </whine>

Accessing a Server

The easiest way to get my code fix via ChromeOS is just to install an ssh Chrome extension and connect to my server.  There are several similar extensions.  I just chose the one recommended from the crosh shell.  Programming via an ssh connection reminds me of VT-100 and modem wails in college, but out of the box I got the same syntax highlighting from vim, YouCompleteMe, and npxmdv piped into less -R pretty prints markdown well enough.

Untethered Development

For proof of concept that one could write something using only a ChromeBook, I used Text to write a simple hello-world Chrome Extension from Google's tutorial.  Presumably one could work in the emulated Android environment with Termux, suggested on Medium....

Spark (for Java)

I almost missed this goodie on Technology Radar -- not only is it shadowed by the popular Apache Spark name, its reference was hidden in a Spring Boot summary... not my favorite family of XML-bloated tools.  Spark is a lightweight web framework for Java 8.  It has modest run-time dependencies -- Jetty and slf4j -- and a four-line hello-world example, counting imports but not closing curly braces.

Let's go through a somewhat more complex conversation with Spark than "Hello, World" and set up a simple key-value store.

Project Setup

Create a Maven project.  Spark has instructions for Intellij and Eclipse.  You don't need an archetype; just make sure to select Java SDK 1.8.

Salutations, Terrene

We'll implement a simple REST dictionary so that we can show off our vocabulary, or our thesaurus skills, and because we're snooty, we'll "protect" our dictionary with a password.

package org.bredin;

import spark.*;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.*;

public class Main {
    // In-memory dictionary with case-insensitive keys.
    private static Map<String,String> keyStore = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    public static void main(String[] args) {
        // Filter: reject any request that lacks the shared secret.
        Spark.before((request, response) -> {
            if (!"blam".equalsIgnoreCase(request.queryParams("secret"))) {
                Spark.halt(401, "invalid secret");
            }
        });
        Spark.get("/get/:key", (request, response) -> readEntry(request, response));
        Spark.get("/put/:key/:value", (request, response) -> writeEntry(request, response));
    }

    // Look up a key, reporting unknown keys in the response body.
    public static Object readEntry(Request request, Response response) {
        String key = request.params(":key");
        String value = keyStore.get(key);
        if (value == null) {
            return "unknown key " + key;
        } else {
            return value;
        }
    }

    // Store a value and return the value it replaced, or "" if the key is new.
    public static Object writeEntry(Request request, Response response) throws UnsupportedEncodingException {
        String key = request.params(":key");
        String value = URLDecoder.decode(request.params(":value"), "UTF-8");
        String oldValue = keyStore.put(key, value);
        if (oldValue == null) return "";
        else return oldValue;
    }
}

OK, it's not as terse as Ruby or Node.js, but it's readable (similar to Express), statically-typed, and integrates with the rest of your JVM.  The real beauty of Spark is in the route definitions and filters -- try approaching that level of conciseness with Spring... or even Jetty and annotations.

Spark provides before() and after() filters, presumably for authentication, logging, forwarding....  executed in the order applied in your code.  Above, there's only an unsophisticated password check.  I've not dug in to discover whether or not Spark exposes enough bells and whistles for Kerberos.

The Spark.get() methods provide conduits for REST into your application.  Spark checks to see that the request parameters are present, returning 404 otherwise, and dispatches your registered handlers.

You can run and test drive the example:

$ curl 'localhost:4567/put/foo/What%20precedes%20bar?secret=BLAM'

$ curl localhost:4567/get/foo?secret=BLAM
What precedes bar
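The %20 in the first curl command is the percent-encoding of a space; since the value travels in the URL path, anything outside the safe characters has to be encoded by hand or by a helper.  Here is a small sketch with Python's standard urllib.parse that builds the same URL.  The put_url helper is my own illustration, not part of Spark, and the default base assumes the localhost:4567 address used above.

```python
from urllib.parse import quote, urlencode


def put_url(key, value, secret, base="http://localhost:4567"):
    """Build the /put/:key/:value URL with percent-encoded path segments."""
    # safe="" also encodes "/", so a value containing a slash
    # cannot be mistaken for an extra path segment.
    path = "/put/{}/{}".format(quote(key, safe=""), quote(value, safe=""))
    return base + path + "?" + urlencode({"secret": secret})


print(put_url("foo", "What precedes bar", "BLAM"))
# prints http://localhost:4567/put/foo/What%20precedes%20bar?secret=BLAM
```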

Neat!  I've always been uneasy that Jetty's annotations aren't thoroughly checked by the compiler.  DropWizard has loads of dependencies with versioning issues that have tripped me up.

Sweet Georgia Brown

Sweet Georgia Brown was probably the first jazz standard I heard. Maybe that's not unusual for people growing up in mid and southwest suburbs during the seventies and eighties. It probably exposes me as a poseur that the Harlem Globetrotters and Scooby Doo had more impact on my jazz education than Wynton Marsalis, Sunday brunches, Starbucks, or wherever I thought I should have heard jazz.

At any rate, Sweet Georgia Brown is probably between most Americans' ears at some point, the melody falls naturally on top of the chord changes, and it's a comfortable starting place to learn 2-5-1 chord changes....   theory wonk alert.... but this is important stuff for improvising and learning new tunes.

I'll refer in my notes to Brian Oberlin's version in F.


The chords to Sweet Georgia Brown are almost entirely a chain of a variant of ii-V7-I changes, and once you can chain those together on the mandolin (which is easy), you have most of the tune under your fingers.  The variant, II7-V7-I7, is less "pure" than ii-V7-I, but it's simpler, and pretty common in swing, dixie, and blues.

When Ray, my first mandolin teacher, showed me this trick, I thought it was magic.  There are only two three-finger shapes you need for the trick.  Here's how to do it going from D7 through G7 to C7:

  • II7 (D7): Start with the root-on-top triangle dominant-7th chord.
  • V7 (G7): Rotate your left hand slightly so that the bottom two notes each move one fret (half-step) down the neck, and you'll form a rootless dominant-7th chord.
  • I7 (C7): Rotate your left hand back to the root-on-top shape, but move your fingers two frets down from where you started....

Hey, you're almost back to where you started! You can continue the 2-5-1 walk... and it actually sounds like music, and it works in any major key from any place higher than the second fret. 

More theory wonkery -- the F7, E7, Eb7, D7 sequence at the turnaround is also a 2-5-1 progression, just with some tritone substitution.  Try "un-substituting" on the turnaround, or applying the substitution for other 2-5-1's in the song, and you'll see some variety in chord choices and voicing.
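For the arithmetic-minded, the roots in these progressions are just offsets in semitones, which a few lines of Python can spell out.  This is a sketch of the theory only: the flat-only note spelling and the function names are my choices, and it says nothing about voicings or fingerings.

```python
# Twelve pitch classes, spelled with flats to match the chords named above.
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def transpose(root, semitones):
    """Move a root up by a number of semitones, wrapping around the octave."""
    return NOTES[(NOTES.index(root) + semitones) % 12]


def two_five_one(tonic):
    """Roots of II7-V7-I7: a whole step above the tonic, a fifth above, home."""
    return [transpose(tonic, 2), transpose(tonic, 7), tonic]


def tritone_sub(root):
    """The substitute dominant sits six semitones (a tritone) away."""
    return transpose(root, 6)


print(two_five_one("C"))  # the D7, G7, C7 example from the list above

# One reading of the turnaround's tail: II7-V7-I7 aimed at D,
# with the middle chord swapped for its tritone substitute.
ii, v, i = two_five_one("D")
print([ii, tritone_sub(v), i])
```

Under that reading, "un-substituting" the turnaround means playing A7 where the chart has Eb7.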

Those two three-finger chord shapes moved up and down the neck will get you everything you need to play Sweet Georgia Brown, except for D minor and F.  For the former, a two-finger bar at the 7th fret is easy, but putting the root on top at the 5th fret of a three-finger might sound better in some places.  For the remainder, the F major chord, you could even substitute F7 and be OK.

For me, a swinging chop seems pretty natural.  Don Julin has a good free lesson/video.


Once you're comfortable with the chords, simply noodle with the major pentatonic scales over the chords, and it might not surprise you to find the melody on your own.  There are a couple of odd runs, at least I thought they were odd, until I looked at the chords' scale notes, and then the melody becomes much easier to remember, feel, even.

Follow Brian Oberlin's notes thinking about where the first, third, fifth, and flatted seventh are in the chord, and the melody will seep into your finger memory.  I like swinging the melody, and alternating between playing the chords and the melody reinforces the syncopation.

Sound Bites

I'll circle back, soon, with some audio samples.

Exploring Python with Data

In the glut of Python data analysis tools, I'm sometimes embarrassed by my lack of comfort with Python for analysis.  Static types, Java/Scaladoc, and slick IDEs in concert with compilers provide guides that I haven't been able to replace in Python.  Additionally, dynamic typing seems to exacerbate problems with library interoperability.  With Anaconda and Jupyter, though, I can share some quick notes on getting started.

Here are some notes on surveying some admittedly canned data to classify malignant/benign tumors.  The Web is littered with examples of using sklearn to classify iris species using feature dimensions, so I thought I would share some notes exploring one of the other datasets included with scikit-learn, the Breast Cancer Wisconsin (Diagnostic) Data Set.  I've also decided to use Python 3 to take advantage of comprehensions and because that's what the Python community uses where I work.

The notebook below illustrates how to load demo data (loading csv is simple, too), convert the scikit-learn matrix to a DataFrame if you want to use Pandas for analysis, and apply linear and logistic regression to classify tumors as malignant or benign.

In [7]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import pylab as pl
import pandas as pd
from sklearn import datasets

# demo numpy matrix to Pandas DataFrame
bc = datasets.load_breast_cancer()
pbc = pd.DataFrame(bc.data, columns=bc.feature_names)
pbc.describe()
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst radius worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension
count 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 ... 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000
mean 14.127292 19.289649 91.969033 654.889104 0.096360 0.104341 0.088799 0.048919 0.181162 0.062798 ... 16.269190 25.677223 107.261213 880.583128 0.132369 0.254265 0.272188 0.114606 0.290076 0.083946
std 3.524049 4.301036 24.298981 351.914129 0.014064 0.052813 0.079720 0.038803 0.027414 0.007060 ... 4.833242 6.146258 33.602542 569.356993 0.022832 0.157336 0.208624 0.065732 0.061867 0.018061
min 6.981000 9.710000 43.790000 143.500000 0.052630 0.019380 0.000000 0.000000 0.106000 0.049960 ... 7.930000 12.020000 50.410000 185.200000 0.071170 0.027290 0.000000 0.000000 0.156500 0.055040
25% 11.700000 16.170000 75.170000 420.300000 0.086370 0.064920 0.029560 0.020310 0.161900 0.057700 ... 13.010000 21.080000 84.110000 515.300000 0.116600 0.147200 0.114500 0.064930 0.250400 0.071460
50% 13.370000 18.840000 86.240000 551.100000 0.095870 0.092630 0.061540 0.033500 0.179200 0.061540 ... 14.970000 25.410000 97.660000 686.500000 0.131300 0.211900 0.226700 0.099930 0.282200 0.080040
75% 15.780000 21.800000 104.100000 782.700000 0.105300 0.130400 0.130700 0.074000 0.195700 0.066120 ... 18.790000 29.720000 125.400000 1084.000000 0.146000 0.339100 0.382900 0.161400 0.317900 0.092080
max 28.110000 39.280000 188.500000 2501.000000 0.163400 0.345400 0.426800 0.201200 0.304000 0.097440 ... 36.040000 49.540000 251.200000 4254.000000 0.222600 1.058000 1.252000 0.291000 0.663800 0.207500

8 rows × 30 columns

In [8]:
from math import sqrt
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression

# Plot training-set size versus classifier accuracy.
def make_test_train(test_count):
    # Train on the first test_count samples; always test on the second half.
    n = bc.target.size
    trainX = bc.data[0:test_count,:]
    trainY = bc.target[0:test_count]
    testX = bc.data[n//2:n,:]
    testY = bc.target[n//2:n]
    return trainX, trainY, testX, testY

def eval_lin(trainX, trainY, testX, testY):
    # Threshold the linear prediction at 0.5 to get a 0/1 class.
    regr = LinearRegression().fit(trainX, trainY)
    y = regr.predict(testX)
    err = ((y.T > 0.5) - testY)
    correct = [x == 0 for x in err]
    return sum(correct) / err.size, np.std(correct) / sqrt(err.size)

def eval_log(trainX, trainY, testX, testY):
    regr = LogisticRegression().fit(trainX, trainY)
    correct = (regr.predict(testX) - testY) == 0
    return sum(correct) / testY.size, np.std(correct) / sqrt(correct.size)

def lin_log_cmp(n):
    trainX, trainY, testX, testY = make_test_train(n)  # min 20
    lin_acc, lin_stderr = eval_lin(trainX, trainY, testX, testY)
    log_acc, log_stderr = eval_log(trainX, trainY, testX, testY)
    return lin_acc, log_acc

xs = range(20,280,20)
lin_log_acc = [lin_log_cmp(x) for x in xs]

lin_lin, = pl.plot(xs, [y[0] for y in lin_log_acc], label = 'linear')
log_lin, = pl.plot(xs, [y[1] for y in lin_log_acc], label = 'logistic')
pl.legend(handles = [lin_lin, log_lin])
pl.xlabel('training size from ' + str(

Incidentally, I used the iPython nbconvert to paste the notebook here.

Caveats: Without types, it's pretty easy to make mistakes in manipulating the raw data.  Python and numpy scalar, array, and matrix arithmetic operators are gracious in accepting parameters, so you might get a surprise or two if you're not careful.  That combined with operating with black-box analysis tools gives me some skepticism of any conclusions, but it's a start, and the investment was cheap.
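To show the kind of graciousness I mean, here is a tiny pure-Python sketch with made-up numbers.  It mirrors the thresholding arithmetic in the notebook, but on plain lists rather than NumPy arrays.

```python
predictions = [0.2, 0.9, 0.7]   # made-up regression outputs
labels = [0, 1, 0]              # made-up 0/1 class labels

# Booleans are integers in Python, so subtracting a label from a
# comparison result runs without complaint.
errors = [(p > 0.5) - y for p, y in zip(predictions, labels)]
correct = [e == 0 for e in errors]

# sum() happily counts True values, which is handy for accuracy...
accuracy = sum(correct) / len(correct)

# ...but a list times an int repeats the list instead of scaling it.
doubled = predictions * 2

print(errors, accuracy, len(doubled))
```

With a NumPy array the last operation would scale each element instead, so the same expression means two different things depending on a type that nothing in the code announces.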

Other Plotting Tools: Seaborn.pairplot generates some slick scatter plots and histograms that will help identify outliers, describe ranges, and demonstrate redundancy in the data dimensions.  I tried removing some of the obviously redundant data columns, resulting in no quality change in logistic classification and a less than statistically significant reduction in linear classification.
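By "statistically significant" I mean relative to the standard error that the notebook's eval functions return next to each accuracy.  The sketch below redoes that calculation in pure Python.  The two-standard-error threshold, and treating the two runs as independent even though they share a test set, are simplifying assumptions of mine.

```python
from math import sqrt
from statistics import pstdev


def accuracy_with_stderr(correct):
    """Accuracy of a list of True/False outcomes and its standard error."""
    n = len(correct)
    accuracy = sum(correct) / n
    # Population standard deviation of the 0/1 outcomes, as np.std computes.
    stderr = pstdev([int(c) for c in correct]) / sqrt(n)
    return accuracy, stderr


def differ_significantly(a, b, z=2.0):
    """Rough check: is the accuracy gap wider than z combined standard errors?"""
    (acc_a, se_a), (acc_b, se_b) = a, b
    return abs(acc_a - acc_b) > z * sqrt(se_a ** 2 + se_b ** 2)


# Made-up outcomes: 90 of 100 correct versus 88 of 100 correct.
full = accuracy_with_stderr([True] * 90 + [False] * 10)
trimmed = accuracy_with_stderr([True] * 88 + [False] * 12)
print(full, trimmed, differ_significantly(full, trimmed))
```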

Linear or Logistic? It surprises me that logistic regression proved an inferior classifier to linear regression, but economists frequently use linear regression to model 0/1 variables.  Paul von Hippel has a post comparing relative advantages of linear versus logistic regression.  As a student, I had trouble both with application of logistic regression and conveying my travails to a thesis adviser. I wish I had read more commentary comparing the two 20 years ago.