Contributing to a project you find on GitHub

No comments:
For the purpose of this article, I assume you have git installed, have a GitHub account, and are fundamentally familiar git.

You've been exploring GitHub and have discovered a project you want to contribute to. Let's say it is my fork of the Functional Koans. You want to add some flair to the Python Koans in particular.

Fork the DocOnDev project


To create a fork, press the "fork" button on the project's page.


When the fork is complete, you will have a copy of the project in your repositories listing.

Clone your project

Now you need to clone your fork. Make sure you use the “Your Clone URL” and not the “Public Clone URL”. You want to be able to push changes back to your own repository.


$ git clone git@github.com:[your-github-account]/functional-koans.git

Once the clone is complete your repo will have a remote named “origin” that points to your fork on github. You will use "origin" for your own regular activities.

Get the source

Let's make sure we have the python source.

$ cd functional-koans
$ git checkout -b python
$ git pull origin python
From git@github.com:[your-github-account]/functional-koans
 * branch            python    -> FETCH_HEAD
Auto-merging README.markdown
CONFLICT (add/add): Merge conflict in README.markdown
Automatic merge failed; fix conflicts and then commit the result.

Ugh. File conflict. No worries, open the file and edit it to look like this:

# Functional Koans


### About the Koans

There are several functional languages that have contributed their own
versions of the koans.  Each language has it's own branch with
detailed instructions on how to get started.

### Getting Started

To get the koans for a particular language, simply clone the repo and
checkout the branch:

'git clone git://github.com/relevance/functional-koans.git' 
'git checkout branch -b [branch_name]'
'git pull origin [branch_name]'


where [branch_name] is the name of the branch (language) you want to
work on.  To get a list of all branches run:

'git branch -a'

Now add the changed file to the python branch and check it in

$ git add README.markdown
$ git commit -m'Fixed README conflict'

Add a remote to the original project

First, switch back to the master branch locally.
$ git checkout master
Then let's take a look at all of our branches.
$ git branch -a
* master
  python
  origin/FSharp
  origin/HEAD
  origin/clojure
  origin/master
  origin/scala
We see that the origin (your repository) has all of the branches, but our local repository only has master and python. The asterisk next to master indicates this is our active branch.

Let's add another remote that points to the source you originally forked from. You can call this remote anything you want. I suggest you use something that makes sense to you. In GitHub's examples, they use the name "upstream", but that I find that to be too general. I tend to name this remote after the original account I forked from. So for the sake of this example, we will use my convention:

$ git remote add DocOnDev git://github.com/DocOnDev/functional-koans.git

You may have noticed that we used the public clone URL for DocOnDev. Truth is, he probably won't give you access to push changes directly to his repository anyway. I heard he's a real jerk about that stuff. And if he had given you access to push directly to his repo, why did you create a fork...?

Just to be sure...

Ok, so we just forked the code, but let's make sure we have the absolute latest version from DocOnDev before we start hacking away. We are going to fetch the latest from DocOnDev. Fetch grabs the latest source from all branches in DocOnDev's functional-koans repo and brings them down to us.

$ git fetch DocOnDev
From git://github.com/DocOnDev/functional-koans
 * [new branch]      FSharp     -> DocOnDev/FSharp
 * [new branch]      clojure    -> DocOnDev/clojure
 * [new branch]      master     -> DocOnDev/master
 * [new branch]      python     -> DocOnDev/python
 * [new branch]      scala      -> DocOnDev/scala

Now if we take another look at the branches...

$ git branch -a
* master
  python
  origin/FSharp
  origin/HEAD
  origin/clojure
  origin/master
  origin/python
  origin/scala
  DocOnDev/FSharp
  DocOnDev/clojure
  DocOnDev/master
  DocOnDev/python
  DocOnDev/scala

Make your changes

You are now free to make your changes and updates. You can add functionality, refactor, whatever it was that inspired you to contribute to this project in the first place.

Make sure you are cognizant of the various branches. Don't make changes in python branch, switch to master branch, and commit unless you absolutely intend to add the file to the master branch. Check in early and often. And of course, write your tests first. ;)

Check in every so often and fetch (or pull) any changes DocOnDev may have committed.
$ git checkout python
$ git fetch DocOnDev python
From git://github.com/DocOnDev/functional-koans
 * branch            python     -> FETCH_HEAD

Request a Pull

You've completed your changes and you'd like DocOnDev to take a look at them and consider them for addition to the project. Simply navigate to your fork and use the "Pull Request" button. You can explain your changes to DocOnDev as a part of the request.

And that's the basics of it. It may take a while to get used to working on a project with multiple branches and multiple committers. I know I hozed my repo more than once before I got the hang of it.








Python String Calculator Kata - Step by Step

No comments:

Background

For the past week, I have been working on the String Calculator Kata in Python. I watched Gary Bernhardt's version of it and then set out to tackle it myself. Not surprisingly, my solution was much like Gary's.

Of course, watching a master was a bit of a cheat. I hadn't really figured it out. I just saw what he did and mimicked it. I certainly understood what he was doing and I understood the solution I subsequently developed, but I skipped a very important part of the process; the discovery and real learning.

My wife has taken an interest in software development and is considering a possible career change. Along with reading books like "Apprenticeship Patterns", "The Inmates are Running The Asylum", and "Hackers and Painters", she is learning Python.

I decided that the String Calculator Kata would be a good exercise for us to pair on. I could show her some tricks and help her along. This would be a great learning exercise for the both of us.

Kata

Take One
Our first attempt was over the phone with a screen share. I did all the talking and typing while she watched, listened, and provided me with the occasional, "uh-huh" to let me know she was still there. All in all, I'd say it was a fair experience at best. In my opinion, I had shown her quite a bit, but had actually taught her nothing. She wasn't going to benefit from doing a Kata by rote. She needed to understand how the moves came to be and why they were being done.

This was going to take more time.
Take Two
Our second attempt was in person, side by side, sharing a single computer. I still did most of the talking and all of the typing, but the dynamic was very different. Jennifer was able to ask questions, point at the screen, and interact unlike with the screen share. And I took a much more deliberate approach to the Kata. Making no assumptions about the language; acting as if I were learning it all for the first time as well.

This was a good learning experience for both of us. I learned new things about Python and I felt connected to the developer I once was; the guy who would explore a language for the best way to do something; the guy who would spend hours on the same few lines of code in an effort to get them perfect. I've been doing Kata on and off for about a year with a recent deliberateness. But it wasn't until I used it to teach someone else that I achieved the next level of enlightenment.

After some more thought, I decided to recount the entire experience on my blog. I've already discussed Kata in a couple of posts. The string calculator can be found in Scheme and Python on my github. But none of these capture my thought process. I am not saying my thought process is right. Hell, I have no idea. But I think sharing it and soliciting feedback can't hurt.
The full experience
We start, of course, with a single test.
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

This fails as expected. And we write our add method.
calc.py
def add(input_string):
    return 0

We add the next test
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')


Which forces the change:
calc.py
def add(string):
    if string:
        return 1
    else:
        return 0

But this doesn't cover any single number, it covers a particular single number. So let's update the test.
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')
    assert 10 == add('10')

Our code fails again and we need to figure out a way to return the value of a string. Naturally, I asked the googles. They don't know anything themselves, but they always know somebody who knows something.

A tiny bit of poking around, and I found there is a string.atoi(s[, base]) function, but it was deprecated in verison 2.0 in favor of the built-in int() function. This is what we are looking for:
calc.py
def add(string):
    if string:
        return int(string)
    else:
        return 0
Now the test passes, but I want to get cute. I want to eliminate the if/else. I happen to know Python has lambda capability. So I do some research and refactor to:
calc.py
def add(string):
    return string and int(string) or 0
Cute. Yep.

Back to business. We've covered the first couple of criteria, but we now need to handle more than one item. So we write the test (of course).
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')
    assert 10 == add('10')

def test_multiple_numbers_returns_sum():
    assert 3 == add('1,2')

The familiar taste of failure leads us in search of an answer. How do we get the values out of a string? As it turns out, str is a sequence type in Python. So we can refer to its elements fairly easily.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    return int(string[0]) + int(string[2])

This gives a string index out of range error on our earlier passing tests. This won't do. We need to refer to the individual values only if there is a comma in the string. Looking back at our reference for sequence types, we find the in operator, which is perfect for this case.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    if ',' in string:
        return int(string[0]) + int(string[2])
    else:
        return int(string)

This works. Let's see how far we can get using lambda expressions (for fun). Refactor to lambda:
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    return ',' in string and int(string[0]) + int(string[2]) or int(string)
But what happens when the integers are more than a single digit? This calls for a new test.
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')
    assert 10 == add('10')

def test_multiple_numbers_returns_sum():
    assert 3 == add('1,2')
    assert 15 == add('10,5')
This fails with a ValueError. It appears to be trying to get the int() value of a comma. So our hard-coded string offsets won't do. Then what else can we do here?

Perusal of the standard string methods, leads us to the built-in partition method that was added in version 2.5. If it was recently added, it must be what we need....
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    return ',' in string and _sum_delimited_string(string) or int(string)

def _sum_delimited_string(string):
    string_parts = string.partition(',')
    return int(string_parts[0]) + int(string_parts[2])
And the tests pass. But my wife points out this won't work for strings with more than two digits.
Hmmmm....
Let's confirm the assumption.
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')
    assert 10 == add('10')

def test_multiple_numbers_returns_sum():
    assert 3 == add('1,2')
    assert 15 == add('10,5')
    assert 25 == add('10,3,12')
She was right. We get another ValueError. This time we are trying to get the int() of '3,12'. I see an immediate solution that should work. A little recursion and we are good to go.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    return ',' in string and _sum_delimited_string(string) or int(string)

def _sum_delimited_string(string):
    string_parts = string.partition(',')
    return int(string_parts[0]) + _sum_string(string_parts[2])
And once again, our tests pass.

At this point, I am actively holding myself back. I don't like this form of recursion. If I'm going to do it, I want it all contained in a single function. I really want to use split(), map(), and sum(), but I must remember that we've not discovered these yet. We are solving the problem with what we currently "know".

Roy's first two requirements down, now we move on to custom delimiters; starting with a new-line as a delimiter.

calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')
    assert 10 == add('10')

def test_multiple_numbers_returns_sum():
    assert 3 == add('1,2')
    assert 15 == add('10,5')
    assert 25 == add('10,3,12')

def test_newline_is_a_delimiter():
    assert 6 == add('1\n2,3')
Now we get a ValueError on an attempt to get int() for '1\n2'. Let's update the code to get this to pass.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    return ',' in string and _sum_delimited_string(string) or int(string)

def _sum_delimited_string(string):
    string = _normalize_delimiters(string)
    string_parts = string.partition(',')
    return int(string_parts[0]) + _sum_string(string_parts[2])

def _normalize_delimiters(string):
    return string.replace('\n', ',')
We are back to green. We are moving on to the fourth requirement; custom delimiters. This requirement reads as follows:
  1. Allow the Add method to handle a different delimiter:
    1. to change a delimiter, the beginning of the string will contain a separate line that looks like this: “//[delimiter]\n[numbers…]” for example “//;\n1;2” should return three where the default delimiter is ‘;’ .
    2. the first line is optional. all existing scenarios should still be supported
First the test.
calc_test.py
import nose.tools
from calc import add

def test_empty_string_is_0():
    assert 0 == add('')

def test_single_number_returns_value():
    assert 1 == add('1')
    assert 10 == add('10')

def test_multiple_numbers_returns_sum():
    assert 3 == add('1,2')
    assert 15 == add('10,5')
    assert 25 == add('10,3,12')

def test_newline_is_a_delimiter():
    assert 6 == add('1\n2,3')

def test_allow_custom_delimiter():
    assert 3 == add('//;\n1;2')
And now the code.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    return ',' in string and _sum_delimited_string(string) or int(string)

def _sum_delimited_string(string):
    string = _normalize_delimiters(string)
    string_parts = string.partition(',')
    return int(string_parts[0]) + _sum_string(string_parts[2])

def _normalize_delimiters(string):
    if string.startswith('//'):
        string_parts = string.partition('\n')
        delimiter = string_parts[0]
        delimiter = delimiter[2:]
        string = string_parts[2].replace(delimiter, ',')
    return string.replace('\n', ',')
This fails, a bit unexpectedly. What went wrong here? The custom delimiter is not being parsed.

A little inspection and we realize that _sum_string() is to blame. Or more accurately, we see that we need to normalize the delimiters earlier in the process. So we move that call from _sum_delimited_string() into _sum_string(), which seems to make more sense anyway.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    string = _normalize_delimiters(string)
    return ',' in string and _sum_delimited_string(string) or int(string)

def _sum_delimited_string(string):
    string_parts = string.partition(',')
    return int(string_parts[0]) + _sum_string(string_parts[2])

def _normalize_delimiters(string):
    if string.startswith('//'):
        string_parts = string.partition('\n')
        delimiter = string_parts[0]
        delimiter = delimiter[2:]
        string = string_parts[2].replace(delimiter, ',')
    return string.replace('\n', ',')
We are green again, but I don't think I can take it anymore. Look at all those [n] references in the code. And while I am not terribly concerned about it, the recursion could give us trouble on huge strings. There has to be a better way to do this.

So let's go back to the documentation and see if there is anything else we can use to assist us along. One of our problems is the partition breaks the string into three chunks. Maybe something that gives us just the values and strips out the delimiter. Or maybe something that breaks it into multiple chunks.

We discover split(). This appears to have some promise. It can chunk up the entire string into a list. That's nice. Maybe we can ditch the recursive call.

First round, let's get rid of the partition() call and replace it with split().
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    string = _normalize_delimiters(string)
    return ',' in string and _sum_delimited_string(string) or int(string)

def _sum_delimited_string(string):
    string_parts = string.split(',', 1)
    return int(string_parts[0]) + _sum_string(string_parts[1])

def _normalize_delimiters(string):
    if string.startswith('//'):
        delimiter, string = string.split('\n', 1)
        delimiter = delimiter[2:]
        string = string.replace(delimiter, ',')
    return string.replace('\n', ',')
But this didn't seem to do anything for us. The code doesn't really look any better. And it still "feels" messy. We've two different functions that both claim to sum a string, but only one of them actually does. And these functions call one another. Let's push the responsibility back into a single function and see if we can clean this up.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    string = _normalize_delimiters(string)
    if ',' in string:
        string_parts = string.split(',', 1)
        return int(string_parts[0]) + _sum_string(string_parts[1])
    else:
        return int(string)

def _normalize_delimiters(string):
    if string.startswith('//'):
        delimiter, string = string.split('\n', 1)
        delimiter = delimiter[2:]
        string = string.replace(delimiter, ',')
    return string.replace('\n', ',')

That is better. But isn't that recursive call really just iterating over the list and converting everything to an int() and then adding them up? Looks a bit like a Scheme function with string_parts[0] and string_parts[1] in place of car and cdr. I know this is crying out for a map, but how do I explain this to Jennifer? How do we naturally stumble upon the solution?

So we focus in on the split() function.  What does this do? It returns a list of the words in a string. So we dig into lists a bit to learn what they are all about. And we find there are a few functions that are very useful with lists: filter(), map(), and reduce(). Of these, map() and reduce() look like they might do us some good.

We can use map() to convert our list of strings to a list of integers. Then, we can use reduce() to add them all up. Sound good? Yes. Yes it does. But when we check the syntax on reduce(), we see a note that says to use the sum() function for adding items in a list.

And we've arrived at our intended destination.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    string = _normalize_delimiters(string)
    if ',' in string:
        return sum(map(int, string.split(',')))
    else:
        return int(string)

def _normalize_delimiters(string):
    if string.startswith('//'):
        delimiter, string = string.split('\n', 1)
        delimiter = delimiter[2:]
        string = string.replace(delimiter, ',')
    return string.replace('\n', ',')

That looks a little better. And all tests still pass. Now, let's refactor some more to make it a bit more terse.
calc.py
def add(string):
    return string and _sum_string(string) or 0

def _sum_string(string):
    string = _normalize_delimiters(string)
    return ',' in string and sum(map(int, string.split(','))) or int(string)

def _normalize_delimiters(string):
    if string.startswith('//'):
        delimiter, string = string.split('\n', 1)
        string = string.replace(delimiter[2:], ',')
    return string.replace('\n', ',')

We decided this was a good breaking point for the session. We'd covered a lot of new material in a relatively short period of time.

Tonight, we do it again, starting over from scratch. And I suspect we will skip some of the steps as we already know map() is our desired route.