Tim's Blog

Hack everything

GitHub for Windows


If you didn’t catch the news already, we just released GitHub for Windows to the world this morning. You can read the full blog post or go check out the launch site. I’m so excited to be shipping to such a great community of users!

Written by Tim Clem

May 21, 2012 at 1:58 pm

Posted in Uncategorized

Mind the End of Your Line

with 30 comments

One of the most frequent questions I answer about Git is why dealing with line endings is so difficult. This is an attempt to answer that question and explain the myriad of options and settings that control line endings in Git.

Git has gone through two systems for dealing with line endings in repositories. The root of the problem is that Unix, Linux and OS X use LF and Windows uses CRLF to denote the end of a line. Before OS X, the Mac actually used CR, but for the most part we can ignore that.

None of this would be a problem if we each lived in our own little worlds and never shared code between operating systems. And by share I mean everything from working on a cross platform project to copy-pasting code out of a browser. In fact, anytime you download a sample project in a zip file, copy code out of a gist, copy code from someone’s blog or use code out of a file that you keep in Dropbox – you are sharing text and you need to deal with these invisible line-ending characters. All of these activities potentially introduce a different set of line endings into your code base, which makes diffs messy and Git generally unhappy.

Git’s primary solution to all this is to specify that LF is the best way to store line endings for text files in a Git repository’s object database. It doesn’t force this on you but most developers using Git and GitHub have adopted this as a convention and even our own help recommends setting up your config to do this.

Background

Before describing the settings that control line endings in Git, there are a couple of things you need to know about: core.eol and what it means to write something to the object database.

End of line

core.eol

The first setting you need to know about is core.eol. In all but the rarest of cases you should never have to change this setting from its default. This setting doesn’t do much on its own, but as soon as we start telling Git to change our line endings for us we need to know the value of core.eol. This setting is used by all the other things we are going to talk about below, so it’s good to know that it exists and good to know that you probably don’t want to change it.

  • core.eol = native – The default. When Git needs to change line endings to write a file in your working directory it will change them to whatever is the default line ending on your platform. For Windows this will be CRLF, for Unix/Linux/OS X this will be LF.

  • core.eol = crlf – When Git needs to change line endings to write a file in your working directory it will always use CRLF to denote end of line.

  • core.eol = lf – When Git needs to change line endings to write a file in your working directory it will always use LF to denote end of line.

You can run git config --global core.eol to see what this value is set to on your system. If nothing comes back, that means you are using the default, which is native.
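To make the three values concrete, here is a minimal Python sketch of how core.eol could be resolved to an actual byte sequence. It is a simplified model, not Git's implementation; the function name working_tree_eol and its platform_linesep parameter are made up for illustration.

```python
import os


def working_tree_eol(core_eol: str = "native", platform_linesep: str = os.linesep) -> bytes:
    """Return the line ending used when writing to the working directory.

    Hypothetical helper: a simplified model of the core.eol setting.
    """
    if core_eol == "crlf":
        return b"\r\n"
    if core_eol == "lf":
        return b"\n"
    if core_eol == "native":
        # Fall back to the platform default: CRLF on Windows, LF elsewhere.
        return platform_linesep.encode("ascii")
    raise ValueError(f"unknown core.eol value: {core_eol!r}")
```

Passing platform_linesep explicitly makes it easy to see what native would mean on a platform other than the one you are running on.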

In and out of the object database

What is the object database? I’m going to talk a lot about two different operations: writing to the object database and writing out to the working directory. It helps to understand these concepts a bit before moving on.

You may already know that Git has its own database in that .git folder which Scott does a great job of explaining in Chapter 9 of his Pro Git book. All you need to know is that when you do something like git commit you are writing objects into the database. This involves taking the files that you are committing, calculating their shas and writing them into the object database as blobs. This is what I mean when I say writing to the object database and this is when Git has a chance to run filters and do things like converting line endings.

The other place that Git has a chance to run filters is when it reads out of the object database and writes files into your working directory. This is what I mean when I say writing out into the working directory. Many commands in Git do this, but git checkout is the most obvious and easy to understand. This also happens when you do a git clone or run a command like git reset that changes your working directory.
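A small Python sketch shows why line endings matter at this level. It assumes a repository using the default SHA-1 object format, where a blob's id is the hash of a short header followed by the file's bytes. The function name git_blob_sha is made up for illustration.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Compute the id Git assigns to a blob (SHA-1 object format assumed)."""
    # A blob is hashed as: "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


lf_version = b"hello\nworld\n"
crlf_version = b"hello\r\nworld\r\n"

# Same text to a human reader, but different bytes, so different blobs.
print(git_blob_sha(lf_version))
print(git_blob_sha(crlf_version))
print(git_blob_sha(lf_version) == git_blob_sha(crlf_version))
```

Because the hash covers every byte, a file that differs only in its line endings is a completely different object as far as Git is concerned.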

The Old System

First, let’s talk about the old system. This is the original set of features in Git designed to solve this particular problem of line endings. There is a good chance you are still using this system and don’t even know. Here is how it works: Git has a configuration setting called core.autocrlf which is specifically designed to make sure that when a text file is written to the repository’s object database that all line endings in that text file are normalized to LF. Here are the different options for core.autocrlf and what they mean:

  • core.autocrlf = false – This is the default, but most people are encouraged to change this immediately. The result of using false is that Git doesn’t ever mess with line endings on your file. You can check in files with LF or CRLF or CR or some random mix of those three and Git does not care. This can make diffs harder to read and merges more difficult. Most people working in a Unix/Linux world use this value because they don’t have CRLF problems and they don’t need Git to be doing extra work whenever files are written to the object database or written out into the working directory.

  • core.autocrlf = true – This means that Git will process all text files and make sure that CRLF is replaced with LF when writing that file to the object database and turn all LF back into CRLF when writing out into the working directory. This is the recommended setting on Windows because it ensures that your repository can be used on other platforms while retaining CRLF in your working directory.

  • core.autocrlf = input – This means that Git will process all text files and make sure that CRLF is replaced with LF when writing that file to the object database. It will not, however, do the reverse. When you read files back out of the object database and write them into the working directory they will still have LFs to denote the end of line. This setting is generally used on Unix/Linux/OS X to prevent CRLFs from getting written into the repository. The idea is that if you pasted code from a web browser and accidentally got CRLFs into one of your files, Git would make sure they were replaced with LFs when you wrote to the object database.

You can run git config --global core.autocrlf to see what this value is set to on your system. If nothing comes back, that means you are using the default, which is false.
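The three values can be summarized with a pair of Python functions, one for each direction. This is a simplified model that assumes the file has already been classified as text; real Git applies additional safety checks. Both function names are made up for illustration.

```python
def to_object_database(content: bytes, autocrlf: str = "false") -> bytes:
    """Model of what is stored when a text file is written to the object database."""
    if autocrlf in ("true", "input"):
        # Both settings normalize CRLF to LF on the way in.
        return content.replace(b"\r\n", b"\n")
    # autocrlf = false: bytes are stored exactly as they are.
    return content


def to_working_directory(content: bytes, autocrlf: str = "false") -> bytes:
    """Model of what is written out when a text file is checked out."""
    if autocrlf == "true":
        # Normalize first so an existing CRLF does not become CR CR LF.
        return content.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    # autocrlf = input or false: bytes are written out untouched.
    return content
```

For example, to_object_database(b"a\r\nb\r\n", "input") gives b"a\nb\n", and to_working_directory on that result with "input" leaves the LFs in place.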

How does Git know that a file is text? Good question. Git has an internal method for heuristically checking if a file is binary or not. A file is deemed text if it is not binary. Git can sometimes be wrong and this is the basis for our next setting.
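To give a feel for what such a heuristic looks like, here is a deliberately simplified Python sketch: treat a file as binary if a NUL byte appears near the start. This is an assumption for illustration only. The check Git actually runs before converting line endings looks at more than this, and the names below are made up.

```python
# Only the beginning of the file is inspected (the limit is an illustrative choice).
FIRST_FEW_BYTES = 8000


def looks_binary(content: bytes) -> bool:
    """Guess whether content is binary by looking for a NUL byte near the start."""
    return b"\0" in content[:FIRST_FEW_BYTES]


def looks_like_text(content: bytes) -> bool:
    """A file is deemed text if it is not deemed binary."""
    return not looks_binary(content)
```

A heuristic like this is cheap, but it can clearly be fooled, for instance by a binary format that happens to have no NUL bytes in its first few kilobytes.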

The next setting that was introduced is core.safecrlf which is designed to protect against these cases where Git might change line endings on a file that really should just be left alone.

  • core.safecrlf = true – When getting ready to run this operation of replacing CRLF with LF before writing to the object database, Git will make sure that it can actually successfully back out of the operation. It will verify that the reverse can happen (LF to CRLF) and if not the operation will be aborted.

  • core.safecrlf = warn – Same as above, but instead of aborting the operation, Git will just warn you that something bad might happen.
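The reversibility check can be sketched in a few lines of Python. This models the core.autocrlf = true case only and is a simplification of the idea, not Git's code; the helper names are made up for illustration.

```python
def crlf_to_lf(content: bytes) -> bytes:
    """Conversion applied when writing to the object database."""
    return content.replace(b"\r\n", b"\n")


def lf_to_crlf(content: bytes) -> bytes:
    """Conversion applied when writing out to the working directory."""
    return content.replace(b"\n", b"\r\n")


def round_trips(content: bytes) -> bool:
    """True if converting in and back out reproduces the original bytes."""
    return lf_to_crlf(crlf_to_lf(content)) == content


print(round_trips(b"one\r\ntwo\r\n"))  # consistent CRLF
print(round_trips(b"one\r\ntwo\n"))    # mixed CRLF and LF
```

A file with consistent CRLF endings survives the round trip, while a file with mixed endings does not, because the lone LF would come back out as CRLF. That second case is the kind of irreversible change core.safecrlf is meant to catch.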

One final layer on all this is that you can create a file called .gitattributes in the root of your repository and add rules for specific files. These rules allow you to control things like autocrlf on a per file basis. So you could, for instance, put this in that file to tell Git to always replace CRLF with LF in txt files:

*.txt crlf

Or you could tell Git to never replace CRLF with LF in txt files:

*.txt -crlf

Or you could tell Git to replace CRLF with LF when writing txt files to the object database, but to leave the LFs alone when writing them back out to the working directory:

*.txt crlf=input

OK. Got all that? See all the problems and the mess we’ve made? It gets worse when you start working on projects that push you towards different global settings. Enter the new system which is available in Git 1.7.2 and above.

The New System

The new system moves to defining all of this in the .gitattributes file that you keep with your repository. This means that line endings can be encapsulated entirely within a repository and don’t depend on everyone having the proper global settings.

In the new system you are in charge of telling Git which files should get CRLF to LF replacement. This is done with a text attribute in your repository’s .gitattributes file. In this case the man page is actually quite helpful. Here are some examples of using the text attribute:

  • *.txt text – Set all files matching the filter *.txt to be text. This means that Git will run CRLF to LF replacement on these files every time they are written to the object database and the reverse replacement will be run when writing out to the working directory.

  • *.txt -text – Unset all files matching the filter. These files will never run through the CRLF to LF replacement.

  • *.txt text=auto – Set all files matching the filter to be converted (CRLF to LF) if those files are determined by Git to be text and not binary. This relies on Git’s built-in binary detection heuristics.

If a file is unspecified then Git falls back to the core.autocrlf setting and you are back in the old system. This is how backwards compatibility is maintained, but I would recommend (especially for Windows developers) that you explicitly create a .gitattributes file.

Here is an example you might use for a C# project:

# These files are text and should be normalized (convert crlf => lf)
*.cs      text diff=csharp
*.xaml    text
*.csproj  text
*.sln     text
*.tt      text
*.ps1     text
*.cmd     text
*.msbuild text
*.md      text

# Images should be treated as binary
# (binary is a macro for -text -diff)
*.png     binary
*.jpeg    binary

*.sdf     binary

One final note that the man page for gitattributes mentions is that you can tell Git to detect all text files and automatically normalize them (convert CRLF to LF):

* text=auto

This is certainly better than requiring everyone to be on the same global setting for core.autocrlf, but it means that you really trust Git to do binary detection properly. In my opinion it is better to explicitly specify your text files that you want normalized. Don’t forget if you are going to use this setting that it should be the first line in your .gitattributes file so that subsequent lines can override that setting.
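The reason the ordering matters is that later matching lines take precedence over earlier ones. Here is a minimal Python sketch of that "last match wins" behavior for a single attribute. It is a simplification: real .gitattributes pattern matching follows gitignore-style rules, while this sketch only compares the file name against shell-style globs. The function name and the sample rules are made up for illustration.

```python
import posixpath
from fnmatch import fnmatchcase


def resolve_text_attribute(path, rules):
    """Return the text setting from the last rule whose pattern matches path.

    rules is a list of (pattern, setting) pairs in file order.
    Returns None when no rule matches (the attribute is unspecified).
    """
    name = posixpath.basename(path)
    result = None
    for pattern, setting in rules:
        if fnmatchcase(name, pattern):
            result = setting  # a later match overrides an earlier one
    return result


rules = [
    ("*", "text=auto"),   # catch-all first
    ("*.cs", "text"),     # explicit overrides after it
    ("*.png", "-text"),
]

print(resolve_text_attribute("src/Program.cs", rules))
print(resolve_text_attribute("images/logo.png", rules))
print(resolve_text_attribute("notes.unknown", rules))
```

If the catch-all rule were listed last instead, it would match every file after the specific rules had already been applied and would override all of them.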

Written by Tim Clem

March 1, 2012 at 1:01 pm

Posted in Git

Hardware Hacking in Vim


We love our hardware hacking here at GitHub and with the recent upgrade in space at the new office we’ve been holding weekly hardware hacking workshops with the illustrious Matt Ganucheau. In those sessions the crew has been messing with sensors and actuators, bots and quad copter drones, mesh networks and octocat powered kinects…

We are huge fans of the Arduino stack which is hosted on GitHub, and there are dozens of Arduino boards running around the office. The beauty of writing code for the Arduino is that you rarely end up writing a lot of it and once you’ve written a sketch, it runs on a whole variety of compatible boards. Compared to how microcontroller prototyping was done 5 years ago, this is a serious breath of fresh air. There are some of us, however, who would rather continue to write code in vi (MacVim in my case), so I came up with a nice hack…

Enter vim-arduino.

This is my first attempt at getting Arduino IDE functionality into MacVim. Right now you can compile, deploy and open a serial port. The board for deployment always defaults to ‘uno’, but you can override that by putting the name of the board on the first line of your sketch like so (any board found in boards.txt is valid):

// board: atmega328

Compiling code can be done like so: <Leader>ac

Compiling and then deploying is done like so: <Leader>ad

You can also open a serial port for debugging like this: <Leader>as

The goal here is to be 100% compatible with existing Arduino sketches and to take advantage of the rest of the Arduino environment. Your hardware hacking peers should never know you are secretly rocking vi.

Check out the README to get things set up and bug me on Twitter if you find any of this useful.

Written by Tim Clem

October 10, 2011 at 12:12 pm

Posted in vim

Code First, Ask Questions Later


I gave this talk at Rocky Mountain Ruby in Boulder, CO on Sept 1, 2011.

Abstract

Ever wonder how software is designed and developed at GitHub? Are you curious about how new features are deployed to the site? (Hint: ask the robot.) Want to know why we don’t have any managers and don’t track vacation days?

This talk will explore running your company like an open source project and give some insight into how GitHub continues to leverage Ruby and other open source tools to keep up with massive data loads and a growing community of users.

Slides

http://speakerdeck.com/u/tclem/p/code-first-ask-questions-later

Video

I know there was AV taken at the conference and I will post a link once it is available.

http://confreaks.net/videos/723-rockymtnruby2011-code-first-ask-questions-later

Credits

The font in my presentation is called Junction and can be found on the League of Movable Type. Many of the photographs are my own or from the GitHub Instagram stream. A couple of them are from flickr under the creative commons license: easy button, just do it, robot. The grainy robot attacking image is actually from SNL.

Written by Tim Clem

September 9, 2011 at 1:20 pm

Posted in Talks

Explaining GitHub Part 2

with one comment

If you just tuned in, I’m attempting to explain Git and GitHub to the layman. The previous post was all about Git. This one focuses on what GitHub actually does.

Part 2: GitHub

Okay, now we can move on to GitHub. Unlike writing a paper, software developers rarely write code all by themselves in a closet. Well, some do… but most of us collaborate on our software projects. GitHub provides a meeting place for software developers to collaborate around code and the different versions of that code. It stretches the analogy, but imagine that your paper with all the different snapshots you took along the way was available for other people not just to read, but to derive new papers from. You would always be credited as the original author, but everyone can create derivative works and if someone added a really nice abstract you could pull just that change back into your version of the paper. GitHub acts like a digital room where everyone hangs out to share their projects, ideas, and most importantly their code. The same way that Facebook provides a medium for people to hang out digitally and share status, photos, notes, etc., GitHub provides a medium for programmers to share code.

Code is a very abstract notion to people who don’t program for a living, and this is where I think the idea of writing a paper as an analogy for writing code is very helpful. Just as you could have written your paper in Spanish or French or English, code can be written in any number of languages. The primary purpose of code is to communicate something to a computer. You can think of it like writing an instruction manual. In 3rd grade, I remember, we did an activity where we had to explicitly write out the instructions for making a peanut butter and jelly sandwich. The teacher then took our written instructions and followed them to a tee. If you wrote “Take two pieces of bread”, she would literally tear off two very small pieces of bread to start making the sandwich. If you didn’t explain how to open the jelly jar, there would be no jelly on your sandwich. If you didn’t explicitly say that the jelly and peanut butter sides of the bread should be touching in the middle, you would get a sandwich with jelly and peanut butter on the outside. This is what it is like when you have to communicate with a computer. At the lowest level, computers are really dumb; they can only do what they are told, so your instructions (code) must tell the machine exactly what you want it to do.

As you can probably imagine, writing all this code can get very unwieldy. Not only because of the sheer amount of code that it takes to develop a complex system like the Android operating system, but because that code is written by a large number of programmers (think lots of people editing the same paper all at once). This is where GitHub really shines. GitHub provides the ability for developers to publish and share code, solicit collaboration from other programmers, gain reputation in the community and discover/contribute to other code projects of interest. In many ways it is a realization of the original intent of the internet. To throw out yet another metaphor, GitHub creates a framework for the evolutionary improvement of software in real time. Like the evolution of species, software is allowed to mutate and change over time creating new variants, symbiotic relationships (sometimes even parasitic ones!), and a dazzling array of constructs and programs. GitHub is like primordial soup in which all these software projects are interacting, growing and changing. And instead of eons, this all happens at the speed of the interwebs.

If you want to learn more about GitHub, you know where to go.

If things are still fuzzy, then all you need to know is that we run a web site…. And we sell these cute little creatures that are a cross between a cat and an octopus…

Written by Tim Clem

April 19, 2011 at 9:50 am

Posted in General, Software

Explaining GitHub

with 2 comments

GitHub is well beyond being just another startup, but I still find myself explaining over and over again what it is that GitHub actually does. My brother posted this link on Facebook the other day with a comment that he still has no idea what we do, but has so far surmised that it has “something to do with a cat and an octopus”.

So here, presented in two parts, is the crash course for the layman:

Part 1: Git

We have to start with Git. Git and GitHub are two different things. Git is a version control system. You don’t need to know anything about it other than it is amazing.

Okay, maybe I’ll tell you a little more than that. If you have ever typed a paper into a computer (or written one by hand) you have probably used a low fidelity form of version control. Say you are working on a big paper and at certain points you want to save off a version or draft. You might do this for many reasons. Maybe it is an important paper and you want to put a copy somewhere else safe. Maybe you add a `01` to the filename and later a `02` and so on (or you might use today’s date `ImportantPaper_4_13_2011.txt`). Maybe you need to turn in a draft or pass a draft off for editing. These activities are all analogous to creating a snapshot of what your paper looks like at a certain point in time and saving it off. Another good example of why having snapshots is important is if you are going to completely re-write your conclusion. Having a saved version of your old conclusion lets you go back to it at any time to see what you wrote or even to revert back because your new conclusion turned out to be a bad idea.

Software developers are usually focused on code instead of papers, but they have the same problems. You need to be able to quickly and easily take snapshots of the current state of your code base (along with some comments about that revision) and have the ability to come back to that point in history at any time. You might also want to compare the current state of your code with any previous state. Programmers call the solution to this problem version control. Just as you could write your paper in any number of document editing systems (Word, Pages, text files, Quark, LaTeX, etc) there are many different version control systems. Git is one of those systems. Developers write programs in a variety of computer languages and these programs are sort of like big papers, but vastly more complex. Having the ability to take snapshots frequently and travel back in history to see how a code file changed over time is an essential activity.

If you are really interested in learning more about Git you can download and learn about it here. I would also highly recommend my friend Scott Chacon’s book Pro Git.

If you have stuck around just for the Octocats, then I suggest you head over to the octodex until part 2 comes out.

Written by Tim Clem

April 15, 2011 at 10:10 am

Posted in General, Software

Hello SF


Well, it is official. I’m not sure the dust has really settled, and there are a few lurking boxes in the house and plenty more filling the garage, but we have made the move to San Francisco! The downsizing has been therapeutic (2500 sq. ft. to 1000 sq. ft.) and the adjustment to living in a big city has so far been very positive. Even the cat seems to have settled in and is friendlier and calmer than we remember her being in Colorado.

I took a new position as a developer at GitHub working out of their main San Francisco office and couldn’t be more excited about it. The culture is ideal, my co-workers are some of the smartest people I have worked with yet, I’m learning a ton, and I have big plans for all the amazing things this company is going to do in the next few years. Ramping up in a new job is always exciting, with all the new things to learn and people to meet. Finding your way around a new city just adds to the excitement: where to live, where to eat, where to buy crap you need. I cannot speak more highly of the SF GitHub peeps who have gone out of their way to make my family and me feel welcome. From driving us around all weekend to find an apartment, to offering rides, to supplying blankets & pillows & air mattresses for sleeping in newly rented apartments, to dinner invites and insights about living in SF, we have been enthusiastically ushered into the community.

As far as what I’m working on at GitHub, here is just a taste (more to come later, I promise)…

Tom and I are working hard to extend the reach of our campfire robot (hubot) into the physical world. It still needs some better packaging, but version 1 of “Door Me” is up and running; allowing GitHub scros to never leave their seats to answer the door buzzer again.

I’m also jamming on libgit2: building out Objective-C bindings and contributing to rugged and some of the other language bindings. If you aren’t familiar with the project, you should check it out. It is probably one of the best examples of TDD on a C project that I’ve seen recently.

Written by Tim Clem

March 9, 2011 at 12:03 pm

Posted in Uncategorized
