Tim's Blog

Hack everything

Mind the End of Your Line

One of the most frequent questions I answer about Git is why dealing with line endings is so difficult. This is an attempt to answer that question and explain the myriad of options and settings that control line endings in Git.

Git has gone through two systems for dealing with line endings in repositories. The root of the problem is that Unix, Linux and OS X use LF and Windows uses CRLF to denote the end of a line. Before OS X, the Mac actually used CR, but for the most part we can ignore that.

None of this would be a problem if we each lived in our own little worlds and never shared code between operating systems. And by share I mean everything from working on a cross-platform project to copy-pasting code out of a browser. In fact, anytime you download a sample project in a zip file, copy code out of a gist, copy code from someone’s blog or use code out of a file that you keep in Dropbox – you are sharing text and you need to deal with these invisible line-ending characters. All of these activities potentially introduce a different set of line endings into your code base, which is going to make diffs messy and Git generally unhappy.

Git’s primary solution to all this is to specify that LF is the best way to store line endings for text files in a Git repository’s object database. It doesn’t force this on you but most developers using Git and GitHub have adopted this as a convention and even our own help recommends setting up your config to do this.

Background

Before describing the settings that control line endings in Git, there are a couple of things you need to know about: core.eol and what it means to write something to the object database.

End of line

core.eol

The first setting you need to know about is core.eol. In all but the rarest of cases you should never have to change this setting from its default. This setting doesn’t do much on its own, but as soon as we start telling Git to change our line endings for us we need to know the value of core.eol. It decides which line ending Git writes into the working directory for files that are marked as text through the attributes described below (when core.autocrlf is set to true or input, that setting takes precedence and core.eol is ignored), so it’s good to know that it exists and good to know that you probably don’t want to change it.

  • core.eol = native – The default. When Git needs to change line endings to write a file in your working directory it will change them to whatever is the default line ending on your platform. For Windows this will be CRLF, for Unix/Linux/OS X this will be LF.

  • core.eol = crlf – When Git needs to change line endings to write a file in your working directory it will always use CRLF to denote end of line.

  • core.eol = lf – When Git needs to change line endings to write a file in your working directory it will always use LF to denote end of line.

You can run git config --global core.eol to see what this value is set to on your system. If nothing comes back that means you are using the default, which is native.
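To make the three values concrete, here is a minimal sketch of how a core.eol value maps to actual characters. The helper name resolve_eol and its platform parameter are my own illustration, not anything Git exposes.

```python
import sys


def resolve_eol(core_eol="native", platform=sys.platform):
    """Return the line ending implied by a core.eol value (hypothetical helper)."""
    if core_eol == "crlf":
        return "\r\n"
    if core_eol == "lf":
        return "\n"
    if core_eol == "native":
        # Windows uses CRLF; Unix, Linux and OS X use LF.
        return "\r\n" if platform.startswith("win") else "\n"
    raise ValueError(f"unknown core.eol value: {core_eol!r}")


# repr() makes the invisible characters visible.
print(repr(resolve_eol("native", "win32")))
print(repr(resolve_eol("native", "linux")))
```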

In and out of the object database

What is the object database? I’m going to talk a lot about two different operations: writing to the object database and writing out to the working directory. It helps to understand these concepts a bit before moving on.

You may already know that Git has its own database in that .git folder, which Scott does a great job of explaining in Chapter 9 of his Pro Git book. All you need to know is that when you do something like git add and git commit you are writing objects into the database. This involves taking the files that you are committing, calculating their SHAs and writing them into the object database as blobs. This is what I mean when I say writing to the object database and this is when Git has a chance to run filters and do things like converting line endings.

The other place that Git has a chance to run filters is when it reads out of the object database and writes files into your working directory. This is what I mean when I say writing out into the working directory. Many commands in Git do this, but git checkout is the most obvious and easy to understand. This also happens when you do a git clone or run a command like git reset that changes your working directory.
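To see why line endings matter at this level, here is a small sketch of how a blob's SHA-1 is derived: Git hashes a short header ("blob", the content length, a NUL byte) followed by the file's bytes. The function name blob_sha1 is my own, and the sketch only computes the ID; it does not compress or store anything the way the real object database does.

```python
import hashlib


def blob_sha1(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


lf_version = b"hello\nworld\n"
crlf_version = b"hello\r\nworld\r\n"

# Same text to a human, but different bytes, so two different blobs.
print(blob_sha1(lf_version))
print(blob_sha1(crlf_version))
```

Because the carriage returns are part of the hashed bytes, a file that differs only in its line endings is a completely different object as far as Git is concerned, which is why every line shows up as changed in a diff.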

The Old System

First, let’s talk about the old system. This is the original set of features in Git designed to solve this particular problem of line endings. There is a good chance you are still using this system and don’t even know. Here is how it works: Git has a configuration setting called core.autocrlf which is specifically designed to make sure that when a text file is written to the repository’s object database that all line endings in that text file are normalized to LF. Here are the different options for core.autocrlf and what they mean:

  • core.autocrlf = false – This is the default, but most people are encouraged to change this immediately. The result of using false is that Git doesn’t ever mess with line endings on your file. You can check in files with LF or CRLF or CR or some random mix of those three and Git does not care. This can make diffs harder to read and merges more difficult. Most people working in a Unix/Linux world use this value because they don’t have CRLF problems and they don’t need Git to be doing extra work whenever files are written to the object database or written out into the working directory.

  • core.autocrlf = true – This means that Git will process all text files and make sure that CRLF is replaced with LF when writing that file to the object database and turn all LF back into CRLF when writing out into the working directory. This is the recommended setting on Windows because it ensures that your repository can be used on other platforms while retaining CRLF in your working directory.

  • core.autocrlf = input – This means that Git will process all text files and make sure that CRLF is replaced with LF when writing that file to the object database. It will not, however, do the reverse. When you read files back out of the object database and write them into the working directory they will still have LFs to denote the end of line. This setting is generally used on Unix/Linux/OS X to prevent CRLFs from getting written into the repository. The idea is that if you pasted code from a web browser and accidentally got CRLFs into one of your files, Git would make sure they were replaced with LFs when you wrote to the object database.

You can run git config --global core.autocrlf to see what this value is set to on your system. If nothing comes back that means you are using the default, which is false.
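The three values are easier to compare side by side in code. This is a deliberately simplified model, assuming the file has already been judged to be text; the function names are hypothetical, and real Git applies extra rules (for example around files that already contain CRLF in the repository) that this sketch leaves out.

```python
def to_object_database(data: bytes, autocrlf: str = "false") -> bytes:
    """Model what core.autocrlf does to a text file on its way into the repo."""
    if autocrlf in ("true", "input"):
        return data.replace(b"\r\n", b"\n")
    return data  # false: bytes are stored untouched


def to_working_directory(data: bytes, autocrlf: str = "false") -> bytes:
    """Model what core.autocrlf does to a text file on checkout."""
    if autocrlf == "true":
        # Normalize first so an existing CRLF is not turned into CR CR LF.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return data  # input and false: written out exactly as stored


pasted = b"first line\r\nsecond line\n"  # mixed endings, e.g. from a browser
for value in ("false", "true", "input"):
    stored = to_object_database(pasted, value)
    checked_out = to_working_directory(stored, value)
    print(value, stored, checked_out)
```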

How does Git know that a file is text? Good question. Git has an internal method for heuristically checking if a file is binary or not. A file is deemed text if it is not binary. Git can sometimes be wrong and this is the basis for our next setting.
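As a rough illustration of what a heuristic like this looks like, here is a sketch that calls a file binary if a NUL byte appears near the start. This is an assumption-laden simplification, not Git's exact rules, and the name looks_binary and the 8000-byte sample size are choices I made for the example.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Guess 'binary' if a NUL byte shows up in the first sample_size bytes."""
    return b"\0" in data[:sample_size]


print(looks_binary(b"plain text\n"))
print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"))
# UTF-16 text is full of NUL bytes, so a check like this misclassifies it.
print(looks_binary("hello".encode("utf-16")))
```

The last call shows how a content-based guess can go wrong: the file is text to a human, but the check treats it as binary.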

The next setting that was introduced is core.safecrlf which is designed to protect against these cases where Git might change line endings on a file that really should just be left alone.

  • core.safecrlf = true – When getting ready to run this operation of replacing CRLF with LF before writing to the object database, Git will make sure that it can actually successfully back out of the operation. It will verify that the reverse can happen (LF to CRLF) and if not the operation will be aborted.

  • core.safecrlf = warn – Same as above, but instead of aborting the operation, Git will just warn you that something bad might happen.
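The round-trip idea can be sketched in a few lines. This assumes core.autocrlf = true, and the helpers round_trips and check_safecrlf are hypothetical names that model the check rather than reproduce Git's implementation.

```python
def round_trips(data: bytes) -> bool:
    """True if CRLF -> LF followed by LF -> CRLF gives back the same bytes."""
    normalized = data.replace(b"\r\n", b"\n")
    restored = normalized.replace(b"\n", b"\r\n")
    return restored == data


def check_safecrlf(data: bytes, safecrlf: str = "true") -> str:
    """Return what the model would do: 'ok', 'abort' or 'warn'."""
    if round_trips(data):
        return "ok"
    return "abort" if safecrlf == "true" else "warn"


print(check_safecrlf(b"one\r\ntwo\r\n"))          # consistent CRLF
print(check_safecrlf(b"one\r\ntwo\n"))            # mixed endings
print(check_safecrlf(b"one\r\ntwo\n", "warn"))    # mixed endings, warn only
```

A file with mixed endings cannot be restored exactly, because after normalization there is no record of which lines originally ended in CRLF and which in LF.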

One final layer on all this is that you can create a file called .gitattributes in the root of your repository and add rules for specific files. These rules allow you to control things like autocrlf on a per file basis. So you could, for instance, put this in that file to tell Git to always replace CRLF with LF in txt files:

*.txt crlf

Or you could do this to tell Git to never replace CRLF with LF for txt files:

*.txt -crlf

Or you could do this to tell Git to replace CRLF with LF when writing txt files to the object database, but to leave the LFs alone when writing them back out to the working directory:

*.txt crlf=input

OK. Got all that? See all the problems and the mess we’ve made? It gets worse when you start working on projects that push you towards different global settings. Enter the new system which is available in Git 1.7.2 and above.

The New System

The new system moves to defining all of this in the .gitattributes file that you keep with your repository. This means that line endings can be encapsulated entirely within a repository and don’t depend on everyone having the proper global settings.

In the new system you are in charge of telling Git which files you would like CRLF to LF replacement to be done on. This is done with a text attribute in your repository’s .gitattributes file. In this case the man page is actually quite helpful. Here are some examples of using the text attribute:

  • *.txt text – Set all files matching the filter *.txt to be text. This means that Git will run CRLF to LF replacement on these files every time they are written to the object database and the reverse replacement will be run when writing out to the working directory.

  • *.txt -text – Unset all files matching the filter. These files will never run through the CRLF to LF replacement.

  • *.txt text=auto – Set all files matching the filter to be converted (CRLF to LF) if those files are determined by Git to be text and not binary. This relies on Git’s built in binary detection heuristics.

If a file is unspecified then Git falls back to the core.autocrlf setting and you are back in the old system. This is how backwards compatibility is maintained, but I would recommend (especially for Windows developers) that you explicitly create a .gitattributes file.
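The decision described above, including the fallback to the old system, can be summarized as a small function. This is a sketch of the logic as this post describes it; the name should_normalize and the string values used for the attribute states are my own labels.

```python
def should_normalize(text_attr, autocrlf="false", is_binary=False):
    """Decide whether CRLF -> LF replacement runs when writing to the repo.

    text_attr is "set" (text), "unset" (-text), "auto" (text=auto),
    or None when no .gitattributes rule mentions the file.
    """
    if text_attr == "set":
        return True
    if text_attr == "unset":
        return False
    if text_attr == "auto":
        return not is_binary
    # Unspecified: back in the old system, governed by core.autocrlf.
    return autocrlf in ("true", "input") and not is_binary


print(should_normalize("set"))
print(should_normalize("unset", autocrlf="true"))
print(should_normalize("auto", is_binary=True))
print(should_normalize(None, autocrlf="input"))
```

Note how an explicit text or -text rule makes the outcome independent of anyone's global core.autocrlf, which is the main selling point of the new system.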

Here is an example you might use for a C# project:

# These files are text and should be normalized (convert crlf => lf)
*.cs      text diff=csharp
*.xaml    text
*.csproj  text
*.sln     text
*.tt      text
*.ps1     text
*.cmd     text
*.msbuild text
*.md      text

# Images should be treated as binary
# (binary is a macro for -text -diff)
*.png     binary
*.jpeg    binary

*.sdf     binary

One final note that the man page for gitattributes mentions is that you can tell Git to detect all text files and automatically normalize them (convert CRLF to LF):

* text=auto

This is certainly better than requiring everyone to be on the same global setting for core.autocrlf, but it means that you really trust Git to do binary detection properly. In my opinion it is better to explicitly specify your text files that you want normalized. Don’t forget if you are going to use this setting that it should be the first line in your .gitattributes file so that subsequent lines can override that setting.
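The reason ordering matters is that later matching lines override earlier ones. Here is a minimal sketch of that behavior using shell-style matching on the file name. RULES and text_setting are hypothetical, and the sketch treats each line's setting as a single value, whereas real .gitattributes handling resolves each attribute separately and supports more pattern forms.

```python
from fnmatch import fnmatchcase
import posixpath

# Catch-all first, specific rules afterwards.
RULES = [
    ("*", "text=auto"),
    ("*.cs", "text"),
    ("*.png", "binary"),
]


def text_setting(path, rules=RULES):
    """Return the setting from the last rule whose pattern matches the file name."""
    name = posixpath.basename(path)
    result = None
    for pattern, setting in rules:
        if fnmatchcase(name, pattern):
            result = setting  # a later match overrides an earlier one
    return result


print(text_setting("src/App.cs"))
print(text_setting("images/logo.png"))
print(text_setting("README"))
# With the catch-all last, it wins for every file.
print(text_setting("images/logo.png", rules=list(reversed(RULES))))
```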

Written by Tim Clem

March 1, 2012 at 1:01 pm

Posted in Git

30 Responses

  1. Nice post, very helpful. Is it worth GitHub creating a gitattributes repo, similar to the gitignore one? Somewhere to collate useful gitattributes per language/system/app?

    citizenmatt

    March 28, 2012 at 7:15 am

  2. I love that idea, let me see what I can do :)

    Tim Clem

    April 9, 2012 at 5:06 pm

  3. Very nice. Thank you, Tim! Let me add some comments which I hope you can add to your post (as long as I am not wrong). 1) I think you should emphasize that with core.autocrlf = input there is a chance that CRLF is already in the repository and will not be changed at checkout time. 2) *.txt crlf=auto looks like a typo. I guess you meant * crlf=auto instead. 3) I am missing a troubleshooting guide. I work on Ubuntu with core.autocrlf=true, core.safecrlf=true, core.eol=lf and I need to clone a repo that contains files with mixed line endings. After the cloning those files are flagged as modified. How can I automatically fix the line endings?

    tbsprs (@tbsprs)

    April 17, 2012 at 4:32 pm

    • 1. You are absolutely correct. The core.autocrlf=input is meant to *keep* you from accidentally committing CRLF, it does nothing for correcting a problem that already exists in a repo. As I understand it, this is by design.

      2. In this particular case, I was trying to consistently use a single file format denoted by the ‘txt’ extension for all my examples. I was also trying to drive home the point that in my opinion, it is better to call out specific file types that you would like filtered/converted instead of using a catch-all like I mention in the end of the article (* text=auto). I do see that the Git configuration setting ‘text’ and the fact that I am using the file extension ‘.txt’ is a bit confusing. They are two entirely different things.

      3. The reason your files are all flagged as modified is b/c Git has changed the line endings for you in your working directory. All you need to do is add those files to the index and make a commit to normalize the line endings in the repository. You might want to check with the other contributors on the project before doing this, as it will give you a single, very noisy commit and you’ll want everyone else to pull and get in sync so that they don’t have to deal with merge conflicts and the likes.

      Tim Clem

      April 18, 2012 at 9:37 am

  4. Thank you Tim for the clarification. Please note that I wrote the Stackoverflow question linked below almost 2 years ago and this is the closest I’ve seen to an answer since then.

    http://stackoverflow.com/questions/3206843/how-line-ending-conversions-work-with-git-core-autocrlf-between-different-operat

    michaelpmaddox

    April 18, 2012 at 3:40 am

    • Nice! I probably could have done a little searching and just answered your question on SO :)

      Tim Clem

      April 18, 2012 at 9:39 am

  5. Inspired by this article and the first comment, I created a GitHub repo for gitattributes templates, similar to the existing one for gitignore templates. Please feel free to contribute your .gitattributes files.

    Alexander Karatarakis

    May 16, 2012 at 7:16 am

  6. Thank you Tim!
    This article helped me to fix an annoying problem with Nuget packages and git repository interaction.

    I’ve posted my conclusions here:
    http://pampanotes.tercerplaneta.com/2012/07/git-nuget-packages-and-windows-line.html

    Jose Marcenaro

    July 4, 2012 at 11:11 am

  7. Nice article – thanks for writing it.

    Honestly, the whole design of settings for crlf handling seems (IMHO) too much. For example, why introduce a safecrlf setting instead of making autocrlf no longer munge files (because the file may seem like text, but isn’t since it fails the round-trip test). I’m sure that there must be a technical reason for doing it this way, but in the grand scheme of things it is likely to be small and ignorable when compared with the pain and complexity for the next 10-20 years or so, until git disappears.

    Or, what should the expectation be for core.autocrlf = true and core.eol=lf ?

    Of course, reading the documentation will eventually provide an answer for those, but shouldn’t the expectation be that the settings be intuitive?

    On a separate subject, what do you think about keeping the file type (text/binary/etc) with the object itself? It’s less prone to issues, helps with perf too (do a thorough check only once, when object is added).

    Daniel

    July 6, 2012 at 6:26 pm

    • I actually think that normalization was a begrudgingly added feature: the initial versions of Git didn’t do this at all.

      > Or, what should the expectation be for core.autocrlf = true and core.eol=lf?

      ha. well if you had a CRLF or CR in there by mistake it would get normalized to LF, but otherwise you are just running the filter for fun!

      > On a separate subject, what do you think about keeping the file type (text/binary/etc) with the object itself?

      It’s an interesting idea. You’d have to search the mailing lists and see if that has ever been proposed. This would dramatically change the premise of Git’s object database and if you start adding meta information like this you might as well include encodings and the likes. I think the idea has always been that to Git – those objects are just bytes[] with a SHA1.

      Tim Clem

      July 7, 2012 at 2:02 pm

  8. version control system should not mess up with my files
    if the people on a project are too dumb to use an editor that keeps line endings consistent no clever autocorrection will ever help them

    OlegYch

    July 25, 2012 at 9:43 am

    • In general I agree and this was originally Git’s take on the matter as well.

      Tim Clem

      July 25, 2012 at 10:15 am

      • AMEN!! “Don’t mess with my files! Just do your job and revision them! I’ll deal with any assumed problems with line endings on my clients should I have that issue.”

        chris

        October 26, 2012 at 3:37 pm

  9. Great article. Answered many questions that I had.

    Oliver Schrenk

    August 30, 2012 at 9:33 am

  10. Seems it does not work on windows – at least the eol=native (which is critical to get rid of per user configuration – being the default) :
    http://stackoverflow.com/q/13531988/281545

    and especially the answer :
    http://stackoverflow.com/a/13556638/281545

    - anyone on this ?

    MrsD

    November 27, 2012 at 7:39 pm

    • I don’t think you need the `eol=native` part. What happens if you leave that out?

      Tim Clem

      November 28, 2012 at 12:49 pm

      • Thanks for reply- well nothing happens if I leave it out :)
        see : http://stackoverflow.com/a/13556638/281545 the last part
        I had put it redundantly as I was not sure if I had to – apparently it is just the default so it shouldn’t do anything anyway

        MrsD

        November 28, 2012 at 8:17 pm

  11. Great article I think I’ll leave core.autocrlf = true in msysgit so I can clone & checkout random project on the net easily.
    For VS project I was thinking of just having .gitattributes with * -text as no one will be using *nix on that project anyway. Is * -text going to give exact same results as if core.autocrlf = false?

    Sn

    December 19, 2012 at 4:54 pm

  12. Hello, Tim! thanks for the article.
    What i’d like to ask, that there is merge.renormalize setting according to mentioned here http://git-scm.com/docs/gitattributes#_merging_branches_with_differing_checkin/checkout_attributes

    But why github itself does not use it ?

    I cloned repo, downloaded it, changed one single line, committed, pushed, made pull request – and voila! upstream got patch “remove all lines and add all lines” instead of single one line change.

    I believe your server-side pull request system, since you override the built-in Git local one, should be EOL-steady, especially since it seems there is a ready-made settings variable for that.

    Even if newbs did not google enough and did not change default configs enough – that is still not an excuse for GitHub to make nonsense patch files when issuing a Pull Request

    Arioch

    February 14, 2013 at 2:21 am

  13. This is quite helpful but I find it a bit vague on exactly when the value of core.eol is considered in changing line endings.

    Section 1, “Background”, introduces core.eol, saying “this setting is used by all the other things [..] below”, although is never explicitly mentioned after this section.

    Section 2, “The Old System”, repeatedly says “CRLF” and “LF” in the descriptions of the behaviors of core.autocrlf=true and core.autocrlf=input. If one always assumes “CRLF” = 0x0d 0x0a and “LF” = 0x0a then it doesn’t seem like core.eol has any effect here (maybe that’s true, since this is the “old way”).

    Section 3, “The New System”, again says “CRLF” and “LF” in most places, except in the “*.txt text” example, where it says writing to the working directory uses “the reverse replacement”. Maybe the unique phrase “reverse replacement” used here implies consulting core.eol; although the more literal meaning would seem to be the reverse of “CRLF to LF”, i.e. “LF to CRLF”.

    So in my reading of this, I don’t see how core.eol is used at all in either regime. But perhaps I am being too literal; maybe some instances of “CRLF” are meant to be understood as “the line ending determined by core.eol”? If so, which? Can you clarify this for me? Thanks.

    Chris

    February 25, 2013 at 3:02 pm

    • The core.eol configuration option is set to ‘native’ by default, which means it will be CRLF on Windows and LF on Unix/Linux. If you set this to another value you are changing what git understands the standard end of line to be for the platform (OS) you are running. Here are some examples:

      If you are on Windows and set core.eol=crlf, this is basically a noop. crlf is already the standard way to mark end of line on Windows.

      If you are on Windows and set core.eol=lf, you would now expect to have lf as the eol in files in your working directory. You might want to use core.autocrlf=input (if you don’t have a .gitattributes) to make sure you don’t accidentally commit a file with crlf or mixed line endings into the repo.

      If you are Linux and set core.eol=lf, this is basically a noop.

      If you are on Linux and set core.eol=crlf, you are crazy. :) I have no idea why you would want to do that, but it is possible. You would see a bunch of ^M in vi for the end of each line.

      Tim Clem

      February 26, 2013 at 11:01 am

      • I understand the semantics of the different core.eol values but it’s not clear which other options actually honor it. For example, from what I’m reading elsewhere, core.autocrlf does not look at core.eol at all.

        Chris

        February 26, 2013 at 12:35 pm

      • It should, but to be honest I’ve never messed with this value. You could test it out easily enough by changing it, setting core.autocrlf=true and cloning a repo with line feeds that would have been converted to crlf in your working dir.

        Tim Clem

        February 26, 2013 at 2:22 pm

      • Nope, core.autocrlf expressly ignores core.eol. From the original commit: https://github.com/git/git/commit/942e7747678ecf5f118ea5b2d0c763166de21f3a
        And the current code confirms it (core.autocrlf and core.eol map to auto_crlf and core_eol respectively) :
        https://github.com/git/git/blob/master/convert.c#L98

        Chris

        February 26, 2013 at 4:12 pm

  14. Excellent – thanks Tim. Very helpful.

    Anthony Bouch

    March 19, 2013 at 11:02 pm

  15. Nice article. I have started working on someone else’s github project (which uses Windows based tools) where all the source files need to retain their native CRLF endings – (the linker throws an error when the source files end in LF only). I have git set up on my machine to not mess with line endings (I agree with others that a VCS should not mess with the files!) The problem is that the originator of the project in question (also a Windows user – and this was his first git experience) set up git on his box to auto convert line endings (unfortunately the recommended option) – the result being that he is happily maintaining the repository with all the source files having LF endings in the repo. He himself doesn’t experience the problem, but myself and anyone else trying out his project by downloading a ZIP tarball of the latest version ends up with a set of source files all missing the _required_ CRLF endings. I’d like to know the simplest/easiest solution (for him) to correct this problem (and the rest of us trying to help out with the project).

    Thus, it would be nice if part of your article addressed (the inverse problem) of how to set up a configuration file so that the CRLF endings are retained in the repo no matter what the git settings are for the end user.

    jmrware

    March 22, 2013 at 8:48 pm

    • Off the top of my head, this is going to be tricky to do since Git is really only providing a feature to make sure that things are LF in the repo. You can tell it to not ever do line ending conversion, but I think what you are asking is how to tell it to do CRLF conversion.

      Tim Clem

      March 25, 2013 at 9:49 am

  16. Having just been bitten by autocrlf again I’ve finally set it to false for good. I just feel like autocrlf is trying to solve a problem that doesn’t exist and it’s a feature designed by *nix users based on false presumptions of how Windows users work.

    Making it a per repo setting in .gitattributes is an improvement, but line endings are going to be the least of your problems in cross platform projects and it’s all terribly confusing for what is for many projects just an aesthetic issue.

    Previously I’ve always used autocrlf = input on Windows just in case I somehow accidentally introduced CR when copying and pasting or whatever but it never seems reliable. Failed merges will sometimes temporarily convert everything to CRLF which makes resolving conflicts even more difficult, and I’m sure we’ve all seen those messages when committing about conversions taking place that make no sense.

    Kyle

    March 24, 2013 at 8:17 am

  17. We stumbled over here from a different website and thought I should check things out. I like what I see so i am just following you. Look forward to exploring your web page repeatedly.

    Glad

    April 1, 2013 at 3:07 pm

