Summary of So You Think You Know Git - FOSDEM 2024
00:00:00Scott Chacon, co-founder of GitHub, gives a technical talk at FOSDEM. He introduces himself, mentioning his role in promoting Git. His book, "Pro Git," is available on the Git website. He is now working on a Git client named GitButler. Chacon discusses the complexity of Git's command set, noting that there are about 145 Git commands, many of them mainly useful for scripting. The talk focuses on everyday Git usage, and Chacon polls the audience about their familiarity with Git.
00:03:17The speaker walks through the categories in Git's command listing: manipulators such as `reflog`, interrogators, commands for interacting with others, and bridges to legacy systems like P4, SVN, and CVS. They mention the plethora of plumbing commands available but decide not to delve into them, focusing instead on features that may be unfamiliar, especially to people who have been using Git for a long time. The agenda for the talk covers helpful configurations, old but helpful features, and newer functionality introduced in the last few years.
00:05:29The speaker discusses some lesser-known features and updates related to Git and GitHub, highlighting the importance of staying updated on developments. They introduce the concept of a "shotgun buffet" style talk, where a variety of information is shared for attendees to pick what interests them. The speaker emphasizes the usefulness of setting up aliases in Git commands to improve efficiency and shares tips on configuring Git repositories. Additionally, the speaker mentions the introduction of new features on GitHub since their departure, encouraging listeners to explore these advancements for potential benefits in their daily workflow.
00:07:29A Git alias whose value starts with a "bang" (`!`) runs a shell command or script instead of a Git subcommand. This allows custom output, such as a branch listing sorted by last commit that also shows branch descriptions. Another useful feature is the `includeIf` directive, which conditionally includes an extra config file for repositories under a specific directory, so that its values override the global defaults there. This makes it easy to keep separate email addresses and other settings for work and open source projects.
00:09:41The speaker discusses using `git blame` and `git log` to trace the history of a file or function. `git blame -L` restricts blame to a line range, which is faster and easier to read than blaming the entire file. Similarly, `git log -L` with a line range (or function name) and a path shows how just that part of the file evolved. The speaker also mentions that `git blame` has additional options such as `-C` for tracking code movement, which helps when a function was moved or copied from elsewhere.
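A small sketch of the line-range forms, run against a scratch repository so it is self-contained. The file name and contents are invented for the example.

```shell
set -eu

# Scratch repository with two commits touching the same file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'one\ntwo\nthree\nfour\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# The second commit changes only line 3.
printf 'one\ntwo\nTHREE\nfour\n' > notes.txt
git commit -q -am "Shout line three"

# Blame lines 2 through 3 only, instead of the whole file.
blame_out="$(git blame -L 2,3 notes.txt)"
echo "$blame_out"

# History of line 3 only: each commit that touched it, with the matching hunk.
line_history="$(git log -L 3,3:notes.txt)"
echo "$line_history"

# Function form (names are placeholders):
#   git log -L :my_function:path/to/file.c
```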
00:12:09The speaker explains that the `-C` option to `git blame` can be given up to three times to widen the search for where code came from. With `git blame -w -C -C -C`, Git ignores whitespace changes and looks for lines moved or copied from other files in any commit, so code is attributed to the commit that actually introduced it. The `git log -S` "pickaxe" option filters the log to commits that added or removed a given string (regular expressions are also supported), which helps track down code even after it has been removed from the codebase. The `git reflog` command shows a log of where your references have pointed, a chronological list of actions such as pull, reset, checkout, and rebase performed in your repository.
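The following sketch shows the three commands on a scratch repository. File names and contents are invented; the point is only that plain blame credits the commit that copied the lines, while the repeated `-C` follows them back to the commit that wrote them.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Commit 1: the original file.
cat > util.txt <<'EOF'
compute_total adds every item price currently in the shopping cart
apply_discount subtracts the seasonal coupon from the running total
format_receipt renders the final amount for the waiting customer
legacy_token is still referenced here
EOF
git add util.txt
git commit -q -m "Add util"
first="$(git rev-parse HEAD)"

# Commit 2: three lines are copied into a brand-new file.
{ echo "shared helpers"; head -n 3 util.txt; } > helpers.txt
git add helpers.txt
git commit -q -m "Extract helpers"

# Commit 3: a line disappears from the codebase.
grep -v legacy_token util.txt > util.tmp && mv util.tmp util.txt
git commit -q -am "Remove legacy token"

# Plain blame credits the copying commit for every line of helpers.txt.
git blame -s helpers.txt
# -w ignores whitespace; the tripled -C searches other files in any commit.
git blame -s -w -C -C -C helpers.txt
origin_of_line2="$(git blame --porcelain -w -C -C -C -L 2,2 helpers.txt | awk 'NR==1 {print $1}')"

# Pickaxe: commits that added or removed the string, even though it is gone now.
pickaxe="$(git log -S legacy_token --format=%s)"
echo "$pickaxe"

# Every recorded movement of HEAD in this repository.
git reflog
```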
00:14:20You can use `git reflog` to track every step and easily revert changes with `git reset`. Use `git diff --word-diff` for easier tracking of changes at word level instead of line level, especially useful for complex code changes. Enable `rerere` to automatically resolve repetitive merge conflicts by recording and reapplying previous resolutions. This feature is helpful during rebasing or cherry-picking operations. These are longstanding features in Git.
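A minimal sketch of these three features on a scratch repository, with invented file contents. It shows a word-level diff, recovers a discarded commit through the reflog, and switches on `rerere` for the repository.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'The quick brown fox\n' > story.txt
git add story.txt
git commit -q -m "Add story"

# Change a single word and diff at word level rather than line level.
printf 'The slow brown fox\n' > story.txt
word_diff="$(git diff --word-diff story.txt)"
echo "$word_diff"

git commit -q -am "Slow the fox down"
tip="$(git rev-parse HEAD)"

# Throw the commit away, then use the reflog entry to get it back.
git reset -q --hard HEAD~1
git reset -q --hard 'HEAD@{1}'
restored="$(git rev-parse HEAD)"

# Record conflict resolutions and replay them in later merges, rebases,
# and cherry-picks.
git config rerere.enabled true
```

`HEAD@{1}` means "where HEAD pointed one move ago"; running `git reflog` first shows which entry to pick when the commit you want is further back.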
00:16:33You can use `git branch --column`, or set the global config `column.ui` to `auto`, to display branches in columns instead of one long list. Another useful option is `branch.sort`, which can be set to `-committerdate` to sort branches by most recent commit rather than alphabetically. `git column` is a newer command that simply formats its input into columns. `git push --force-with-lease` makes force pushing safer: it checks the remote reference before pushing, avoiding accidental overwrites of a collaborator's work.
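As a rough illustration of the column and sorting options, the sketch below sets them in a scratch repository (without `--global`, so your own configuration is untouched). Branch names are invented.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"
for name in feature-a feature-b bugfix-c; do
  git branch "$name"
done

# Lay out listings in columns when writing to a terminal, and list branches
# by most recent commit first. Add --global to apply these everywhere.
git config column.ui auto
git config branch.sort -committerdate

# --column forces the column layout even when output is not a terminal.
branch_columns="$(git branch --column)"
echo "$branch_columns"

# git column is a generic formatter: it lays out whatever lines it is fed.
column_demo="$(printf '%s\n' alpha beta gamma delta | git column --mode=column --width=40)"
echo "$column_demo"
```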
00:18:59Before force pushing, fetch so that you know what is on the server. With `--force-with-lease`, Git force pushes only if the remote branch still points where your last fetch saw it; if someone else has pushed in the meantime, the push is rejected. It is recommended to use `--force-with-lease` after rebasing or amending, as it is a safer alternative to a plain force push.
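The lease check can be seen in a small simulation with a local bare repository standing in for the server. The names Alice and Bob, the paths, and the commit messages are all invented, and `git init -b` assumes Git 2.28 or newer.

```shell
set -eu

root="$(mktemp -d)"
git init -q --bare -b main "$root/server.git"

# Alice publishes the first commit.
git init -q -b main "$root/alice"
cd "$root/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$root/server.git"
echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q -u origin main

# Bob clones and pushes follow-up work.
git clone -q "$root/server.git" "$root/bob"
git -C "$root/bob" config user.name "Bob"
git -C "$root/bob" config user.email "bob@example.com"
git -C "$root/bob" commit -q --allow-empty -m "Bob's work"
git -C "$root/bob" push -q origin main
bob_tip="$(git -C "$root/bob" rev-parse HEAD)"

# Alice amends without fetching. Her origin/main is stale, so the lease fails
# and Bob's commit survives. A plain --force would have overwritten it.
git commit -q --amend -m "Add notes (reworded)"
if git push -q --force-with-lease origin main 2>/dev/null; then
  lease_result="pushed"
else
  lease_result="rejected"
fi
server_tip="$(git -C "$root/server.git" rev-parse main)"
echo "force-with-lease push was $lease_result"
```

Note that the protection depends on the remote-tracking branch being an honest record of what you last looked at: once Alice fetches, the lease is satisfied again, so she still needs to inspect Bob's work before pushing.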
Regarding signing commits, corporate environments often enforce it. Using SSH keys instead of GPG for signing is gaining popularity because it is easier to set up. Git now supports signing commits with SSH keys, giving the same verification as with GPG keys. Once the SSH public key is uploaded to a platform like GitHub, commits are shown as verified, which confirms their authenticity.
00:21:20When using GitHub or GitLab, it is important that the signing key matches the email address on your commits. You can sign commits and tags with GPG or SSH, which most platforms support. You can also sign your pushes to create an audit log, though this feature is not widely used on GitHub or GitLab. Kernel.org supports it, but otherwise you need to run your own server to use it effectively.
Git maintenance is a useful tool to optimize repository performance by running maintenance tasks in the background, such as garbage collection, commit graph generation, prefetching, loose object collection, and repacking. Running `git maintenance start` registers the repository in your Git config and schedules the tasks with the system scheduler (a cron job, for example), improving repository efficiency without interrupting your workflow.
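The sketch below runs two of those tasks once, in the foreground, on a scratch repository. It deliberately does not call `git maintenance start`, because that command edits your global config and the system scheduler; the task selection here is only an example.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done

# One-off foreground run of selected maintenance tasks.
git maintenance run --task=commit-graph --task=loose-objects
tasks_ran="yes"
echo "maintenance tasks finished: $tasks_ran"

# To have Git run its tasks on a schedule in the background instead, run
# this inside a real repository:
#   git maintenance start
# and undo it later with:
#   git maintenance stop
```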
00:23:44Instead of Git pausing every so often after a command to run garbage collection, maintenance does this work in the background on a schedule. This is particularly useful for enormous repositories with over half a million files. Microsoft, after acquiring GitHub, dedicated significant effort to supporting large monolithic repositories, such as the Windows codebase with 3.45 million files. They introduced tools like a virtual file system and Scalar to improve Git performance for such massive projects, and eventually integrated these features into Git core. In addition, prefetching, one of the Git maintenance tasks, periodically fetches updates from the server so that manual fetches are quicker.
00:26:16The prefetch task downloads new objects from the remote in the background without updating your remote-tracking branches, so a later `git fetch` has far less to transfer. `git maintenance` can also automate building the commit graph, which speeds up history walks such as `git log --graph` in large repositories like the Linux kernel. Running `git commit-graph write` for the first time may be slow, but subsequent operations will be faster because the graph is cached on disk. Setting `fetch.writeCommitGraph` keeps the graph up to date on every fetch.
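A short sketch of writing the commit graph by hand and enabling the fetch-time update, on a scratch repository. On a repository this small there is no measurable speed-up; the commands are the same ones that matter on a large history.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done

# Build the commit-graph file for every commit reachable from any ref.
git commit-graph write --reachable

# Sanity-check the file that was just written.
git commit-graph verify

# Refresh the graph automatically after each fetch.
git config fetch.writeCommitGraph true

# History walks like this one are what the graph accelerates.
git log --graph --oneline
```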
00:28:25With `fetch.writeCommitGraph` set, each fetch updates the commit graph incrementally, which keeps log operations fast. The file system monitor (`core.fsmonitor`) improves `git status` performance in large repositories by watching for file system changes instead of scanning every file. Once it is enabled, subsequent `git status` commands are faster. Partial cloning lets you filter out blobs when cloning a repository (for example with `--filter=blob:none`), reducing download size. Git fetches the commits and trees up front and downloads file contents only when they are needed, so history operations stay fast while the clone stays small.
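A blobless clone can be tried locally with a scratch repository acting as the server. Paths and file contents are invented, and the `uploadpack.allowFilter` line is needed only because this sketch serves the repository itself.

```shell
set -eu

root="$(mktemp -d)"
git init -q "$root/origin"
cd "$root/origin"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "pretend this is a large file" > data.txt
git add data.txt
git commit -q -m "Add data"

# The serving side must allow filters for a partial clone to take effect.
git config uploadpack.allowFilter true

# Blobless clone: commits and trees arrive now, file contents on demand.
git clone -q --no-checkout --filter=blob:none "file://$root/origin" "$root/blobless"
cd "$root/blobless"

# Count objects that are referenced by history but not downloaded yet.
missing="$(git rev-list --objects --all --missing=print | grep -c '^?' || true)"
echo "blobs not yet downloaded: $missing"

# Settings that speed up git status in large working trees. Whether a
# built-in file system monitor is available depends on platform and version.
git config core.fsmonitor true
git config core.untrackedCache true
```

Checking out a branch in the blobless clone would fetch the blobs that checkout needs, which is why `--no-checkout` is used here to keep the missing count visible.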
00:30:50To summarize, sparse checkout helps manage large repositories efficiently when you work in only a few directories or subdirectories. It allows for faster downloads, reduces the need to fetch unnecessary data, and simplifies working with monorepos by checking out only the relevant files. By filtering out irrelevant blobs and using sparse indexes, operations like blame and status can be significantly quicker. Additionally, features like multi-pack indexes and reachability bitmaps can further improve performance on massive repositories.
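A minimal sparse-checkout sketch using a scratch "monorepo" with invented directory names. It assumes Git 2.25 or newer for `git clone --sparse` and `git sparse-checkout`.

```shell
set -eu

root="$(mktemp -d)"
git init -q "$root/monorepo"
cd "$root/monorepo"
git config user.name "Example Dev"
git config user.email "dev@example.com"
mkdir docs src tools
echo "readme" > README.md
echo "guide"  > docs/guide.md
echo "code"   > src/main.c
echo "script" > tools/build.sh
git add .
git commit -q -m "Import monorepo"

# --sparse starts with only the top-level files checked out.
git clone -q --sparse "$root/monorepo" "$root/sparse"
cd "$root/sparse"

# Add just the directory you work in.
git sparse-checkout set src

# Show what is currently included, and what actually landed on disk.
git sparse-checkout list
ls
```

Combining this with `--filter=blob:none` on the clone avoids downloading file contents for the directories that are never checked out.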
00:33:12The speaker discussed the benefits of working with subsets of a repository when handling a large amount of data in a Chromium build, leading to faster performance. They also shared lesser-known features of GitHub, such as restricting the allowed merge types (merge commits, rebasing, and squash merging), and how enforcing these standards can streamline the development process. Additionally, they mentioned helpful GitHub features like auto-merge, the merge queue for lining up pull requests to be merged together, requiring a linear history, and requiring signed commits, emphasizing the importance of using these tools effectively for efficient collaboration.
00:35:26The speaker discusses using `git ls-remote` on GitHub repositories to list all refs, including pull request refs that can be fetched and merged directly. They mention a tool called GitButler that introduces virtual branches, allowing users to work on multiple branches simultaneously without needing to switch or stash changes, thereby simplifying the workflow. Additionally, there is a brief mention of `git range-diff` for comparing versions of a rebased series, but it is noted to be a more specific and advanced function. The speaker encourages exploring GitButler on GitHub and ends the talk, opening the floor to questions, including one about GitHub's lack of support for `git range-diff`, to which they explain its niche use case and suggest checking it out for specific needs.
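The pull request refs can be explored without network access by simulating one in a scratch repository. In this sketch the `refs/pull/1/head` ref is created by hand to stand in for the ref GitHub publishes for pull request number 1; the local branch name `pr-1` is arbitrary.

```shell
set -eu

root="$(mktemp -d)"
git init -q "$root/server"
cd "$root/server"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Base"
git commit -q --allow-empty -m "Proposed change"

# Stand-in for a pull request ref, then rewind the branch so the proposed
# commit is reachable only through that ref.
git update-ref refs/pull/1/head HEAD
git reset -q --hard HEAD~1

git clone -q "$root/server" "$root/local"
cd "$root/local"

# List every ref the remote advertises, not just branches and tags.
remote_refs="$(git ls-remote origin)"
echo "$remote_refs"

# Fetch the pull request ref into a local branch for inspection or merging.
git fetch -q origin pull/1/head:pr-1
pr_subject="$(git log -1 --format=%s pr-1)"
echo "$pr_subject"
```

Against a real GitHub repository only the last two commands are needed, with the pull request number substituted for `1`.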
00:37:53The speaker discusses the importance of establishing a good workflow in Git and building tooling around it to avoid confusion. They mention that while at GitHub, they primarily used merging over rebasing and thus did not find range diff useful. The speaker also critiques the challenges of working with submodules, noting that many are moving towards using monorepos to simplify management. They suggest that many companies are opting for package management solutions or monorepos to address the limitations of working with submodules.
00:40:20In the discussion, it is mentioned that with SSH signing, it is possible to specify more than one key for signing or verification. The speaker reflects on the idea of making "force with lease" the default option in Git, suggesting it could be a good idea for the more aggressive version to be the longer option. The emphasis is placed on Git's commitment to backward compatibility, ensuring scripts written years ago still work with the current version. Additionally, the speaker mentions new Git commands like "git switch" that aim to make the user interface less confusing but may introduce overlapping functions. The conversation also touches upon the extensive configuration options available in Git and the potential for customizing default behaviors. Finally, the speaker discusses the direction they would like to see Git moving towards, particularly regarding features for monorepos.
00:42:52The speaker shared that working on multiple branches simultaneously and having more than one index and head in Git would be valuable, which they are implementing in their Git client. They also expressed frustration at not being able to easily track and save all changes made in Git like in a Google Doc, suggesting a continuous record of changes to prevent loss. Additionally, the speaker highlighted the mismatch between Git's original design for patch submissions on mailing lists and the shift towards pull requests on platforms like GitHub, noting the lack of significant UI changes in core Git. They emphasized the potential for innovation in version control systems and expressed a love for the command line.
00:45:03The speaker mentions using Git Bash for interactive adding as it is faster than the native Git interactive add script. They also use Visual Studio Code with Vim key bindings for their work. While they prefer the command line interface for most tasks in Git due to its speed, they appreciate using a GUI for interactive adding to streamline the process.