Support compression for Actions logs to save storage space and
bandwidth. Inspired by
https://github.com/go-gitea/gitea/issues/24256#issuecomment-1521153015
The biggest challenge is that the compression format should support
[seekable](https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md).
So when users are viewing a part of the log lines, Gitea doesn't need to
download the whole compressed file and decompress it.
That means gzip cannot help here. And I did research, there aren't too
many choices, like bgzip and xz, but I think zstd is the most popular
one. It has an implementation in Golang with
[zstd](https://github.com/klauspost/compress/tree/master/zstd) and
[zstd-seekable-format-go](https://github.com/SaveTheRbtz/zstd-seekable-format-go),
and what is better is that it has good compatibility: a seekable format
zstd file can be read by a regular zstd reader.
This PR introduces a new package `zstd` to combine and wrap the two
packages, to provide a unified and easy-to-use API.
And a new setting `LOG_COMPRESSION` is added to the config, although I
don't see any reason why not to use compression, I think's it's a good
idea to keep the default with `none` to be consistent with old versions.
`LOG_COMPRESSION` takes effect for only new log files, it adds `.zst` as
an extension to the file name, so Gitea can determine if it needs
decompression according to the file name when reading. Old files will
keep the format since it's not worth converting them, as they will be
cleared after #31735.
<img width="541" alt="image"
src="https://github.com/user-attachments/assets/e9598764-a4e0-4b68-8c2b-f769265183c9">
(cherry picked from commit 33cc5837a655ad544b936d4d040ca36d74092588)
Conflicts:
assets/go-licenses.json
go.mod
go.sum
resolved with make tidy
---
Conflict resolution: trivial
Things done differently: Improve localization message, use the paragraph
element instead of the div element, fix passing this variable to the
template and add a integration test
(cherry picked from commit 9633f336c87947dc7d2a5e76077a10699ba5e50d)
ForkRepository performs two different functions:
* The fork itself, if it does not already exist
* Updates and notifications after the fork is performed
The function is split to reflect that and otherwise unmodified.
The two function are given different names to:
* clarify which integration tests provides coverage
* distinguish it from the notification method by the same name
Previous arch package grouping was not well-suited for complex or multi-architecture environments. It now supports the following content:
- Support grouping by any path.
- New support for packages in `xz` format.
- Fix clean up rules
<!--start release-notes-assistant-->
## Draft release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Features
- [PR](https://codeberg.org/forgejo/forgejo/pulls/4903): <!--number 4903 --><!--line 0 --><!--description c3VwcG9ydCBncm91cGluZyBieSBhbnkgcGF0aCBmb3IgYXJjaCBwYWNrYWdl-->support grouping by any path for arch package<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4903
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Exploding Dragon <explodingfkl@gmail.com>
Co-committed-by: Exploding Dragon <explodingfkl@gmail.com>
Now that my colleague just posted a wonderful blog post https://blog.datalad.org/posts/forgejo-runner-podman-deployment/ on forgejo runner, some time I will try to add that damn codespell action to work on CI here ;) meanwhile some typos managed to sneak in and this PR should address them (one change might be functional in a test -- not sure if would cause a fail or not)
### Release notes
- [ ] I do not want this change to show in the release notes.
- [ ] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be be used for the release notes instead of the title.
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4857
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-committed-by: Yaroslav Halchenko <debian@onerussian.com>
These are the three conflicted changes from #4716:
* https://github.com/go-gitea/gitea/pull/31632
* https://github.com/go-gitea/gitea/pull/31688
* https://github.com/go-gitea/gitea/pull/31706
cc @earl-warren; as per discussion on https://github.com/go-gitea/gitea/pull/31632 this involves a small compatibility break (OIDC introspection requests now require a valid client ID and secret, instead of a valid OIDC token)
## Checklist
The [developer guide](https://forgejo.org/docs/next/developer/) contains information that will be helpful to first time contributors. There also are a few [conditions for merging Pull Requests in Forgejo repositories](https://codeberg.org/forgejo/governance/src/branch/main/PullRequestsAgreement.md). You are also welcome to join the [Forgejo development chatroom](https://matrix.to/#/#forgejo-development:matrix.org).
### Tests
- I added test coverage for Go changes...
- [ ] in their respective `*_test.go` for unit tests.
- [x] in the `tests/integration` directory if it involves interactions with a live Forgejo server.
### Documentation
- [ ] I created a pull request [to the documentation](https://codeberg.org/forgejo/docs) to explain to Forgejo users how to use this change.
- [ ] I did not document these changes and I do not expect someone else to do it.
### Release notes
- [ ] I do not want this change to show in the release notes.
- [ ] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be be used for the release notes instead of the title.
<!--start release-notes-assistant-->
## Draft release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Breaking features
- [PR](https://codeberg.org/forgejo/forgejo/pulls/4724): <!--number 4724 --><!--line 0 --><!--description T0lEQyBpbnRlZ3JhdGlvbnMgdGhhdCBQT1NUIHRvIGAvbG9naW4vb2F1dGgvaW50cm9zcGVjdGAgd2l0aG91dCBzZW5kaW5nIEhUVFAgYmFzaWMgYXV0aGVudGljYXRpb24gd2lsbCBub3cgZmFpbCB3aXRoIGEgNDAxIEhUVFAgVW5hdXRob3JpemVkIGVycm9yLiBUbyBmaXggdGhlIGVycm9yLCB0aGUgY2xpZW50IG11c3QgYmVnaW4gc2VuZGluZyBIVFRQIGJhc2ljIGF1dGhlbnRpY2F0aW9uIHdpdGggYSB2YWxpZCBjbGllbnQgSUQgYW5kIHNlY3JldC4gVGhpcyBlbmRwb2ludCB3YXMgcHJldmlvdXNseSBhdXRoZW50aWNhdGVkIHZpYSB0aGUgaW50cm9zcGVjdGlvbiB0b2tlbiBpdHNlbGYsIHdoaWNoIGlzIGxlc3Mgc2VjdXJlLg==-->OIDC integrations that POST to `/login/oauth/introspect` without sending HTTP basic authentication will now fail with a 401 HTTP Unauthorized error. To fix the error, the client must begin sending HTTP basic authentication with a valid client ID and secret. This endpoint was previously authenticated via the introspection token itself, which is less secure.<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4724
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Shivaram Lingamneni <slingamn@cs.stanford.edu>
Co-committed-by: Shivaram Lingamneni <slingamn@cs.stanford.edu>
Was facing issues while writing unit tests for federation code. Mocks weren't catching all network calls, because was being out of scope of the mocking infra. Plus, I think we can have more granular tests.
This PR puts the client behind an interface, that can be retrieved from `ctx`. Context doesn't require initialization, as it defaults to the implementation available in-tree. It may be overridden when required (like testing).
## Mechanism
1. Get client factory from `ctx` (factory contains network and crypto parameters that are needed)
2. Initialize client with sender's keys and the receiver's public key
3. Use client as before.
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4853
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Aravinth Manivannan <realaravinth@batsense.net>
Co-committed-by: Aravinth Manivannan <realaravinth@batsense.net>
- Adjust the counting of the number of lines of a file to match the
amount of rendered lines. This simply means that a file with the content
of `a\n` will be shown as having `1 line` rather than `2 lines`. This
matches with the amount of lines that are being rendered (the last empty
line is never rendered) and matches more with the expecation of the
user (a trailing EOL is a technical detail).
- In the case there's no EOL, the reason why it was counting
'incorrectly' was to show if there was a trailing EOL or not, but now
text is shown to tell the user this.
- Integration test added.
- ResolvesCodeberg/Community#1612
- Adjust the `RepoRefByType` middleware to allow for commit SHAs that
are as short as 4 characters (the minium that Git requires).
- Integration test added.
- Follow up to 4d76bbeda7
- Resolves#4781
An instance-wide actor is required for outgoing signed requests that are
done on behalf of the instance, rather than on behalf of other actors.
Such things include updating profile information, or fetching public
keys.
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
Fix#31707.
Also related to #31715.
Some Actions resources could has different types of ownership. It could
be:
- global: all repos and orgs/users can use it.
- org/user level: only the org/user can use it.
- repo level: only the repo can use it.
There are two ways to distinguish org/user level from repo level:
1. `{owner_id: 1, repo_id: 2}` for repo level, and `{owner_id: 1,
repo_id: 0}` for org level.
2. `{owner_id: 0, repo_id: 2}` for repo level, and `{owner_id: 1,
repo_id: 0}` for org level.
The first way seems more reasonable, but it may not be true. The point
is that although a resource, like a runner, belongs to a repo (it can be
used by the repo), the runner doesn't belong to the repo's org (other
repos in the same org cannot use the runner). So, the second method
makes more sense.
And the first way is not user-friendly to query, we must set the repo id
to zero to avoid wrong results.
So, #31715 should be right. And the most simple way to fix#31707 is
just:
```diff
- shared.GetRegistrationToken(ctx, ctx.Repo.Repository.OwnerID, ctx.Repo.Repository.ID)
+ shared.GetRegistrationToken(ctx, 0, ctx.Repo.Repository.ID)
```
However, it is quite intuitive to set both owner id and repo id since
the repo belongs to the owner. So I prefer to be compatible with it. If
we get both owner id and repo id not zero when creating or finding, it's
very clear that the caller want one with repo level, but set owner id
accidentally. So it's OK to accept it but fix the owner id to zero.
(cherry picked from commit a33e74d40d356e8f628ac06a131cb203a3609dec)
The previous commit laid out the foundation of the quota engine, this
one builds on top of it, and implements the actual enforcement.
Enforcement happens at the route decoration level, whenever possible. In
case of the API, when over quota, a 413 error is returned, with an
appropriate JSON payload. In case of web routes, a 413 HTML page is
rendered with similar information.
This implementation is for a **soft quota**: quota usage is checked
before an operation is to be performed, and the operation is *only*
denied if the user is already over quota. This makes it possible to go
over quota, but has the significant advantage of being practically
implementable within the current Forgejo architecture.
The goal of enforcement is to deny actions that can make the user go
over quota, and allow the rest. As such, deleting things should - in
almost all cases - be possible. A prime exemption is deleting files via
the web ui: that creates a new commit, which in turn increases repo
size, thus, is denied if the user is over quota.
Limitations
-----------
Because we generally work at a route decorator level, and rarely
look *into* the operation itself, `size:repos:public` and
`size:repos:private` are not enforced at this level, the engine enforces
against `size:repos:all`. This will be improved in the future.
AGit does not play very well with this system, because AGit PRs count
toward the repo they're opened against, while in the GitHub-style fork +
pull model, it counts against the fork. This too, can be improved in the
future.
There's very little done on the UI side to guard against going over
quota. What this patch implements, is enforcement, not prevention. The
UI will still let you *try* operations that *will* result in a denial.
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
This is an implementation of a quota engine, and the API routes to
manage its settings. This does *not* contain any enforcement code: this
is just the bedrock, the engine itself.
The goal of the engine is to be flexible and future proof: to be nimble
enough to build on it further, without having to rewrite large parts of
it.
It might feel a little more complicated than necessary, because the goal
was to be able to support scenarios only very few Forgejo instances
need, scenarios the vast majority of mostly smaller instances simply do
not care about. The goal is to support both big and small, and for that,
we need a solid, flexible foundation.
There are thee big parts to the engine: counting quota use, setting
limits, and evaluating whether the usage is within the limits. Sounds
simple on paper, less so in practice!
Quota counting
==============
Quota is counted based on repo ownership, whenever possible, because
repo owners are in ultimate control over the resources they use: they
can delete repos, attachments, everything, even if they don't *own*
those themselves. They can clean up, and will always have the permission
and access required to do so. Would we count quota based on the owning
user, that could lead to situations where a user is unable to free up
space, because they uploaded a big attachment to a repo that has been
taken private since. It's both more fair, and much safer to count quota
against repo owners.
This means that if user A uploads an attachment to an issue opened
against organization O, that will count towards the quota of
organization O, rather than user A.
One's quota usage stats can be queried using the `/user/quota` API
endpoint. To figure out what's eating into it, the
`/user/repos?order_by=size`, `/user/quota/attachments`,
`/user/quota/artifacts`, and `/user/quota/packages` endpoints should be
consulted. There's also `/user/quota/check?subject=<...>` to check
whether the signed-in user is within a particular quota limit.
Quotas are counted based on sizes stored in the database.
Setting quota limits
====================
There are different "subjects" one can limit usage for. At this time,
only size-based limits are implemented, which are:
- `size:all`: As the name would imply, the total size of everything
Forgejo tracks.
- `size:repos:all`: The total size of all repositories (not including
LFS).
- `size:repos:public`: The total size of all public repositories (not
including LFS).
- `size:repos:private`: The total size of all private repositories (not
including LFS).
- `sizeall`: The total size of all git data (including all
repositories, and LFS).
- `sizelfs`: The size of all git LFS data (either in private or
public repos).
- `size:assets:all`: The size of all assets tracked by Forgejo.
- `size:assets:attachments:all`: The size of all kinds of attachments
tracked by Forgejo.
- `size:assets:attachments:issues`: Size of all attachments attached to
issues, including issue comments.
- `size:assets:attachments:releases`: Size of all attachments attached
to releases. This does *not* include automatically generated archives.
- `size:assets:artifacts`: Size of all Action artifacts.
- `size:assets:packages:all`: Size of all Packages.
- `size:wiki`: Wiki size
Wiki size is currently not tracked, and the engine will always deem it
within quota.
These subjects are built into Rules, which set a limit on *all* subjects
within a rule. Thus, we can create a rule that says: "1Gb limit on all
release assets, all packages, and git LFS, combined". For a rule to
stand, the total sum of all subjects must be below the rule's limit.
Rules are in turn collected into groups. A group is just a name, and a
list of rules. For a group to stand, all of its rules must stand. Thus,
if we have a group with two rules, one that sets a combined 1Gb limit on
release assets, all packages, and git LFS, and another rule that sets a
256Mb limit on packages, if the user has 512Mb of packages, the group
will not stand, because the second rule deems it over quota. Similarly,
if the user has only 128Mb of packages, but 900Mb of release assets, the
group will not stand, because the combined size of packages and release
assets is over the 1Gb limit of the first rule.
Groups themselves are collected into Group Lists. A group list stands
when *any* of the groups within stand. This allows an administrator to
set conservative defaults, but then place select users into additional
groups that increase some aspect of their limits.
To top it off, it is possible to set the default quota groups a user
belongs to in `app.ini`. If there's no explicit assignment, the engine
will use the default groups. This makes it possible to avoid having to
assign each and every user a list of quota groups, and only those need
to be explicitly assigned who need a different set of groups than the
defaults.
If a user has any quota groups assigned to them, the default list will
not be considered for them.
The management APIs
===================
This commit contains the engine itself, its unit tests, and the quota
management APIs. It does not contain any enforcement.
The APIs are documented in-code, and in the swagger docs, and the
integration tests can serve as an example on how to use them.
Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
- add package counter to repo/user/org overview pages
- add go unit tests for repo/user has/count packages
- add many more unit tests for packages model
- fix error for non-existing packages in DeletePackageByID and SetRepositoryLink
Replace a double select with a simple select.
The complication originates from the initial implementation which
deleted packages instead of selecting them. It was justified to
workaround a problem in MySQL. But it is just a waste of resources
when collecting a list of IDs.
- In the spirit of #4635
- Notify the owner when their account is getting enrolled into TOTP. The
message is changed according if they have security keys or not.
- Integration test added.