These are posts sent to the FSFE planet in English.
Here are some little-known, yet awesome apps or tools that I use. Thanks to the people working on these (I’m glad to have met some of them, and they’re awesome too)!
Transportr is an Android app to help you use public transport systems. It’s simply the best one I’ve seen, and it supports a lot of systems (city-wide like Berlin or Paris and even long-distance).
Known (formerly “idno”) is more “socially aware” than ikiwiki. It runs with PHP and it’s basically your easy-to-run indieweb space. If you use it with http://brid.gy you will enjoy a nice integration with Twitter and other silos (see an example of my own).
YunoHost is a custom Debian distribution aiming at making self-hosting easy. It provides a nice web interface for administration of your self-hosted server and for users of the web server. If you have basic Linux administration skills, this will be very helpful.
Pinboard is a simple and efficient bookmarking app that also archives the content of marked pages (if you pay for it).[^nofs]
Sharesome lets you easily share files on the web. It has a pleasant interface that works well on all devices I have tested so far. It’s also available as a web app. The neat feature is that you can choose where to host your data (for instance, with remotestorage; you can get an account at https://5apps.com).
Some shameless self-promo with ToSDR, the app that tells you what happens to your rights online by rating and summarising Terms of service and privacy policies. You can also get it directly in your web browser or as a web app.
If you’re looking for a curated list of awesome web services that are free of charge and based on free software and open data, look no further than Jan’s Libre projects.
[^nofs]: Unfortunately, Pinboard is not released as free software. But you can export your bookmarks.
Last month, I worked with Björn Schießle on ownCloud’s first defensive publication. This one covers ownCloud’s encryption system.
The challenge is that ownCloud is a free software server for file syncing and file sharing, and you can connect it to different storage backends. However, you don’t necessarily want these storage providers to access data unencrypted.
Thus, being able to use encryption to protect user data is paramount, but not trivial. Users of local encryption tools such as GnuPG will know that.
The source of their defensive publication is available on Linux Defenders’ repositories. In order to make it, I started working from Björn’s blog post. It turns out that Björn already had documents describing their encryption system which were used for internal purposes. They were very useful to make figures and illustrate the publication. The point is that making a defensive publication is not very difficult: most of the time, the pieces are already available, and you just need to put them together and submit them to Linux Defenders.
Encryption for file syncing servers is an important feature worth protecting from further patents, given that a lot of patents get applied for on all sorts of “cloud” systems. Publishing this as a defensive publication does not cost anything, but its benefits are great: a defensive publication is a statement of prior art that prevents anyone from excluding others from implementing what the publication specifies. By submitting it to the IP.com database, Linux Defenders makes sure that the free software community’s innovations are accessible to patent office examiners who are responsible for reviewing and granting claims to patent applications.
If you’re also interested in making a defensive publication, we’re about to publish our tutorial. This will guide you through the steps and the parts that can make a good defensive publication. Your feedback is welcome!
Last month, I introduced what defensive publications are: documents describing something (a new feature, a new algorithm, a new system) in order to prevent further patents.
Defensive publications are needed because on the one hand, even when the source code is available to the public, it is not necessarily accessible to the patent office examiner who’s reviewing patent applications. This is why we submit defensive publications to their databases: it makes the review process more aware of what free software projects develop.
On the other hand, while pushing code to a public repository is easy for a project contributor, writing and submitting a defensive publication is not as straightforward.
One of my goals is to help fix this, so that producing defensive publications gets as easy as possible for Free Software projects. So, during this month, amongst other patent-related activities, I published a first version of a defensive publication template on GitHub. Hopefully, I will be able to improve on this version and push other useful things for the whole Linux Defenders programme. Your feedback would be very much appreciated!
A prior observation before explaining how the template works: obviously, writing defensive publications is not a developer’s top priority. But writing a defensive publication is not something that can be left entirely to lawyers (although we can help). Writing a defensive publication requires insights on:
- how the code works, how the system is designed
- how other solutions, especially prior solutions and current trends develop
For this reason, developers are in a privileged position to write defensive publications. The situation is not entirely unlike that of writing documentation. Writing documentation is probably not a developer’s favourite task (and indeed the state of some documentation is evidence of this). However, we know that good documentation is also a sign of a project’s health, and so we build processes and tools to facilitate this task. Fortunately, writing a defensive publication is not much different from writing documentation, and so we should be able to kill two birds with one stone.
How does it work?
Once you have identified some part of your software that you want to write a defensive publication about:
README should guide you. In particular, you can find examples of things to use to start your own publication, such as figures, flowcharts, etc.
Update variables like:
TITLE PROJECT URL DESCRIPTION TAGS
(I’ll probably write a script to automate that…)
src/ (you can use one from the example/ directory) and also update the tags. You can edit the abstract itself later, at the end.
This will later appear on the list of http://defensivepublications.org.
You can start writing your document in src/. You can write in any format, provided that you are able to produce a PDF at the end so we can submit it to the patent office. Right now the template is very much focused on pandoc, which is able to convert a lot of different kinds of texts, like Markdown to LaTeX. You can follow the
As you see, it’s a bit rudimentary now, but the idea behind this template is that you should be able to take relevant bits of your documentation and integrate them directly into your defensive publication’s source files. Then you can use pandoc to combine all the files together in the relevant order.
That way you don’t have to duplicate content, but rather you reuse relevant parts of your documentation that describe your software for the defensive publication.
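The combining step could be scripted roughly as follows. This is only a sketch and not part of the template: the function names, the numeric file-name prefixes (01-, 02-, ...) used to fix the order, and the output name publication.pdf are all assumptions made for illustration. The only thing it relies on from pandoc is that several input files given on the command line are concatenated in that order, and that -o with a .pdf name produces a PDF (which needs a LaTeX engine installed).

```python
import subprocess
from pathlib import Path


def build_pandoc_command(src_dir, output="publication.pdf"):
    """Return the pandoc command that concatenates the Markdown files in src_dir."""
    # Hypothetical convention: a numeric prefix (01-, 02-, ...) in the file
    # name decides the order, so sorting by name is enough.
    sources = sorted(str(path) for path in Path(src_dir).glob("*.md"))
    if not sources:
        raise FileNotFoundError(f"no Markdown files found in {src_dir}")
    # pandoc reads the input files in the order given and joins them.
    return ["pandoc", *sources, "-o", output]


def build_publication(src_dir="src", output="publication.pdf"):
    """Run pandoc; requires pandoc and a LaTeX engine for PDF output."""
    subprocess.run(build_pandoc_command(src_dir, output), check=True)
```

Calling build_publication() from the root of the repository would then write the PDF next to the src/ directory. Keeping the command construction separate from the call to pandoc makes it easy to check the file order without having pandoc installed.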
Once you’ve done that, you need to write the abstract and probably write an introduction if you need to give more details. Another part to introduce your publication can be a description of the current state of the art relevant to your software: basically, what’s the problem your software solves and how other solutions try to address this problem in your field.
The template comes with a file example/template.pdf that should guide you through the different parts that make a defensive publication.
Get involved with us
If you are interested in writing a defensive publication or have more questions, don’t hesitate to join #linuxdefenders on the IRC freenode server.
Also, I’m very much interested in your feedback. What’s your opinion? What do you need to write a defensive publication as easily as possible?
Next month, I should be able to show an example from defensive publications, with additional explanation and comments!
In late 2012, a new manifesto emerged from the free software community: The User Data Manifesto, written by Frank Karlitschek of ownCloud. Quite similar to the Franklin Street Statement on freedom and network services, the manifesto was taking another approach which I think was good: identifying a new set of rights for users, or as the manifesto puts it: “defining basic rights for people to control their own data in the internet age.”
I have applauded the approach and I think the current manifesto is a good starting point, which is why I have started an effort to create a new, better version built on the first version. If you are only interested in discussing the new version, you can skip the first part of this article.
What’s wrong with the current version?
Right now, the manifesto consists of 8 points, and I think that’s probably too many. As you will see, some of these points overlap. Another thing that’s wrong with the current version is that it mixes several issues together with no hierarchy or context between these; for instance, some points are about user rights, some others are about implementation only (like point 8. Server software transparency).
So let me take some points separately:
1 - Own the data
The data that someone directly or indirectly creates belongs to the person who created it.
This one is very, very problematic. What does “belong” mean, what does “own” mean? Why is one used in the title and the other in the description? What happens when several persons “created” data? What does “create [data]” even mean? I don’t create “data”, my computer generates data when I do things and make stuff.
This point could be read like a copyright provision and thus justify current copyright laws. This is probably not the intention behind this. So this point should be fixed. This reason alone is enough to make it a necessity to update the current manifesto.
But what was the intention behind this?
I think I understand it, and I agree with it. Maybe you know the meme “All your base are belong to us”, sometimes turned into “All Your Data Are Belong to Us” in reference to Google/NSA/etc.
This is basically what we want to prevent. For a user data manifesto to be effective, storing some of my data on a server must not mean that the server admin can treat it as if it were their own data.
However, a careful note is needed here. As you will notice, I’m referring to data as “my data” or “their data.” This is very important to consider. If we want a good User Data Manifesto, we need to think clearly about what makes data, “User Data.”
The current version of the manifesto says that what makes User Data is data “created by the user.” But I think that’s misleading.
Usually, there are two ways in which one might refer to data as “their data” (i.e. “their own” data):
Personal data, or personally-identifiable information, are often referred to by someone as their data. But in our case, that’s not relevant, this is covered by laws such as data protection in the European Union. That’s not the scope of this manifesto, because in this case the person is called the “data subject” and typically, this person is not necessarily a “user.”
However, it is users that we are concerned with in this manifesto. Which leads to the second case in which one usually refers to data as their own data:
Data that is stored on my hard-drive or other storage apparatus. In this case, the meaning of ownership of data is an extension of the ownership of the physical layer on which it sits.
For instance, when I refer to the books that are in my private library at home, I say that these are my books even though I have not written any of them. I own these books not because I have created them, but because I bought them.
So, for the purpose of the User Data Manifesto, how should we define User Data to convey the objective that server admins do not have the right to do as they wish with user data, i.e. our data?
I propose this:
“User data” means any data uploaded by a user and/or generated by a user, while using a service on the Internet.
This definition is aimed at replacing point 1 of the first version. This definition is consistent with our current way of referring to data as “our own data” but it also includes the case where data is not necessarily generated by devices that we own, but is instead generated by us, for us, on devices that somebody else owns.
2 - Know where the data is stored
Everybody should be able to know: where their personal data is physically stored, how long, on which server, in what country, and what laws apply.
I have tried to improve this. This is point 2 in my version of the manifesto.
3 - Choose the storage location
Everybody should always be able to migrate their personal data to a different provider, server or their own machine at any time without being locked in to a specific vendor.
This is point 3 in my version of the manifesto.
4 - Control access
Everybody should be able to know, choose and control who has access to their own data to see or modify it.
5 - Choose the conditions
If someone chooses to share their own data, then the owner of the data selects the sharing license and conditions.
These two points are now point 1 in my version. I have merged them together. However, I have modified the part about “choosing the conditions” and instead refer to “permissions” (as in, read-only, read-write, etc.). I think the “conditions” as in licensing conditions are out of scope of this manifesto.
6 - Invulnerability of data
Everybody should be able to protect their own data against surveillance and to federate their own data for backups to prevent data loss or for any other reason.
This point was redundant with point 4 and it was drafted in a vague manner, so I have modified it and integrated it into point 1 of my version of the manifesto.
7 - Use it optimally
Everybody should be able to access and use their own data at all times with any device they choose and in the most convenient and easiest way for them.
I feel this is not in scope with the manifesto because this describes a feature, not a right, and also because I felt it was a bit vague: what’s “most convenient and easiest way for them”? So I decided to leave this one out.
8 - Server software transparency
Server software should be free and open source software so that the source code of the software can be inspected to confirm that it works as specified.
This is an implementation matter related to point 3 of the current version: the right to choose any location to store one’s data and the right to move to another platform. So I have merged it into point 3 of my version of the manifesto regarding the freedom to choose a platform.
That’s it. Overall, I think the manifesto was a good starting point and that it should be improved and updated. I think that we should reduce the number of points because 8 is too many; especially because some of them are redundant. We should also give more context after we lay out the rules.
Obviously, this is also a request for comments, criticism and improvement of my version of the manifesto.
Thanks to Jan-Christoph Borchardt, Maurice Verheesen, Okhin and Cryptie for their feedback and/or suggested improvements since April 2013.
My current proposal
User Data Manifesto, v2 DRAFT: as of today, August 26, 2014:
This manifesto aims at defining basic rights for people regarding their own data in the Internet age. People ought to be free and should not have to pay allegiance to service providers.
- “User data” means any data uploaded by a user and/or generated by a user, while using a service on the Internet.
Thus, users should have:
Control over user data access
Data explicitly and willingly uploaded by a user should always be under the ultimate control of the user. Users should be able to decide whom to grant (direct) access to their data and under which permissions such access should occur.
Cryptography (e.g. a PKI) is necessary to enable this control.
Data received, generated, collected and/or constructed from users’ online activity while using the service (e.g. metadata or social graph data) should be made accessible to these users and put under their control. If this control can’t be given, then this type of data should be anonymous and not stored for long periods.
Knowledge of how the data is stored
When the data is uploaded to a specific service provider, users should be able to know where that specific service provider stores the data, how long, in which jurisdiction the specific service provider operates, and which laws apply.
A solution would be that all users are free to choose to store their own data on devices (e.g. servers) in their vicinity and under their direct control. This way, users do not have to rely on centralised services. The use of peer-to-peer systems and unhosted apps is a means to that end.
Freedom to choose a platform
Users should always be able to extract their data from the service at any time without experiencing any vendor lock-in.
Open standards for formats and protocols, as well as access to the programs’ source code under a Free Software license, are necessary to guarantee this.
If users have these rights, they are in control of their data rather than being subjugated by service providers.
Many services that deal with user data at the moment are gratis, but that does not mean they are free. Instead of paying with money, users are paying with their allegiance to the service providers so that they can exploit user data (e.g. by selling them or building a profile for advertisers).
Surrendering privacy in this way may seem to many people a trivial thing and a small price to pay for the convenience that Internet services bring. This has made this kind of exchange common.
Service providers have thus been unwittingly compelled to turn their valuable Internet services into massive and centralised surveillance systems. It is of grave importance that people understand this, since it forms a serious threat to the freedom of humanity.
When users control access to the data they upload (Right #1), it means that data intended to be privately shared should not be accessible to the service provider, nor shared with governments. Users should be the only ones to have ultimate control over it and to grant access to it. Thus, a service should not force you to disclose private data (including private correspondence) to them.
That means the right to use cryptography[^snake-oil] should never be denied. On the contrary, cryptography should be enabled by default and be put under the users’ control with Free Software that is easy to use.
[^snake-oil]: We mean effective cryptography. If the service provider enables cryptography but controls the keys or encrypts the data with your password, it’s probably snake oil.
Some services allow users to submit data with the intention to make it publicly available for all. Even in these cases, some amount of user data is kept private (e.g. metadata or social graph data). The user should also have control over this data, because metadata or logging information can be used for unfair surveillance. Service providers must commit to keeping these to a minimum, and only for the purpose of operating the service.
When users make data available to others, whether to a restricted group of people or to large groups, they should be able to decide under which permissions they grant access to this data. However, this right is not absolute and should not extend over others’ rights to use the data once it has been made available to them. What’s more, it does not mean that users should have the right to impose unfair restrictions on other people.
Ultimately, to ensure that user data is under the users’ control, the best technical designs include peer-to-peer or distributed systems, and unhosted applications. Legally, that means terms of service should respect users’ rights.
When users use centralised services that upload data to specific storage providers instead of relying on peer-to-peer systems, it is important to know where the providers might store data because they could be compelled by governments to turn over data they have in their possession (Right #2).
In the long term, all users should have their own server. Unfortunately, this is made very difficult by some Internet access providers that restrict their customers unfairly. Also, being your own service provider often means having to administer systems, which requires expertise and time that most people currently don’t have or aren’t willing to invest.
Users should not get stuck with a specific technical solution. This is why they should always be able to leave a platform and settle elsewhere (Right #3). It means users should be able to have their data in an open format, and to exchange information with an open protocol. Open standards are standards that are free of copyright and patent constraints. Obviously, without the source code of the programs used to deal with user data, this is impractical. This is why programs should be distributed under a Free Software license like the GNU AGPL-3[^agpl].
[^agpl]: The GNU AGPL-3 safeguards this right by making it a legal obligation to provide access to the modified program run by the service provider. (§ 13. Remote Network Interaction)
Thanks to Sam Tuke for his feedback on the post and the manifesto!
Three weeks ago, I started working for Open Invention Network as an intern[^intern]. Open Invention Network, or OIN in short, aims at creating a safe environment for Linux and Linux-based systems to thrive in spite of all the threats that patents constitute to software developers.
Defensive publications are sort of anti-patents:
- while patents are claimed to exclude others from being able to implement something,
- defensive publications prevent anyone from excluding others from being able to implement something.
They’re called defensive because they can be used against further patent applications or they can be used a posteriori to defend oneself against patent infringement claims. Indeed, if the software is already accessible by the public before a patent on it is submitted, there’s no way you or anyone would be infringing on a patent on that software. Actually in that situation the patent should be invalidated. Then you might ask: why do I need to write defensive publications if I have already published my source code? — Unfortunately, that’s because just releasing source code is not effective to protect yourself against patents.
In theory, it is true that you are immune from infringement of subsequent patents as soon as you’ve made your software source code publicly accessible online, for instance using a public version control service like GitHub.
In practice, it’s not really effective. Here’s why:
The life of a patent begins at the patent office, where patent applications are submitted, then reviewed by patent office staff:
Patent examiners have a strong sense of the technology that is patented, but they’re missing an understanding of what has been and is currently being developed in the open source world. As shocking as it may seem, the result is the examiner formulating an inaccurate sense of what is innovative. As the final arbiter of a very significant monopoly grant, they are often grossly uninformed in terms of what lies beyond their narrowly scoped search. This is not wholly their fault as they have limited resources and time. However, it is a strong indication of a faulty system that is so entrenched in the archaic methods under which patent offices have been operating.
As Andrea pointed out, patent office staff will usually not go to software repositories and read source code in order to find prior art. That’s why making it easy for them to read about what you’ve done in software is necessary. That’s what defensive publications are supposed to do.
The life of a patent ends in one of several ways, whichever comes first:
- the patent was filed more than 20 years ago, or the patent holders have not paid their yearly patent taxes: it’s now in the public domain
- an authoritative court decision has struck down the patent as invalid (and there’s no appeal pending)
- the patent office reverses its decision to grant the patent
The problem is that in each of these cases, the process can be quite long. Litigation can go on for several years, especially since a patent holder will probably try to appeal a decision that invalidates its patent.
As for the patent office procedures, they can take a decade. For instance, it took more than 15 years to strike down a single very broad Amazon patent application[^EP0927945].
Meanwhile, the patent will constitute a potential threat that will effectively encumber the use and distribution of your software.
[^EP0927945]: It’s patent EP0927945. The patent’s abstract begins like this: “A method and system for placing an order to purchase an item via the Internet.” This patent was filed at the European Patent Office in 1998.
[^intern]: Since I passed the bar exam in December last year, I now have to fulfil two 6-month internships.
Basically, defensive publications consist in documenting one aspect of a software project that’s focused on solving a challenge and does it in a new, innovative way. The document would give some context about the state of the art and then describe in more detail how the system works, usually by using meaningful diagrams, flowcharts and other figures.
And who’s going to read defensive publications? At OIN, we maintain a website to list defensive publications. Then, we submit them to databases used for prior-art examination by patent office examiners. So the target audience for these defensive publications is the patent office that reviews patent applications. A good defensive publication should use generic terms that are understood even by someone who’s not programming in the same language as the one used for the program.
Defensive publications may be no more than a re-arrangement of what’s already written on the project’s blog, or in the documentation. They can be useful to explain how your program works to other programmers. In some aspect, they look like a (short!) scientific publication.
For software that works in areas heavily encumbered with patents like media codecs, actively submitting defensive publications can safeguard the project’s rights against patent holders. For instance, consider that patent trolls now account for 67% of all new patent lawsuits and as shown in a 2012 study, startups are not immune to patent threats.
So part of my job is to work with Free Software projects to help them submit defensive publications. I have been working with Pablo Joubert on a defensive publication around search engines making use of distributed hash tables (DHT). Pablo was involved in the Seeks project and has now started a new project building upon Seeks. It was very interesting for me to learn more about how DHTs are used in peer-to-peer networks and how we can make use of them for new awesome applications like social search. Now, Pablo also has a document that explains concisely what the project is and how it works. This could be the preamble to the documentation 😉
I’ve also worked on a guide to defensive publications and I am starting to think about what a tutorial might look like. I hope you will find that useful. I’ll write more about that next time!
If you are interested, don’t hesitate to join #linuxdefenders on the IRC freenode server.
I was reading an article by Lorrie Cranor in the MIT Technology Review on how it’s difficult even for her to protect her privacy online.
I appreciate Lorrie Cranor’s work on privacy at Carnegie Mellon University. I have extensively cited her study of the length of privacy policies when I introduced ToS;DR.
However, in this article, I was disappointed to see Ghostery mentioned. Ghostery is a browser extension supposed to help users against tracking and surveillance on the web. The main problem is that Ghostery is not released as Free Software.[^akaos]
[^akaos]: a.k.a Open Source. Both these terms designate the same set of programs.
Earlier on Twitter I quickly posted my frustration about this. People who promote web privacy should stop promoting Ghostery, as it’s proprietary. What’s their business model exactly?
In my earlier tweet I wrongly stated that the source code was not disclosed; that’s not accurate. There is some code disclosed (I suppose it’s entirely readable and neither obfuscated nor minified). But as you’ll notice, the license is “All rights reserved” so, basically, users have no rights.
Ghostery has been playing on the ambiguity for too long. This hypocrisy must stop. See these tweets from years ago…
this is good news RT @Ghostery: Currently, you can access Ghostery's code if you unpack the ext. We are still looking to open source, too— Jeekajoo (@jeekajoo) May 28, 2013
Wall Street Journal: The encryption flaw that punctured the heart of the Internet this week underscores a weakness in Internet security: A good chunk of it is managed by four European coders and a former military consultant in Maryland.
To answer some of the astonished comments I made yesterday, the lack of contributors to the project is baffling. So: the whole Internet relied on 10 volunteers and 1 employee and nobody helped them?
I guess this sort of comes back to one of the essential questions in Free Software: how do you get the users to fund it? For some kinds of software, this can be difficult; but in the case of OpenSSL I would have thought this to be an easy thing, since so many banks and web companies intensively rely on it.
But apparently, they didn’t care at all if this major piece of security they were using was able to keep up with security standards or not. Considering the number of people involved with the project, I don’t see how it can apply enough scrutiny and effort to make sure its code gets the best security review.
(Now, I have to wonder if the WSJ piece is actually correct in the way it counts the contributors to the project, because it’s fairly possible that lots of companies making use of OpenSSL actually had security experts and developers in-house test the code and send patches and bug reports upstream; a bit like Google and that other security firm did when they found out about Heartbleed…)
According to Brett Simmons, That pretty much wraps it up for C.
The first obvious lesson is that the communication around the vulnerability was brilliant marketing.
The other, less satisfying lesson is a question: why is the majority of the Internet relying on a very poorly funded project?!
The Washington Post published an article that misses the real issue. The Heartbleed debacle is not an issue with the fact that OpenSSL is Free Software (the Apple goto fail bug shows it’s even worse when it’s proprietary: all Apple users had to wait several days before a patch was sent), nor with the fact that the Internet has no single authority (if anything, the OpenSSL library is a single point of failure).
I find it astonishing that OpenSSL is so poorly funded and apparently lacks a governance strategy that includes large stakeholders such as the major websites making use of the library and which, instead, are essentially all irresponsible free-riders.
The real issue here is one of responsibility.
Note: Since I wrote this, it’s possible that the patched kernel now has more features than only touchpad support.
xf86-input-synaptics and, from AUR, touchegg-gce-git (this last one is to be able to configure gestures with the graphical interface).
    Section "InputClass"
        Identifier "touchpad catchall"
        Driver "synaptics"
        MatchIsTouchpad "on"
        Option "TapButton1" "1"
        Option "TapButton2" "0"
        Option "TapButton3" "0"
        Option "ClickFinger2" "0"
        Option "ClickFinger3" "0"
        # This option is recommend on all Linux systems using evdev, but cannot be
        # enabled by default. See the following link for details:
        # http://who-t.blogspot.com/2010/11/how-to-meta:ignore-configuration-errors.html
        MatchDevicePath "/dev/input/event*"
    EndSection
Configure your gestures with Touchègg
Add to your session (using
The real improvement is that I can use three-finger tapping to simulate the middle-click mouse button which is used for quick pasting or for opening links in a new tab.
As far as “pinching” is concerned, it does not work reliably at all for me.