Why cut-copy-paste is hard

[This note is primarily a response to a recent LWN thread. I'm posting it here because it got a bit long for a discussion forum.]

Several years ago, I joined the X.Org Foundation Board of Directors. One of my campaign pledges was to organize a project to finally "fix" X cut-copy-paste (CCP).

It has long been acknowledged by anyone paying attention that the user experience around CCP on the X desktop is horrible (although it has improved a bit in the last couple of years). I'm smart, I could fix it, right?

Uh, no. Here are some reasons why CCP is hard, especially on X…

  1. Users want a seamless multimedia cut-copy-paste experience. Heck, they expect to CCP audio. This implies being able to identify the media type of CCPed content, including distinguishing direct content from content references. It also implies a good system for coercing and converting media types during CCP.

    The best means we have for identifying media type right now is MIME-types. Unfortunately, they are really too incomplete and disorganized for CCP purposes. Their ontology is only two levels deep and highly incomplete. Heck, they can't even deal with compressed or packaged content reasonably.

    Even for text, UTF-8 doesn't solve the problem. Users expect to be able to CCP formatted text with the formatting preserved. Sometimes. Maybe.

  2. Persistence and related semantics have been an issue since the 128K Mac introduced CCP to the world in the mid-1980s. The Mac had a literal software "clipboard" app, with pages and everything. This was a nice metaphor, but gave little support for the kind of "quick-transfer" CCP that users wanted; the model has since been minimized. At any rate, CCP-ing content that took a substantial fraction of storage was and is a problem for persistent models. Managing long-persisting state is a problem for the user as well: sooner or later, everyone inadvertently pastes something saved and forgotten long ago into a really bad place.

    X took things in a different direction with its "pasting is between apps" semantic. This has the advantage that the transfers are quite efficient and typically require little or no extra storage. It has the disadvantage that when an app exits, its state is no longer available for pasting, although this does protect the user from long-gone state.

    So we have two models, both of which have problems and issues. Awkward.

  3. Users have strong UI expectations around CCP. These expectations are similar but not the same between Mac, Windows and X desktop users. There's a resulting "uncanny valley" effect that's only exacerbated by the fact that each of these platforms (but especially X) has individual applications that also behave a little differently from the norm.

    On the Mac, CCP behavior is at least codified by the Mac HIG. I don't know for sure what the situation is for Windows, but I've never found anything that looked definitive from Microsoft. On X, GNOME has a HIG document that says some things about CCP, but it is pretty incomplete and not too useful. Worse yet, X has two almost-but-not-quite-peacefully coexisting CCP models. Old people like me expect one; people coming from Windows or Mac expect the other.

    For example, how many selections should be allowed to be active (ready-to-CCP) at the same time? One? One per display? One per application? One per window? When the user tries to copy, which one should be the source?

    Another example: Does your favorite app support making selections using the keyboard? Do different apps handle this slightly differently? (Hint: yes.)

    Standardizing UI behavior that nobody knows about or agrees on is hard.

  4. CCP is full of state. This is related to the points above, but different. To get a solid specification, you really want to specify states and state transitions of the whole CCP environment. "When the selections are like this, and the clipboard is like this, and the user performs this CCP action, then here's what happens."

    Such specifications are notoriously hard to write. To my knowledge, it has never been done for CCP.

  5. Imagine that you have the perfect solution to the above problems. You know exactly what should be done, and everybody who matters agrees with you. (This should take about…one day, maybe two. Right?)

    Now what? It has taken 10 years and full backward-compatibility to start to get XCB into the world as a replacement for Xlib. By the time we're close to being done, if ever, X will be long dead.

    CCP is embedded in everything. Most people who have a "working" implementation will not bother to upgrade it just because yours is better. You probably have to wait for the turnover of every application on the desktop, at least to a new major version.

    It has been only a few years since XEmacs abandoned a CCP scheme that was outdated 20 years ago. Why? Because no one could be bothered, as far as I can tell.

    You of course will have to have an easy-to-use library. It will have to provide a C interface, since that's the closest thing to a lingua franca that we have for the desktop in 2010. But this won't be enough. You'd better make it easy for the common toolkits on your desktop—at least Qt and Gtk for X—by providing even more mechanism. Don't even think about going without some kind of browser support and JavaScript library support; client-side JavaScript has no access to your C library.

    You're getting started now. Barely. Enjoy.
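To make point 4 concrete, here is a minimal sketch of what "specifying states and state transitions" could look like for a deliberately tiny CCP world. It assumes X's two coexisting models are a mouse selection (conventionally called PRIMARY) and an explicit copy buffer (conventionally called CLIPBOARD), and that a single application owns both. The names `CcpState` and `step` and the action strings are hypothetical, and several of the choices encoded here (for example, what "copy" does with nothing selected) are exactly the kind of thing a real specification would have to argue about.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class CcpState:
    primary: Optional[str] = None    # current mouse selection (assumed: X "PRIMARY")
    clipboard: Optional[str] = None  # explicit copy buffer (assumed: X "CLIPBOARD")


def step(state: CcpState, action: str,
         arg: Optional[str] = None) -> Tuple[CcpState, Optional[str]]:
    """Return (new_state, pasted_text) for one user action."""
    if action == "select":        # user highlights some text
        return replace(state, primary=arg), None
    if action == "copy":          # explicit copy takes whatever is highlighted
        # Design choice: copying with nothing selected empties the buffer.
        return replace(state, clipboard=state.primary), None
    if action == "middle_click":  # old-style paste reads the selection
        return state, state.primary
    if action == "paste":         # menu or keyboard paste reads the copy buffer
        return state, state.clipboard
    if action == "owner_exit":    # non-persistent model: content disappears
        return CcpState(), None
    raise ValueError(f"unknown action: {action}")


state = CcpState()
state, _ = step(state, "select", "hello")
state, _ = step(state, "copy")
state, _ = step(state, "select", "world")
_, via_middle_click = step(state, "middle_click")  # reads the newer selection
_, via_paste = step(state, "paste")                # reads the older copy
```

Even this toy has five actions and two slots, and the two paste gestures already return different text. Adding multiple windows, multiple formats and persistence multiplies the table quickly.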

Of course, X CCP could be vastly improved. A "CCP strike force" would be a good idea, and I would support it. However, it would be a lot of slow, incremental work, without much of any payoff for the participants in terms of glory, wealth or even user acknowledgment. Until I find a band of folks excited by that description, I think I'll spend my time doing other things. Friend of Bart

Examples of CCP that illustrate some complications

Real world and recent examples, for all of these:

1) I'm looking at a document in my web browser. I copy a bit of text from the current page. The text happens to be richly formatted.

In instance a) I want to paste this into a terminal, and execute it as a command, preserving only the UTF-8/ASCII text.

In instance b) I want to paste this into a word processor, preserving both the text and its formatting.

In instance c) I want to paste this into a diagram editor as a label. Should formatting be preserved or not?

2) I'm copying a cell containing a formula and displaying a calculated value from a spreadsheet. The calculated value, say a date, has a display format and text style. (The same problems apply to money, btw)

In instance a) I want to paste the date as displayed into an html email in t-bird preserving both display format and text style.

In instance b) I want to paste the date as displayed into a plain text email in t-bird, preserving the display format but not any style information

In instance c) I want to paste the formula into a different spreadsheet application

In instance d) I want to paste the displayed date as text into a different cell in the same spreadsheet

Yeah, I can definitely see it being hard to get this right, or even mostly right.

-- Pat

Requirements are always hard

Thanks for the nice examples! I'd argue that these are situations where it's clear that the standard specification of CCP is incomplete: it doesn't tell the user what will happen. Mostly, I think the user would be happy enough with any predictable behavior, especially if it's maximally information-preserving. What drives the user nuts is getting a behavior that wasn't expected, then being unable to correct the behavior. Friend of Bart

All that is solved by

All that is solved by providing multiple formats. Which and how many depends on the program. Then the program pasting it can choose the format that fits best. When in doubt it can ask the user to choose between multiple formats it understands, but usually it's clear from the context.
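The "source offers several formats, paster picks" idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `choose_format`, the format strings and the two preference lists are hypothetical, and a real paster might weigh context rather than use a fixed order.

```python
from typing import Iterable, Optional, Sequence


def choose_format(offered: Iterable[str],
                  preferences: Sequence[str]) -> Optional[str]:
    """Return the paster's most-preferred format that the source offers."""
    available = set(offered)
    for fmt in preferences:       # preferences are ordered best-first
        if fmt in available:
            return fmt
    return None                   # nothing in common: the paste fails


# Hypothetical formats offered by a browser for a rich-text selection.
OFFERED = ["text/html", "text/plain;charset=utf-8"]

# Hypothetical preference orders for two pasting applications.
TERMINAL = ["text/plain;charset=utf-8"]
WORD_PROCESSOR = ["text/html", "text/plain;charset=utf-8"]

terminal_choice = choose_format(OFFERED, TERMINAL)        # plain text only
word_choice = choose_format(OFFERED, WORD_PROCESSOR)      # keeps formatting
```

This covers the terminal and word-processor cases from the examples earlier in the thread. The diagram-editor case is the one where a fixed preference order is not obviously right, which is where asking the user comes in.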

This is something you can't and shouldn't specify, because it depends on the context and data formats involved. Data formats and contexts change; the specification shouldn't.

Indan

Yes, user input is sometimes needed

If you look at Excel and its various clones and workalikes, for example, you'll see a lot of options for pasting that have to do with how the paste should be handled; see in particular the "Paste Special" commands.

However, many programs offer a lot of possible source formats for a given copy command, and so there are still issues. Assume that the user has somehow successfully indicated what they want to do, and assume that there are multiple source formats to support this.

First of all, one problem with a persistent CCP model like Mac and Windows use is that potentially the app has had to leave all those formats on the clipboard. If the object is a giant image copied from Gimp, for example, that is a lot of clipboard storage. Presumably, one could get by with a single lossless format, as long as it was truly lossless, but this would require some sort of ontology to decide that there need be only one, and to choose which one it is. For lossy formats, they each lose in different ways, so it's hard to pick a winner here.

Things are better in the X non-persistent model, but some kind of ontology is still desirable; once the user has indicated what the destination format should be, they probably want the system to work out which of the offered source formats will be best for this. This is still hard.

In any case, deciding what the source and destination programs should "understand" is hard. As noted above, MIME types really aren't sufficient for many common cases in practice. Apparently Apple has something better, but I haven't looked at it much.
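As a small illustration of the MIME limitation, Python's standard `mimetypes` module shows how compression ends up as a side channel rather than as part of the type. The helper name `describe` and the file name are hypothetical; the exact subtype reported can depend on the local MIME tables, so none is asserted here.

```python
import mimetypes
from typing import Dict, Optional


def describe(filename: str) -> Dict[str, Optional[str]]:
    """Split a guessed media type into its two levels plus the encoding."""
    media_type, encoding = mimetypes.guess_type(filename)
    top, _, sub = (media_type or "").partition("/")
    return {"top": top or None, "sub": sub or None, "encoding": encoding}


info = describe("holiday-photos.tar.gz")
# The two-level type describes the tar container only; the gzip layer is
# reported separately as an "encoding", and nothing at all says that the
# archive contains images.
```

A paste target that only understands images has no way to learn from this description that the content could be coerced into something it can use.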

Thanks much for your comments! Friend of Bart

Windows

Is Windows' model totally persistent? I used to use Corel Photopaint a lot back when Windows was my main desktop, and I recall that even when you copied a large image, it was pretty quick. And if you closed the program after copying the image, it would ask you whether it was OK to discard it or whether you wanted the data left on the clipboard for other applications.

I always assumed based on that that it had a hybrid model, where regular copies used an X-style system where the clipboard could just say "this application has something copied, ask it if you want to paste", but there was also the option to hand data off to the system persistently if you wanted to.

I don't think the prompt was a good design (how many users ever read it? How many understood it?) -- but the fundamental idea seemed sensible.

As well as simply storing

As well as simply storing multiple formats on the clipboard, Windows allows deferring storing some formats until needed by calling SetClipboardData with a NULL data argument. If that format is needed then the application is asked to render it.

http://msdn.microsoft.com/en-us/library/ms649014(v=VS.85).aspx#_win32_Delayed_Rendering

I think Windows CCP was entirely persistent

The prompt you mention occurred as far back as the Mac 128K. As far as I know, it was never intended to indicate an alternate copy mechanism, but merely to let you know that you might be leaving a humongous pile of droppings on the persistent clipboard, and give you a chance not to do that.

I'm not an expert, but I don't think Windows has any non-persistent clipboard mechanism; multitasking is too recent there. Indeed, this is why it happened originally on X, but not on Mac or Windows; X was already operating on top of multitasking UNIX, without which there's no real graceful way to do handoffs. Note that the Mac could always hand off to applets IIRC—that's how they got the clipboard to work.

Corrections welcome. Friend of Bart

Windows's CCP does seem to be something of a hybrid...

Although you can dump data directly to it, it has owner-render and delayed rendering format modes that only actually copy/process-on-request. If the window attached to the clipboard tries to close, Windows will ask it to render any delayed items - if it doesn't, those will be automatically removed from the clipboard's 'available' data format list.
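The behavior described above can be modeled in a few lines. This is a toy sketch, not the Win32 API: the class `ToyClipboard`, its methods and the `render` callback are all hypothetical names, and the only things taken from the discussion are the ideas that a format can be promised without data, rendered on request, and either rendered or withdrawn when the owner closes.

```python
from typing import Callable, Dict, Iterable, List, Optional


class ToyClipboard:
    """Toy model of delayed rendering (hypothetical, not the Win32 API)."""

    def __init__(self) -> None:
        # A value of None means "promised by the owner, not rendered yet".
        self._data: Dict[str, Optional[bytes]] = {}
        self._render: Optional[Callable[[str], bytes]] = None

    def offer(self, formats: Iterable[str], render: Callable[[str], bytes],
              eager: Iterable[str] = ()) -> None:
        """A new owner advertises formats; only the eager ones are rendered now."""
        eager_set = set(eager)
        self._render = render
        self._data = {f: (render(f) if f in eager_set else None) for f in formats}

    def available(self) -> List[str]:
        return sorted(self._data)

    def get(self, fmt: str) -> bytes:
        """Paste: render a promised format on first request, then keep it."""
        data = self._data[fmt]            # KeyError if never offered
        if data is None:
            assert self._render is not None  # a promise implies a live owner
            data = self._render(fmt)
            self._data[fmt] = data
        return data

    def owner_closing(self, render_all: bool) -> None:
        """Owner exits: promised formats are rendered now or withdrawn."""
        for fmt in list(self._data):
            if self._data[fmt] is None:
                if render_all:
                    self.get(fmt)
                else:
                    del self._data[fmt]
        self._render = None


rendered: List[str] = []


def render(fmt: str) -> bytes:
    rendered.append(fmt)                  # record when real work happens
    return f"<{fmt} data>".encode()


clip = ToyClipboard()
clip.offer(["image/bmp", "image/png"], render, eager=["image/png"])
```

After `offer`, both formats are advertised but only `image/png` has cost anything. The `image/bmp` data is produced only if some application asks for it, or if the owner chooses to render everything on exit.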

I'd make a decent bet (haven't dealt with it myself) that you could add an extra format tag for your app alone that notes that you're pasting from yourself, so it may never need to actually "render" to the global clipboard itself in that situation.

See http://msdn.microsoft.com/en-us/library/ms649014%28v=VS.85%29.aspx and related pages for more information.

Thanks much for the information!

The URLs given above were quite informative, thanks!

So Microsoft solves the persistence problem in a semi-sane way as long as the supplying application doesn't put any giant formats fully on the clipboard and both applications are up at the time of the transfer.

Note that Microsoft's spec requires an exiting application to "place valid memory handles on the clipboard for all clipboard formats that it provides. This ensures that these formats remain available after the clipboard owner is destroyed." Thus, the persistence problem post-source-exit remains. I haven't found any real guidance about what formats need to remain, although I haven't looked too hard yet.

In any case, this looks like the sort of model X could adopt pretty painlessly. With a proper ontology, one could even leave only one format on the clipboard in the normal case, which would make it plausible given modern disk and memory sizes in relation to application object sizes. (Please don't copy your database and exit. Friend of Bart)

Interesting stuff. Friend of Bart

The key thing is to keep

The key thing is to keep things simple, yet expressive enough and extensible enough so that others can do crazy things with it without making the base system more complex. That inherently also means that applications can mess it up.

I'm just a computer scientist who likes designing systems and solving problems. Currently I'm working as a freelance system programmer. I'm interested in the Linux kernel, networking, security, hardware and helping/teaching people.

Regards,

Indan Zupancic (i3839)

i3839's reply

What I'm talking about is implementing a copy & paste system from a technical point of view. That is trivial and takes a few days of code writing.

Introducing this and letting all applications handle all the complex types they want is not something trivial and done within a few days. That is indeed hard, depending on how difficult the apps make it themselves. But this doesn't affect the copy & paste system itself much.

To repeat, doing good copy & paste may be hard for individual programs, but making a good copy & paste system is trivial.

  1. "It also implies a good system for coercing and converting media types during CCP."

    No, it doesn't. This is the worst mistake you can make.

    For the rest, it's handled by giving the ability to supply multiple types, see my reply on LWN. That still keeps the CP system simple and pushes the complexity where it should be, while being flexible enough.

  2. Perhaps awkward, but it's still trivial. It doesn't change the interface I proposed. Either behaviour can be implemented with the same interface. Not a real problem.

  3. Not technical problems, just platform differences. It doesn't change the interface much. It would be nice to standardize UI behaviour, but that should be done at the UI level; whatever the copy & paste system is, it can't help with this. It only gives a way to copy and paste data; it doesn't say what or when to do it. And it shouldn't.

  4. It's only full of state if you make it unnecessarily complicated, like having two copy & paste systems as X does. In my proposal you'd have a couple of functions to copy and paste stuff (one for simple utf8 text, one for a list of complex types). From the application's point of view that's it. The state is either "no clipboard content" or "some clipboard content".

  5. This is just politics and nothing technical. But see how the discussion started. All I said is that you don't need X for copy & paste and that implementing another decentralized system is trivial.
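For readers trying to picture the interface described in point 4 of this reply, here is one possible reading of it as a sketch. The actual proposal was made in the LWN thread and is not reproduced here, so the class name `MinimalClipboard`, the method names and the format key are all assumptions made for illustration.

```python
from typing import Dict, Optional

TEXT = "text/plain;charset=utf-8"  # assumed key for the simple-text case


class MinimalClipboard:
    """Two states only: no content (None) or some content (a dict)."""

    def __init__(self) -> None:
        self._content: Optional[Dict[str, bytes]] = None

    def copy_formats(self, formats: Dict[str, bytes]) -> None:
        """Copy a set of typed representations of the same content."""
        self._content = dict(formats)

    def copy_text(self, text: str) -> None:
        """Convenience wrapper for the simple UTF-8 text case."""
        self.copy_formats({TEXT: text.encode("utf-8")})

    def paste_formats(self) -> Optional[Dict[str, bytes]]:
        """Return every offered representation, or None if empty."""
        return None if self._content is None else dict(self._content)

    def paste_text(self) -> Optional[str]:
        """Return the plain-text representation, if one was offered."""
        if self._content is None or TEXT not in self._content:
            return None
        return self._content[TEXT].decode("utf-8")


board = MinimalClipboard()
before = board.paste_text()        # state: "no clipboard content"
board.copy_text("ls -l")
after = board.paste_text()         # state: "some clipboard content"
```

Under this reading the transport really is small. What it leaves to the applications is every question about which formats to offer and which one to take, which is where the two sides of this thread disagree about where the difficulty lives.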

I'm still utterly unconvinced that there's anything non-trivial about implementing a copy & paste system. All the complexity of handling complex types is solved in the only place it can be solved: By the applications using them. If you try to push this into the copy & paste system then you get madness.

Still thinking cut-copy-paste is hard

Thanks much for the detailed reply here!

I think we can agree to disagree on how much of what needs to be part of the cut-copy-paste system. Like I say, I've tried to go there in the past, which is what motivated my original comments. Maybe I'm just not bright enough to figure out how to do it.

I've told you a little about my background; I'm really curious about yours. Anything you'd like to share?