As OS updates sneak into our smartphones, uninvited and undetected, new features make their appearance and surprise users. AI, it seems, is getting a firmer grip on photographic practices.
Among the features that awaited me on my recent trip to the UK was one abysmal “Live” thingy that captures images before the user even clicks, creating unsolicited mini-videos instead of just the desired photograph. Not only does this feel creepy and invasive to me, it also gobbles up tons of memory (and therefore natural resources), and I struggle to imagine a scenario where this would be of any use to me …
Far more interesting, and even less expected, is a recommended framing guide that pops up automatically as soon as the phone is held steady in photo mode for a second or two. Instead of aggravation, this one produced amused puzzlement 😉
And so many questions … Let’s dig deeper
When my senile brain finally understood what the “Best framing” spot in the middle of the frame meant (I tried clicking it, feeling really past my prime for a few baffled seconds 😉 😉 😉), the idea amused me a lot. Maybe to compensate for my lack of mental speed, my amusement at the feature was quite smug. As if a phone were going to tell me, the great PJ, winner of absolutely every competition I entered (zero, that’s correct), and favourite photographer of my mummy, how to frame. Pshhh
But that soon faded, because of my twin interests in composition and artificial intelligence. So I decided to grab some pairs of shots – first my idea of the best shot, then the camera’s – to review them and present them to you, in order to decide how well AI was doing. The first pairing is above.
Both are valid. But I kinda prefer the phone’s more tranquil version. Ouch 😉
Above is another pair. There’s a slight difference here. The AI version is probably the one that would get published more easily, but my vote goes to mine, by a slight margin. What’s starting to emerge in my mind after inspecting a few shots is that the AI seems to be applying “rules”, in particular the @$#ffing rule of thirds 😉 How?
OK, so AI is any technology that allows human-like processing faculties to be performed automatically by a machine: speech, vision, mapping your way from A to B … But, most often, today, the term AI is (wrongly) used to talk about machine learning, a subset of AI that learns patterns from examples. Whenever a specification can be written, a developer can write a program that will produce exact results in accordance with the specification. But, in real life, very few problems actually lend themselves to accurate specifying.
How, for example, do you specify the action of finding a cat in a photograph? You count pointy ears, whiskers, round eyes … But what if the cat is turning its back to you? You’ll probably only spot one round feature, no ears and a long fluffy thing that’s not described in the specifications for a face. What if the cat is lying down on its back like an otter waiting for its stone? 😉 Plus, even if you constrain your task to identifying head-on cats, how do you identify a nose, eyes, whiskers, ears …? We don’t have specifications for those either. You could try one for identifying eyes based on the round shape. But what if that cat’s eyes are half shut, what if its pupils are tiny, or huge? What if there’s a strong reflection on the eyes? What if one eye is hidden behind a leaf? Is that cat no longer a cat? Nope, real life problems can rarely be specified accurately enough for exact software to be written.
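To make that concrete, here’s a deliberately naive sketch of what a hand-written “cat spec” might look like. The feature names are entirely hypothetical (real photos don’t come annotated with ear counts); the point is how quickly the rules fall apart:

```python
# A toy, hand-written "specification" for cat detection.
# The inputs are hypothetical: real photos don't come with these counts.

def looks_like_a_cat(pointy_ears: int, whiskers: int, round_eyes: int) -> bool:
    # The "spec": two pointy ears, some whiskers, two round eyes.
    return pointy_ears == 2 and whiskers > 0 and round_eyes == 2

print(looks_like_a_cat(2, 12, 2))  # head-on cat: True
print(looks_like_a_cat(0, 0, 0))   # the same cat, seen from behind: False!
print(looks_like_a_cat(2, 12, 1))  # one eye behind a leaf: False again
```

The cat fails its own specification the moment it turns around, which is exactly why we give up on exact rules.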
So we write inexact software that gets it right often enough.
Because it’s hard enough to write software that meets a spec, imagine writing software based on no spec at all … Impossible. What machine learning teams do instead is provide a set of examples (photos of cats, for example), a set of negative examples (pics of dogs, humans dressed as cats at a child’s birthday …) and some sort of decision process that answers Yes or No when fed a new example (such as a new image that wasn’t in the training set). The whole trick is to train this decision process (called the model) as we’d train a kid. Simple models include regression lines (a line that could split data points based on the number of legs they have, for example) and neural networks, which stack many such regression “lines” (not necessarily straight ones) with multiple inputs, a bit like actual neurons.
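For the curious, here’s a minimal sketch of that training idea, assuming two made-up numeric features per example (a real detector would learn from raw pixels). A perceptron – the simplest of those regression “lines” – nudges itself towards a line that separates the positives from the negatives:

```python
# Minimal perceptron: learn a separating line from labelled examples.
# The two features are hypothetical stand-ins (say, "ear pointiness" and
# "whisker count"); a real cat detector would learn from raw pixels.

# (feature_1, feature_2, label) with label 1 = cat, 0 = not-cat
training_set = [
    (0.9, 0.8, 1), (0.8, 0.9, 1), (0.7, 0.7, 1),  # cats
    (0.2, 0.1, 0), (0.1, 0.3, 0), (0.3, 0.2, 0),  # dogs, costumed kids ...
]

w1, w2, bias = 0.0, 0.0, 0.0  # the "model": one line in feature space

for _ in range(100):  # a few passes over the examples
    for x1, x2, label in training_set:
        prediction = 1 if (w1 * x1 + w2 * x2 + bias) > 0 else 0
        error = label - prediction  # 0 if right, +1/-1 if wrong
        w1 += 0.1 * error * x1      # nudge the line towards the example
        w2 += 0.1 * error * x2
        bias += 0.1 * error

# Ask it about an example that was not in the training set:
x1, x2 = 0.85, 0.75
print("cat" if (w1 * x1 + w2 * x2 + bias) > 0 else "not a cat")
```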
Training the model is an optimisation process, and the most interesting part of the whole thing, if you ask me. Back in another millennium, I did my PhD on the mathematical models behind those optimisation processes. Very interesting progress has been made since, in the way models can train themselves by playing simulations and evaluating the outcome of each. This is how the programs that beat grandmasters at chess and Go were trained: they played against themselves, under the supervision of a brilliant team of developers, and learned as they played. It’s also how helicopters learn to fly themselves and perform manoeuvres that no human could achieve. It’s impressive and brilliant stuff.
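Stripped of all the glamour, that optimisation looks something like this: measure the error, work out which way is downhill, take a small step, repeat. A toy sketch with a single parameter (real training does the same thing across millions of them):

```python
# Gradient descent in one dimension: find the w that minimises loss(w).
# Real training minimises an error measured over the whole training set,
# across millions of parameters, but the mechanics are identical.

def loss(w):
    return (w - 3.0) ** 2  # toy error curve, lowest at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)  # slope of the loss at w

w = 0.0  # start from an arbitrary guess
for _ in range(50):
    w -= 0.1 * gradient(w)  # step against the slope

print(w)  # very close to 3.0, the optimum
```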
But nothing so complex is at play in smart-cameras, I’m guessing.
What I’m seeing in most of the AI-suggested framings is a strong reliance on the rule of thirds. And it works superbly. My guess is the team picked a large set of photographs heavily dominated by rule-of-thirds compositions and declared them to be good examples. Then another set of ‘other’ compositions favouring the center or something else – the sort I would do 😉 – and declared those to be negative examples. They then obtained a model that detects the main masses of colour and suggests a center spot that creates a rule-of-thirds composition from those, and uploaded it to the photo app in phones.
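If my guess is anywhere near the truth, the output stage could be as simple as this sketch: find the centroid of the dominant mass (a crude brightness-weighted centroid stands in here for whatever learned saliency model the phone actually uses) and shift the frame so it lands on the nearest thirds intersection. Every name and number below is my assumption, not the phone’s actual code:

```python
# Sketch: suggest a frame shift that puts the image's dominant mass on a
# rule-of-thirds intersection. A brightness-weighted centroid is a crude
# stand-in for a learned saliency model.

import numpy as np

def suggest_pan(image: np.ndarray):
    """image: 2D array of brightness values (height x width)."""
    h, w = image.shape
    ys, xs = np.indices(image.shape)
    total = image.sum()
    cy, cx = (ys * image).sum() / total, (xs * image).sum() / total

    # The four rule-of-thirds intersections of the current frame.
    thirds = [(h / 3, w / 3), (h / 3, 2 * w / 3),
              (2 * h / 3, w / 3), (2 * h / 3, 2 * w / 3)]
    ty, tx = min(thirds, key=lambda p: (p[0] - cy) ** 2 + (p[1] - cx) ** 2)

    # Pan the camera (down, right) by this much to land the mass there.
    return cy - ty, cx - tx

# Toy example: a bright blob dead centre in a dark frame.
img = np.zeros((300, 400))
img[140:160, 190:210] = 1.0
print(suggest_pan(img))  # suggests panning to push the blob off-centre
```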
Let’s be honest, it works brilliantly. In most cases, my initial nod goes to the camera’s suggested framing! On second viewing, I go back to some of mine, but not all of them! Sometimes the recommended framing is very close to my original idea, sometimes it departs from it significantly.
For the record, AI isn’t composing. I am, by placing myself in a specific spot and choosing the focal length. Nor is it correcting sloping horizons … I still have to do that in post (I didn’t in these examples, so as not to crop into the frame), for now. But AI is finding interesting framings from the chosen spot. And that’s great already. Not failsafe, but definitely worth a try. What baffles me is how the camera does this, beyond the model-recommendation part, which seems quite straightforward.
How does the camera “know” what data lies outside the frame that I’m using for my own composition? It has to know since it is recommending that some of it be included in the final photograph.
My guess is that, as with the “Live” feature mentioned at the start, the camera starts scanning as soon as it detects that it’s about to be used (as it is being raised to eye level). In a way, this is no less creepy than the “Live” gizmo. But the result subjectively feels like an interesting little tool rather than an invasion of my privacy. Maybe it’s just me 😉
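If that’s the case, the mechanics needn’t be exotic: a rolling buffer of recent preview frames, captured while the phone is being raised and swept across the scene, would give the camera imagery from beyond the current frame edges. A sketch of that idea (my assumption about the mechanism, not the phone’s actual code):

```python
# Sketch of a pre-capture ring buffer: keep the last N preview frames in
# memory at all times, so "before the click" imagery is already available
# when the shutter fires or a framing suggestion is computed.

from collections import deque

class PreCaptureBuffer:
    def __init__(self, capacity: int = 30):   # roughly 1 s at 30 fps
        self.frames = deque(maxlen=capacity)  # oldest frames drop off

    def on_preview_frame(self, frame):
        self.frames.append(frame)  # runs for every preview frame

    def on_shutter(self, frame):
        # The result can now include what happened just before the click.
        return list(self.frames) + [frame]

buf = PreCaptureBuffer(capacity=3)
for f in ["f1", "f2", "f3", "f4"]:  # preview frames streaming in
    buf.on_preview_frame(f)
print(buf.on_shutter("click"))      # ['f2', 'f3', 'f4', 'click']
```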
Two aspects of this fun innovation are particularly interesting to me.
First of all, my style hinges on clean edges and balanced weights (unless there’s a reason to introduce imbalance, such as in Shark attack). I was trained by books written by traditional guys such as Ansel Adams and Charlie Waite, and like it that way. But this can get stale and the AI version forces me to rethink and double check.
Plus, I tend to be quite lazy with the phone, mainly recording a memory with what appears to be a correct composition, opting to PP my way out of a mess whenever disaster strikes 😉 And this little tool forces me to be a little more careful. Instinctively, there’s always this little dialogue going on at the back of my brain: “hang on, why is the thingy not aligned with the center of my frame? What am I messing up now?” 😉
It is giving me options, not imposing itself. I love that.
The second aspect is related to the growing chasm between traditional cameras and smart-cameras (let’s call them that, rather than phone cameras).
For the millionth time, why are camera makers still head-butting in a pixel race that was obsolete ten years ago, when phones have not only (largely) caught up in purely qualitative terms but are also providing really interesting and useful features? It’s like traditional manufacturers are begging to lose their market share …
For a long time, the technology narrative with regard to human work has been that robots replaced humans because they need no sleep, don’t hurt their backs, don’t go on strike … and that jobs replaced by machines would always be compensated for by more valuable jobs and higher qualifications. Blue-collar workers suffered initially, but more white-collar workers made a better living. But then machines started doing intelligent work, not just manual labour. And now phone cameras are making artistic decisions while expensive photo cameras are still comparing pixel muscle in the playground. This can only go one way …
My phone is over 3 years old and new ones incorporate far better cameras than mine. But there’s still enough of a quality difference between your [insert brand] high-quality APS-C or FF camera and modern smart-cameras to consider going traditional. However, I don’t give that advantage more than 5 years to fade away in 90% of scenarios. Add to that the innumerable tools such as automatic online backup, easy sharing … and the reasons to buy a traditional camera are looking thinner and thinner.
Smart cameras are fast, fun and easy to use. Their ergonomics, not great at the best of times, are slowly improving. And they come with a huge variety of side applications – a phone, a GPS, note taking, mapping … – all immensely useful to a photographer.
I do hope someone gets their head out of the sand and understands how useful it would be to have those features on a traditional camera, or how much fun it would be to depart from the traditional recipe using different sensors or a different approach. In the meantime, smart cameras: keep pushing, please keep pushing!
What say you?