So I made an API request for person 37430, which is apparently Gordon Tootoosis. The request (made in November) looked like this: http://api.themoviedb.org/3/person/37430?api_key=xxxx&append_to_response=external_ids,images,tagged_images
The returned JSON looked like this:
{
  "birthday": "1941-10-25",
  "tagged_images": {"results": [], "page": 1, "total_results": 0, "id": 37430, "total_pages": 0},
  "deathday": "2011-07-05",
  "id": 37430,
  "external_ids": {
    "freebase_id": "\/en\/gordon_tootoosis",
    "instagram_id": null,
    "tvrage_id": 6424,
    "twitter_id": null,
    "freebase_mid": "\/m\/08h8lr",
    "imdb_id": "nm0867588",
    "facebook_id": null
  },
  "name": "Gordon Tootoosis",
  "images": {
    "profiles": [
      {"iso_639_1": null, "aspect_ratio": 0.667107001321, "vote_count": 0, "height": 757, "vote_average": 0, "file_path": "\/l10aLDp4D8ZwWdmfRdlj6FedUYk.jpg", "width": 505},
      {"iso_639_1": null, "aspect_ratio": 0.6675567423231, "vote_count": 0, "height": 749, "vote_average": 0, "file_path": "\/924xXlQnyah5S7IXy89UMdzXXsh.jpg", "width": 500}
    ]
  },
  "also_known_as": [],
  "gender": 2,
  "biography": "",
  "popularity": 0.098449,
  "place_of_birth": "Poundmaker Reserve, Saskatchewan, Canada",
  "profile_path": "\/924xXlQnyah5S7IXy89UMdzXXsh.jpg",
  "adult": false,
  "imdb_id": "nm0867588",
  "homepage": null
}
The problem is that the 924xXlQnyah5S7IXy89UMdzXXsh.jpg image, which seems to be the default (it is the profile_path), is actually the image for Fergal Reilly (52701). I know this because it caused a Duplicate Entry violation. Judging by the IDs, the two records should have been fetched within 24 hours of each other.
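For readers wondering how a shared image turns into a Duplicate Entry violation: it is what a unique constraint on the image path does when two people come back with the same profile_path. Below is a minimal sketch, assuming a local table with a UNIQUE file_path column. The table name `person_image` and the helper `store_profile` are hypothetical, and Python's built-in sqlite3 stands in for whatever database was actually used (the "Duplicate Entry" wording suggests MySQL, but that is a guess).

```python
import sqlite3
from typing import Optional

# Hypothetical local schema: each stored image path may belong to one person only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_image ("
    " person_id INTEGER NOT NULL,"
    " file_path TEXT NOT NULL UNIQUE)"
)


def store_profile(person_id: int, file_path: str) -> Optional[int]:
    """Insert a path; return None on success, or the ID of the person who already has it."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO person_image (person_id, file_path) VALUES (?, ?)",
                (person_id, file_path),
            )
        return None
    except sqlite3.IntegrityError:
        # SQLite's counterpart of a "Duplicate entry" error on a UNIQUE column
        row = conn.execute(
            "SELECT person_id FROM person_image WHERE file_path = ?", (file_path,)
        ).fetchone()
        return row[0]


# Whichever of the two records is stored second trips the constraint.
first = store_profile(37430, "/924xXlQnyah5S7IXy89UMdzXXsh.jpg")   # Gordon Tootoosis
second = store_profile(52701, "/924xXlQnyah5S7IXy89UMdzXXsh.jpg")  # Fergal Reilly
print(first, second)  # None 37430
```

The constraint only reports that a collision happened; it cannot tell which of the two records the image really belongs to.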
Any idea how this happened?
The erroneous image now seems to be gone, but I find it hard to believe that it got there through user error. It looks like a deeper underlying issue: if someone had uploaded the wrong image, it would have received a unique filename, not the filename of someone else's image.
I ask because I am wondering whether there will be a rash of this in the data I collected in November.
Reply from Adi
on February 23, 2018 at 12:53
Why is there no unique constraint on the image field? Surely there is no valid reason for two people to have the same photo?
So, for James Heath (and this is still the case), we have these TMDB IDs:
63256
63246
63244
Those are old IDs.
All three have the same profile image: /azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg
They are essentially identical, except that the last of the three has an IMDb ID.
Reply from Adi
on February 23, 2018 at 13:00
So here are the duplicates (the number of person records sharing one image, followed by their TMDB IDs). To be honest, this feels highly avoidable:
4 33781, 231784, 262075, 1104340
4 25348, 139567, 161310, 1153503
3 976019, 990654, 1301102
3 63244, 63246, 63256
2 94938, 572394
2 129814, 1884930
2 222548, 222549
2 1033001, 1866767
2 1791571, 1791647
2 251202, 1405869
2 1178411, 1880465
2 1624370, 1624372
2 555778, 1070406
2 1020725, 1070133
2 19828, 1462120
2 237775, 1483240
2 46391, 127279
2 1499236, 1886106
2 932097, 1234191
2 1418435, 1418437
2 179942, 1216483
2 1561979, 1561980
2 1833297, 1833299
2 1405685, 1612728
2 1479456, 1905843
2 588716, 1575014
2 239025, 1091885
2 565339, 1911757
2 1172683, 1676265
2 1405687, 1612729
2 1517607, 1523644
2 131208, 137626
2 148084, 260050
2 572045, 572046
2 230712, 932081
2 1747946, 1747947
2 564053, 1404608
2 1165435, 1165436
2 148108, 148109
2 145086, 145087
2 560243, 560244
2 1024232, 1883402
2 1553273, 1572416
2 18906, 1403158
2 37986, 1489580
2 1816564, 1849793
2 74296, 1144947
2 1120110, 1908406
2 1157333, 1157335
2 1405690, 1612727
2 56900, 143817
2 1813034, 1813041
2 189884, 930147
2 88471, 107221
2 224886, 994322
2 131605, 131606
2 228802, 586259
2 16609, 975287
2 1157303, 1222106
2 1155607, 1155608
2 224462, 1342697
2 1067293, 1067296
2 1130836, 1259905
2 580219, 1463264
2 33608, 1127849
2 144103, 1213844
2 260627, 1339312
2 1405691, 1612726
2 113387, 1216756
So yeah, on a few occasions as many as four person entries share the same image... Is this moderation gone wrong?
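A report in the same "count, IDs" shape as the duplicate list can be produced by grouping person records on profile_path. This is a sketch, assuming you already have a local mapping of person ID to profile_path (for example from stored person responses); the function names are hypothetical, and the sample data is limited to the cases described in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def shared_profile_images(profiles: Dict[int, Optional[str]]) -> List[Tuple[int, List[int]]]:
    """Group person IDs by profile_path, keeping only paths used by 2+ people.

    Returns (count, sorted IDs) pairs, largest groups first.
    """
    by_path = defaultdict(list)
    for person_id, path in profiles.items():
        if path:  # skip people whose profile_path is null
            by_path[path].append(person_id)
    groups = [(len(ids), sorted(ids)) for ids in by_path.values() if len(ids) > 1]
    groups.sort(key=lambda g: (-g[0], g[1]))
    return groups


def format_report(groups: List[Tuple[int, List[int]]]) -> List[str]:
    """Render groups as 'count id, id, ...' lines."""
    return [f"{count} {', '.join(map(str, ids))}" for count, ids in groups]


# Sample input based on the cases described in this thread (not a full data dump).
profiles = {
    63256: "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg",
    63246: "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg",
    63244: "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg",
    37430: "/924xXlQnyah5S7IXy89UMdzXXsh.jpg",
    52701: "/924xXlQnyah5S7IXy89UMdzXXsh.jpg",
}
for line in format_report(shared_profile_images(profiles)):
    print(line)
```

With this input the first printed line is `3 63244, 63246, 63256`, matching one of the entries in the list.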
Reply from Travis Bell
on February 23, 2018 at 14:21
Hey Adi,
The image service is keyed only by the file's SHA and has no link to the asset it belongs to. That link exists only in the media database, so yes, it is possible for the same SHA to belong to more than one media record. It makes sense that you're seeing this in and around duplicate records.
Once an image is uploaded to S3, it is never removed. Since it's keyed by the SHA, uploading the same file again is essentially a no-op: nothing happens. The same SHA means the same image, which means many records could theoretically be tagged with it.
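The keying scheme described here is content addressing: the blob store knows only hashes, and the person-to-image link lives elsewhere. A toy sketch of that split follows, assuming SHA-1 (the post only says "SHA") and using in-memory dicts in place of S3 and the media database. The class and method names are hypothetical, not TMDB's actual code.

```python
import hashlib


class ImageStore:
    """Toy stand-in for an S3-backed store whose only key is the file's hash."""

    def __init__(self):
        self._blobs = {}  # hash -> bytes; nothing is ever deleted

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()  # assumption: SHA-1
        # Uploading identical bytes again maps to the same key: a no-op.
        self._blobs.setdefault(key, data)
        return key

    def __len__(self) -> int:
        return len(self._blobs)


class MediaDB:
    """Toy media database: the only place a person is linked to an image key."""

    def __init__(self):
        self.profile_image = {}  # person_id -> image key

    def tag(self, person_id: int, key: str) -> None:
        self.profile_image[person_id] = key


store, db = ImageStore(), MediaDB()
photo = b"\xff\xd8 same JPEG bytes uploaded twice"
db.tag(63246, store.put(photo))
db.tag(63244, store.put(photo))  # second upload stores nothing new
print(len(store), db.profile_image[63246] == db.profile_image[63244])  # 1 True
```

Because the store never sees person IDs, nothing at that layer can notice that two records now point at one file; only the media database could.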
Reply from Adi
on February 23, 2018 at 14:31
Makes you wonder what happened with Gordon Tootoosis / Fergal Reilly; they don't look similar :P
Nice move with the SHA.
Would it be worth adding something that indicates where an image is already in use? (Stopping people from using it doesn't help, since they would just upload a different image of the same person for their duplicate entry.)
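The "indicate, don't block" idea can be sketched as a reverse index from image hash to the people using it, consulted at upload time. This is only an illustration of the suggestion, assuming SHA-1 keys and an in-memory index; `usage` and `attach_profile_image` are hypothetical names. The upload always succeeds, and the function merely returns the other person IDs so a UI could show an "already in use by ..." notice.

```python
import hashlib
from collections import defaultdict
from typing import List

# Hypothetical reverse index kept alongside the media records:
# image hash -> set of person IDs currently tagged with that image.
usage = defaultdict(set)


def attach_profile_image(person_id: int, data: bytes) -> List[int]:
    """Tag a person with an image and return the *other* people already using it.

    The upload is never rejected, so duplicates stay possible (e.g. for re-adding
    an image while merging); the caller can show the returned IDs as a notice.
    """
    key = hashlib.sha1(data).hexdigest()  # assumption: SHA-1 keying
    others = sorted(usage[key] - {person_id})
    usage[key].add(person_id)
    return others


photo = b"one photo of James Heath"
print(attach_profile_image(63256, photo))  # []
print(attach_profile_image(63246, photo))  # [63256]
print(attach_profile_image(63244, photo))  # [63246, 63256]
```

Since nothing is rejected, this keeps the merge workflow described in the thread intact while still surfacing likely duplicates.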
Reply from Travis Bell
on February 25, 2018 at 12:00
Haha, I have stopped trying to figure out what users do sometimes.
It could be, but it might not change users' behaviour much; as you said, they'd probably ignore it anyway. I would also prefer to keep merging and editing profiles easy by not restricting duplicates (i.e. so a mod or user can just re-add the image), and then the mod can just click delete without worrying about anything else.
Reply from Adi
on February 26, 2018 at 00:33
Yeah, a no-duplicates rule would only hinder the mods without actually solving the problem of user behaviour.