Back in November I made an API request for person 37430, who is apparently Gordon Tootoosis. The request looked like this: http://api.themoviedb.org/3/person/37430?api_key=xxxx&append_to_response=external_ids,images,tagged_images
The returned JSON looked like this:
{"birthday":"1941-10-25","tagged_images":{"results":[],"page":1,"total_results":0,"id":37430,"total_pages":0},"deathday":"2011-07-05","id":37430,"external_ids":{"freebase_id":"\/en\/gordon_tootoosis","instagram_id":null,"tvrage_id":6424,"twitter_id":null,"freebase_mid":"\/m\/08h8lr","imdb_id":"nm0867588","facebook_id":null},"name":"Gordon Tootoosis","images":{"profiles":[{"iso_639_1":null,"aspect_ratio":0.667107001321,"vote_count":0,"height":757,"vote_average":0,"file_path":"\/l10aLDp4D8ZwWdmfRdlj6FedUYk.jpg","width":505},{"iso_639_1":null,"aspect_ratio":0.6675567423231,"vote_count":0,"height":749,"vote_average":0,"file_path":"\/924xXlQnyah5S7IXy89UMdzXXsh.jpg","width":500}]},"also_known_as":[],"gender":2,"biography":"","popularity":0.098449,"place_of_birth":"Poundmaker Reserve, Saskatchewan, Canada","profile_path":"\/924xXlQnyah5S7IXy89UMdzXXsh.jpg","adult":false,"imdb_id":"nm0867588","homepage":null}
The problem is that the 924xXlQnyah5S7IXy89UMdzXXsh.jpg image, which is the profile_path and so seems to be the default, is actually the image for Fergal Reilly (52701). I know this because it caused a Duplicate Entry violation. Judging by the IDs, the two records should have been fetched within 24 hours of each other.
Any idea how this happened?
The erroneous image now seems to be gone, but I find it hard to believe that it got there through user error. It looks like something more fundamental: if someone had uploaded the wrong image, it would have had its own unique filename, not the filename of someone else's image.
I ask because I am wondering whether there is going to be a rash of this in the data collected in November.
Reply by Adi
on February 23, 2018 at 12:53 PM
Why is there no Unique constraint on the image field? Surely there isn't a viable reason for having two people with the same photo?
For James Heath, for example (and this is still the case), we have these TMDB IDs:
63256
63246
63244
Those are old IDs.
All have the same profile image: /azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg
They are all basically the same, but the last of the three has an IMDb ID.
Reply by Adi
on February 23, 2018 at 1:00 PM
So here are the duplicates (the number of people sharing an image, followed by their TMDB person IDs). This feels highly avoidable, to be honest:
4 33781, 231784, 262075, 1104340
4 25348, 139567, 161310, 1153503
3 976019, 990654, 1301102
3 63244, 63246, 63256
2 94938, 572394
2 129814, 1884930
2 222548, 222549
2 1033001, 1866767
2 1791571, 1791647
2 251202, 1405869
2 1178411, 1880465
2 1624370, 1624372
2 555778, 1070406
2 1020725, 1070133
2 19828, 1462120
2 237775, 1483240
2 46391, 127279
2 1499236, 1886106
2 932097, 1234191
2 1418435, 1418437
2 179942, 1216483
2 1561979, 1561980
2 1833297, 1833299
2 1405685, 1612728
2 1479456, 1905843
2 588716, 1575014
2 239025, 1091885
2 565339, 1911757
2 1172683, 1676265
2 1405687, 1612729
2 1517607, 1523644
2 131208, 137626
2 148084, 260050
2 572045, 572046
2 230712, 932081
2 1747946, 1747947
2 564053, 1404608
2 1165435, 1165436
2 148108, 148109
2 145086, 145087
2 560243, 560244
2 1024232, 1883402
2 1553273, 1572416
2 18906, 1403158
2 37986, 1489580
2 1816564, 1849793
2 74296, 1144947
2 1120110, 1908406
2 1157333, 1157335
2 1405690, 1612727
2 56900, 143817
2 1813034, 1813041
2 189884, 930147
2 88471, 107221
2 224886, 994322
2 131605, 131606
2 228802, 586259
2 16609, 975287
2 1157303, 1222106
2 1155607, 1155608
2 224462, 1342697
2 1067293, 1067296
2 1130836, 1259905
2 580219, 1463264
2 33608, 1127849
2 144103, 1213844
2 260627, 1339312
2 1405691, 1612726
2 113387, 1216756
So yeah, on a few occasions four entries for people have the same image... Is this moderation gone wrong?
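For anyone wanting to run the same check on their own copy of the data, here is a minimal sketch of how a list like the one above can be built. It assumes the person records are already stored locally as (id, profile_path) pairs; the function name and the two filler rows are made up for illustration, and only the three James Heath IDs and their image path come from this thread.

```python
from collections import defaultdict


def find_shared_profile_images(people):
    """Return lists of person IDs that share the same profile_path."""
    by_path = defaultdict(list)
    for person_id, profile_path in people:
        if profile_path:  # people without a profile image cannot clash
            by_path[profile_path].append(person_id)
    # Keep only images used by more than one person.
    groups = [sorted(ids) for ids in by_path.values() if len(ids) > 1]
    # Biggest groups first, then by ID, to match the layout of the list above.
    groups.sort(key=lambda ids: (-len(ids), ids))
    return groups


# Sample rows: the first three are the James Heath entries from this thread,
# the last two are hypothetical filler.
sample = [
    (63256, "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"),
    (63246, "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"),
    (63244, "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"),
    (1, "/hypothetical-unique-image.jpg"),
    (2, None),
]

for ids in find_shared_profile_images(sample):
    print(len(ids), ", ".join(str(i) for i in ids))
```

Grouping in application code like this reports the clashes without rejecting any rows, which is more forgiving than a unique constraint on the image column.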
Reply by Travis Bell
on February 23, 2018 at 2:21 PM
Hey Adi,
The image service is only keyed by the file's SHA and has no link to the asset it belongs to. That link only exists in the media database, so yes, it would be possible to have the same SHA belong to more than one media record. It makes sense you're seeing this in and around duplicate records.
Once an image is uploaded to S3, it is never removed. Since it's keyed by the SHA, if that file were to get uploaded again, it's essentially a no-op. Nothing happens. The same SHA means the same image, which means many records could theoretically be tagged with it.
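To illustrate the idea, here is a rough sketch of SHA-keyed storage. This is not the actual TMDB code: the class name, the use of SHA-256, the in-memory dict standing in for S3, and the placeholder image bytes are all assumptions made for the example.

```python
import hashlib


class ContentAddressedStore:
    """Toy image store keyed only by the hash of the file contents."""

    def __init__(self):
        self._blobs = {}  # stands in for the S3 bucket (assumption)

    def upload(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        # Re-uploading identical bytes changes nothing: the key already exists.
        self._blobs.setdefault(key, data)
        return key

    def blob_count(self) -> int:
        return len(self._blobs)


store = ContentAddressedStore()
media_db = {}  # person ID -> image key; the only place the link is recorded

image_bytes = b"placeholder bytes standing in for a JPEG"
media_db[63244] = store.upload(image_bytes)
media_db[63246] = store.upload(image_bytes)  # same file, second person

print(media_db[63244] == media_db[63246])  # both records point at one key
print(store.blob_count())                  # only one stored object
```

The store itself never learns which person an image belongs to, so nothing in it can object when a second record reuses the same file.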
Reply by Adi
on February 23, 2018 at 2:31 PM
Makes you wonder what happened with Gordon Tootoosis / Fergal Reilly, they don't look similar :P
Nice move with the SHA.
Worth adding something which indicates where it is already in use? (Stopping people from using it doesn't help, since they will just upload a different image of the same person for their duplicate entry, which isn't helpful.)
Reply by Travis Bell
on February 25, 2018 at 12:00 PM
Haha, I have stopped trying to figure out what users do sometimes.
It could be, but it might not change a user's behaviour much since, like you said, they'd probably ignore it anyway. And I would prefer to keep the merging/editing of profiles easy by not restricting duplicates (i.e. so a mod or user can just re-add the image), and then the mod can just click delete without worrying about anything else.
Reply by Adi
on February 26, 2018 at 12:33 AM
Yeah, no dupes only hinders the mods without actually helping with the problem when it comes to user behaviour.