So I made an API request for person 37430, which is apparently Gordon Tootoosis. The request (made in November) looked like this:
http://api.themoviedb.org/3/person/37430?api_key=xxxx&append_to_response=external_ids,images,tagged_images
The returned JSON looked like this:
{"birthday":"1941-10-25","tagged_images":{"results":[],"page":1,"total_results":0,"id":37430,"total_pages":0},"deathday":"2011-07-05","id":37430,"external_ids":{"freebase_id":"\/en\/gordon_tootoosis","instagram_id":null,"tvrage_id":6424,"twitter_id":null,"freebase_mid":"\/m\/08h8lr","imdb_id":"nm0867588","facebook_id":null},"name":"Gordon Tootoosis","images":{"profiles":[{"iso_639_1":null,"aspect_ratio":0.667107001321,"vote_count":0,"height":757,"vote_average":0,"file_path":"\/l10aLDp4D8ZwWdmfRdlj6FedUYk.jpg","width":505},{"iso_639_1":null,"aspect_ratio":0.6675567423231,"vote_count":0,"height":749,"vote_average":0,"file_path":"\/924xXlQnyah5S7IXy89UMdzXXsh.jpg","width":500}]},"also_known_as":[],"gender":2,"biography":"","popularity":0.098449,"place_of_birth":"Poundmaker Reserve, Saskatchewan, Canada","profile_path":"\/924xXlQnyah5S7IXy89UMdzXXsh.jpg","adult":false,"imdb_id":"nm0867588","homepage":null}
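For anyone trying to reproduce this, here is a minimal sketch of building the same request and pulling the default `profile_path` and the profile image paths out of the response. The helper names `person_url` and `profile_paths` are hypothetical (this is not an official TMDB client), and it only relies on the URL and fields visible in the response above; the network call itself is left out.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

# Base URL taken from the request above
API_BASE = "http://api.themoviedb.org/3"


def person_url(person_id: int, api_key: str, extras: list[str]) -> str:
    """Build a person request with append_to_response (hypothetical helper)."""
    query = urlencode(
        {"api_key": api_key, "append_to_response": ",".join(extras)},
        safe=",",  # leave the commas unescaped, as in the original URL
    )
    return f"{API_BASE}/person/{person_id}?{query}"


def profile_paths(payload: str) -> tuple[str | None, list[str]]:
    """Return the default profile_path and every profile image file_path."""
    data = json.loads(payload)  # json.loads turns "\/" back into "/"
    profiles = data.get("images", {}).get("profiles", [])
    return data.get("profile_path"), [p["file_path"] for p in profiles]


# Trimmed copy of the response above, keeping only the fields used here
SAMPLE = r'''{"id":37430,"profile_path":"\/924xXlQnyah5S7IXy89UMdzXXsh.jpg",
"images":{"profiles":[{"file_path":"\/l10aLDp4D8ZwWdmfRdlj6FedUYk.jpg"},
{"file_path":"\/924xXlQnyah5S7IXy89UMdzXXsh.jpg"}]}}'''

default, all_paths = profile_paths(SAMPLE)
print(person_url(37430, "xxxx", ["external_ids", "images", "tagged_images"]))
print(default, default in all_paths)
```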
The problem is that the 924xXlQnyah5S7IXy89UMdzXXsh.jpg image, the one that seems to be the default (profile_path), is actually the image for Fergal Reilly (52701). I know this because it caused a Duplicate Entry violation. Judging by the IDs, the data for both people should have been fetched within 24 hours of each other.
Any idea how this happened?
The erroneous image now seems to have gone, but I find it hard to believe it could have gotten there through user error. It looks like something more underlying: if someone had uploaded the wrong image, it would have had a unique filename, not the filename of someone else's image.
I ask because I'm wondering whether there is going to be a rash of this in the data collected in November.
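The Duplicate Entry violation comes from a unique constraint on the image column in the local copy of the data. Below is a minimal sketch of that setup, using SQLite as a stand-in for whatever database was actually used (the table layout and the `store_person` helper are hypothetical). Instead of just failing, it reports which person already holds the image, which is how a clash like Gordon Tootoosis / Fergal Reilly can be traced.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical local schema: each profile image may belong to one person only.
    # SQLite allows several NULLs in a UNIQUE column, so people without an image are fine.
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, profile_path TEXT UNIQUE)"
    )
    return conn


def store_person(conn: sqlite3.Connection, person_id: int, name: str,
                 profile_path: Optional[str]) -> Optional[int]:
    """Insert a person; return None on success, or the ID already holding that image."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person (id, name, profile_path) VALUES (?, ?, ?)",
                (person_id, name, profile_path),
            )
        return None
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT id FROM person WHERE profile_path = ?", (profile_path,)
        ).fetchone()
        if row is None:
            raise  # some other constraint failed (e.g. a duplicate id)
        return row[0]


conn = make_db()
PATH = "/924xXlQnyah5S7IXy89UMdzXXsh.jpg"
print(store_person(conn, 52701, "Fergal Reilly", PATH))     # None
print(store_person(conn, 37430, "Gordon Tootoosis", PATH))  # 52701
```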
Reply by Adi
on February 23, 2018 at 12:53 PM
Why is there no unique constraint on the image field? Surely there isn't a viable reason for two people to have the same photo?
For example, James Heath (and this is still the case) has these TMDB IDs:
63256
63246
63244
Those are old IDs.
All have the same profile image: /azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg
They are all basically the same, but the last of the three has an IMDb ID.
Reply by Adi
on February 23, 2018 at 1:00 PM
So here are the duplicates (each line gives the number of person records sharing one image, followed by their TMDB IDs). To be honest, this feels highly avoidable:
4 33781, 231784, 262075, 1104340
4 25348, 139567, 161310, 1153503
3 976019, 990654, 1301102
3 63244, 63246, 63256
2 94938, 572394
2 129814, 1884930
2 222548, 222549
2 1033001, 1866767
2 1791571, 1791647
2 251202, 1405869
2 1178411, 1880465
2 1624370, 1624372
2 555778, 1070406
2 1020725, 1070133
2 19828, 1462120
2 237775, 1483240
2 46391, 127279
2 1499236, 1886106
2 932097, 1234191
2 1418435, 1418437
2 179942, 1216483
2 1561979, 1561980
2 1833297, 1833299
2 1405685, 1612728
2 1479456, 1905843
2 588716, 1575014
2 239025, 1091885
2 565339, 1911757
2 1172683, 1676265
2 1405687, 1612729
2 1517607, 1523644
2 131208, 137626
2 148084, 260050
2 572045, 572046
2 230712, 932081
2 1747946, 1747947
2 564053, 1404608
2 1165435, 1165436
2 148108, 148109
2 145086, 145087
2 560243, 560244
2 1024232, 1883402
2 1553273, 1572416
2 18906, 1403158
2 37986, 1489580
2 1816564, 1849793
2 74296, 1144947
2 1120110, 1908406
2 1157333, 1157335
2 1405690, 1612727
2 56900, 143817
2 1813034, 1813041
2 189884, 930147
2 88471, 107221
2 224886, 994322
2 131605, 131606
2 228802, 586259
2 16609, 975287
2 1157303, 1222106
2 1155607, 1155608
2 224462, 1342697
2 1067293, 1067296
2 1130836, 1259905
2 580219, 1463264
2 33608, 1127849
2 144103, 1213844
2 260627, 1339312
2 1405691, 1612726
2 113387, 1216756
So yeah, on a few occasions as many as four person entries share the same image... Is this moderation gone wrong?
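The thread doesn't say how the list above was produced, but a list in that shape can be built by grouping person IDs by profile image. A minimal sketch, assuming you already have `(person_id, profile_path)` pairs from your own crawl (the function names are hypothetical; the sample pairs are the James Heath IDs and the November Tootoosis/Reilly clash from this thread):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def shared_images(pairs: Iterable[Tuple[int, Optional[str]]]) -> List[Tuple[int, List[int]]]:
    """Group person IDs by profile image; keep only images used by 2+ people."""
    by_path: Dict[str, List[int]] = defaultdict(list)
    for person_id, path in pairs:
        if path:  # skip people with no profile image
            by_path[path].append(person_id)
    groups = [sorted(ids) for ids in by_path.values() if len(ids) > 1]
    groups.sort(key=len, reverse=True)  # largest groups first
    return [(len(ids), ids) for ids in groups]


def format_report(groups: List[Tuple[int, List[int]]]) -> str:
    """Render groups as 'count id, id, ...' lines, like the list above."""
    return "\n".join(f"{n} {', '.join(map(str, ids))}" for n, ids in groups)


pairs = [
    (63256, "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"),
    (63246, "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"),
    (63244, "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"),
    (37430, "/924xXlQnyah5S7IXy89UMdzXXsh.jpg"),
    (52701, "/924xXlQnyah5S7IXy89UMdzXXsh.jpg"),
]
print(format_report(shared_images(pairs)))
```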
Reply by Travis Bell
on February 23, 2018 at 2:21 PM
Hey Adi,
The image service is only keyed by the file's SHA and has no link to the asset it belongs to. That link only exists in the media database, so yes, it would be possible to have the same SHA belong to more than one media record. It makes sense you're seeing this in and around duplicate records.
Once an image is uploaded to S3, it is never removed. Since the store is keyed by the SHA, uploading the same file again is essentially a no-op; nothing happens. Same SHA means same image, which means many records could theoretically be tagged with it.
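To make the split concrete, here is a toy model of a content-addressed store plus a separate media database. It is only a sketch: a dict stands in for S3, and SHA-256 is an assumption (the reply only says "SHA"). The point is that identical bytes always map to the same key, so a second upload stores nothing new, while the record-to-image links live elsewhere and can point several records at one file.

```python
import hashlib
from typing import Dict, List


class ImageStore:
    """Toy content-addressed store standing in for S3: blobs keyed only by hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # assumed hash; the thread only says "SHA"
        # Re-uploading identical bytes is a no-op: the existing entry is kept.
        self._blobs.setdefault(key, data)
        return key

    def __len__(self) -> int:
        return len(self._blobs)


class MediaDB:
    """Separate record -> image-key links; the store itself knows nothing about records."""

    def __init__(self) -> None:
        self.images: Dict[int, List[str]] = {}

    def tag(self, record_id: int, key: str) -> None:
        self.images.setdefault(record_id, []).append(key)


store, db = ImageStore(), MediaDB()
photo = b"...identical JPEG bytes..."
db.tag(63256, store.put(photo))  # first upload stores the file
db.tag(63244, store.put(photo))  # second upload: same key, nothing new stored
print(len(store), db.images)
```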
Reply by Adi
on February 23, 2018 at 2:31 PM
Makes you wonder what happened with Gordon Tootoosis / Fergal Reilly; they don't look similar :P
Nice move with the SHA.
Might it be worth adding something that indicates where an image is already in use? (Stopping people from using it doesn't help, since they will just upload a different image of the same person for their duplicate entry, which isn't helpful.)
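A rough sketch of what "indicate where it is already in use" could look like without blocking anything: a reverse index from image key to the records tagged with it, where attaching an image returns the other records as a hint. This is a hypothetical design (the class and method names are made up, and the file path is used as the key only for illustration); duplicates stay allowed, so re-adding an image or deleting a record during a merge still works.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class ImageUsage:
    """Hypothetical reverse index: image key -> records tagged with it."""

    def __init__(self) -> None:
        self._users: DefaultDict[str, Set[int]] = defaultdict(set)

    def attach(self, record_id: int, image_key: str) -> List[int]:
        """Tag a record and return the *other* records already using the image.

        Nothing is blocked; the caller can show the list as a hint
        ("this image is already used by ..."), and duplicates stay allowed.
        """
        others = sorted(self._users[image_key] - {record_id})
        self._users[image_key].add(record_id)
        return others

    def detach(self, record_id: int, image_key: str) -> None:
        """Drop a record's link, e.g. when a mod deletes a duplicate."""
        self._users[image_key].discard(record_id)


usage = ImageUsage()
key = "/azRn7U2RKTkB9cHBO4GwJZm2jxy.jpg"
for person_id in (63256, 63246, 63244):
    print(person_id, usage.attach(person_id, key))
```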
Reply by Travis Bell
on February 25, 2018 at 12:00 PM
Haha, I have stopped trying to figure out what users do sometimes.
It could be, but it might not change users' behaviour much since, as you said, they'd probably ignore it anyway. I would also prefer to keep merging/editing profiles easy by not restricting duplicates (i.e. so a mod or user can just re-add the image), and then the mod can just click delete without worrying about anything else.
Reply by Adi
on February 26, 2018 at 12:33 AM
Yeah, disallowing dupes would only hinder the mods without actually solving the user-behaviour problem.