Is there another BBBS or Mystic system "in the loop"?
I'm getting a lot of dupes destined for the COOKING echo, but D'Bridge catches most of them and tosses them into the BADECHO folder. I have 236 dupes since December, 2017. The totality of the ones before that I erased. There had to be thousands of them.
Is there another BBBS or Mystic system "in the loop"?
Possibly. They're all marked with @RESCANNED kludges.
that's a different problem... messages so marked should not be packaged and sent to other links...
It's more that the last one was also triggered by a rescan. Not specifically where it was rescanned from.
On 2018 Jun 19 04:02:48, you wrote to me:
that's a different problem... messages so marked should not be
packaged and sent to other links...
It's more that the last one was also triggered by a rescan. Not specifically where it was rescanned from.
my point is specifically that messages with a ^aRESCANNED control line should not be passed on to other links... ever... that will stop them from triggering what looks like a regurge or "dupe dump"... they will be different than the original message because of the ^aRESCANNED control line so they will not be caught by most dupe detection techniques... that's the real problem...
Is that true?
Synchronet/SBBSecho uses 2 methods of dupe message detection:
1. Message-ID (in the case of FTN, that's everything between "\1MSGID: " and
the CR) - the Message-ID doesn't change when messages are re-scanned
2. Message body text (not including kludge/control lines, paths/seen-bys,
and tear/tag/origin lines)
Rescanned messages would (should) be caught as dupes just fine.
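A minimal sketch of those two checks, for illustration only. This is not SBBSecho's actual code: the function names `msgid_of` and `body_key` are hypothetical, CRC-32 is an assumed stand-in for whatever hash is really used, and the line prefixes used to recognise tear, tag and origin lines are the usual conventions rather than anything stated here.

```python
import zlib
from typing import Optional

MSGID_PREFIX = "\x01MSGID: "  # ^A followed by "MSGID: "


def msgid_of(raw: str) -> Optional[str]:
    """Return everything between the MSGID kludge prefix and the CR, if present."""
    for line in raw.split("\r"):
        if line.startswith(MSGID_PREFIX):
            return line[len(MSGID_PREFIX):]
    return None


def body_key(raw: str) -> int:
    """CRC-32 (assumed hash) of the body text with metadata lines left out."""
    kept = []
    for line in raw.split("\r"):
        if line.startswith("\x01"):        # kludge/control lines, PATH included
            continue
        if line.startswith("SEEN-BY:"):    # seen-by lines
            continue
        if line == "---" or line.startswith("--- "):  # tear line
            continue
        if line.startswith("... "):        # tag line
            continue
        if line.startswith(" * Origin:"):  # origin line
            continue
        kept.append(line)
    return zlib.crc32("\r".join(kept).encode("utf-8"))
```

With keys built this way, a copy that gained a ^aRESCANNED control line or travelled a different path still yields the same Message-ID and the same body key as the original.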
On 2018 Jun 19 11:31:08, you wrote to me:
my point is specifically that messages with a ^aRESCANNED control line
should not be passed on to other links... ever... that will stop them
from triggering what looks like a regurge or "dupe dump"... they will
be different than the original message because of the ^aRESCANNED
control line so they will not be caught by most dupe detection
techniques... that's the real problem...
Is that true?
in numerous cases, yes... but, if i want a rescan of an area that had damaged data files and i'm trying to recover the last year's messages, why should the rescanned messages be sent on to any other system? mine is the only one that wants or needs them... why should other linked systems have to do the additional work? if we just don't send ^aRESCANNED messages on to other systems, no other systems would be bothered...
Synchronet/SBBSecho uses 2 methods of dupe message detection:
1. Message-ID (in the case of FTN, that's everything between "\1MSGID: " and
the CR) - the Message-ID doesn't change when messages are re-scanned
2. Message body text (not including kludge/control lines, paths/seen-bys,
and tear/tag/origin lines)
Rescanned messages would (should) be caught as dupes just fine.
that looks ok but not everyone goes that route with their dupe detection code...
i've seen the second one cause systems to only see, for example, the first monthly posting of something and they never see it again in any of the following months... then it is purged out of their message base and they don't have it any more and don't receive it either... maybe it is echo rules... maybe it is a monthly PSA...
my point is specifically that messages with a ^aRESCANNED control line should not be passed on to other links... ever...
in numerous cases, yes... but, if i want a rescan of an area that had
damaged data files and i'm trying to recover the last year's messages,
why should the rescanned messages be sent on to any other system? mine
is the only one that wants or needs them... why should other linked
systems have to do the additional work? if we just don't send
^aRESCANNED messages on to other systems, no other systems would be
bothered...
I don't dispute that rescanned messages shouldn't be forwarded to
downlinks and I just committed a change to SBBSecho to that effect.
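The rule being agreed on here can be sketched as a simple filter. This is an illustration under assumptions, not the change that was committed: `is_rescanned` and `links_to_forward` are hypothetical names, and the only thing taken from the thread is that a rescanned message carries a ^aRESCANNED control line.

```python
from typing import List


def is_rescanned(raw: str) -> bool:
    """True if any line of the message is a ^aRESCANNED control line."""
    return any(line.startswith("\x01RESCANNED") for line in raw.split("\r"))


def links_to_forward(raw: str, downlinks: List[str]) -> List[str]:
    """Rescanned messages stay with the system that asked for them."""
    if is_rescanned(raw):
        return []
    return list(downlinks)
```

The message is still tossed locally either way; only the export to other links is suppressed.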
i've seen the second one cause systems to only see, for example, the
first monthly posting of something and they never see it again in any
of the following months... then it is purged out of their message base
and they don't have it any more and don't receive it either... maybe
it is echo rules... maybe it is a monthly PSA...
And if it's a duplicate, it's a duplicate. That's why auto-posters should put timestamps or other unique data in their message body if they really want to avoid being ignored as dupes.
But including metadata (control lines) in the dupe detection seems like a bad approach. If a message takes a different path, it'll have different metadata, but it's still a dupe (and often that's how dupes arrive, via a different path than the original).
my point is specifically that messages with a ^aRESCANNED control
line should not be passed on to other links... ever...
I realize that. I think we're on the same page and I'm just being too terse.
AFAIK, seenbys and paths are not included in most dupe detection schemes... other non-changing control lines are fine to be included... one of the problems comes when some systems sort those control lines on messages they are passing along... we don't see so much of that like we did at one time ;)
So some metadata is included in the data that is hashed for dupe
detection and some is not?
Are you sure about that?
Anyway, duplicate Message-IDs *should* be caught by any FTN software written or updated in the past 20 years.
On 2018 Jun 19 22:43:24, you wrote to me:
AFAIK, seenbys and paths are not included in most dupe detection
schemes... other non-changing control lines are fine to be included...
one of the problems comes when some systems sort those control lines on
messages they are passing along... we don't see so much of that like we
did at one time ;)
So some metadata is included in the data that is hashed for dupe detection and some is not?
yes...
Are you sure about that?
yes... in fact, and i don't recall who pointed this out to me back in the '90s, dbridge does exactly this in a manner of speaking... it takes the whole message header plus X bytes immediately following the message header and uses all of that as at least part of the checksum calculation...
this was pointed out to me when i was working on my posting tool and was adding MSGID support to it... i was using a library and just letting it do its thing... some of my test posts were reported as dupes when they clearly weren't... IIRC, they were detected as dupes because they were posted within the same second... it turned out that my MSGID was somewhere in the middle of the control lines at the beginning of the message body and only my dbridge-using testers were seeing this... someone pointed out this thing about dbridge also using X bytes from the beginning of the message body in addition to the message header so i moved my posting tool's MSGID to the top of the list and no more dupes were detected by those dbridge systems...
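a rough sketch of why the MSGID position mattered under that kind of scheme. everything here is assumed for illustration: the real value of X and the real checksum are not known, so `PREFIX_BYTES` and CRC-32 are stand-ins, and `prefix_key`, `build_body` and the sample header and kludge lines are hypothetical.

```python
import zlib

PREFIX_BYTES = 32  # hypothetical stand-in for "X"; the real value is not known

# Sample header for two posts made within the same second (made-up layout).
HEADER = b"mark|all|test|19 Jun 18  22:43:24"

# Sample control lines other than MSGID; together they are longer than X.
OTHER_KLUDGES = b"\x01CHRS: CP437 2\r\x01TZUTC: -0400\r\x01PID: posttool 1.0\r"


def prefix_key(header: bytes, body: bytes, n: int = PREFIX_BYTES) -> int:
    """CRC-32 (assumed) over the header plus the first n bytes of the body."""
    return zlib.crc32(header + body[:n])


def build_body(msgid: bytes, msgid_first: bool) -> bytes:
    """Place the MSGID kludge either before or after the other control lines."""
    msgid_line = b"\x01MSGID: " + msgid + b"\r"
    if msgid_first:
        kludges = msgid_line + OTHER_KLUDGES
    else:
        kludges = OTHER_KLUDGES + msgid_line
    return kludges + b"test post\r"
```

with the MSGID after the other control lines, two same-second posts share an identical header and identical first X bytes, so their keys collide; with the MSGID first, the differing serial number falls inside the checksummed span and the keys differ.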
i don't know what other systems do... there's only a very few that provide this information... SBBS is one of them... when i was testing Mystic, there was some discussion about dupe detection as james worked to try to figure out the best method he liked...
i have used fastecho here for decades but i don't know what data it uses for its checksums... i do know it uses two checksums, though... i know this because i was being nosy one day and looking at FE's dupe database file (one for all message areas) with a hex viewer and noticed that groups of bytes were repeated all throughout the file... i asked about this and was told i found a bug... basically, FE has two checksums that it uses for each message and both are supposed to be stored in the database... what i found was that only one was being used and written to both fields... toby fixed that problem right quick... i just don't know what data is used to calculate them...
back in the day, dupe detection formulas were not really shared around... maybe a couple of developers talking amongst themselves would tell each other what they were doing but this information was not published where everyone could find it... it was more or less black majik to a point...