[Spamprobe-users] Lots of false positives since transferring database

My only thought is that your DB has tagged headers of emails that
indicate the target system IP or hostname. Now you've moved to a
different computer or network and maybe these new emails are thought to
be spam. Have you dumped out the terms DB and looked at them manually?

Thanks for this suggestion. You could be right, since it has only
started happening since I came to Greece and am of course using a
different ISP. I can't see anything in the headers that would cause it.
I'll have to see if it is still happening when I get back.

Anthony

--
Anthony Campbell - ***@acampbell.org.uk
Microsoft-free zone - Using Debian GNU/Linux
http://www.acampbell.org.uk (blog, book reviews,
and sceptical articles)

Victor Sudakov

2008-10-02 09:50:53 UTC

Shun the Berkeley DB if you can. I have had problems with the BDB
backend too, though of a different kind.

--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
sip:***@sibptus.tomsk.ru

Anthony Campbell

2008-10-05 05:54:27 UTC

Shun the Berkeley DB if you can. I have had problems with the BDB
backend too, though of a different kind.

The thing is, I don't really know which I'm using (if any)! The database
was made on my desktop with pbl and I imported it to this laptop. But I
couldn't compile spamprobe on this computer so I used the Debian
package. The documentation for this talks about Berkeley, but I don't
have Berkeley on the machine. So in summary, I don't seem to be using
either of them, but nevertheless everything seems to be working as
expected apart from the above-mentioned problem. Once I've marked an
email as good, further emails from the same site are recognized as
good.

Does the pbl/Berkeley issue only apply when the database is originally
compiled, not subsequently?

Anthony

Victor Sudakov

2008-10-05 06:24:54 UTC

Shun the Berkeley DB if you can. I have had problems with the BDB
backend too, though of a different kind.

The thing is, I don't really know which I'm using (if any)!

What do the following commands show?
ldd `which spamprobe`
ls -al ~/.spamprobe

Post by Anthony Campbell
The database
was made on my desktop with pbl and I imported it to this laptop. But I
couldn't compile spamprobe on this computer so I used the Debian
package. The documentation for this talks about Berkeley, but I don't
have Berkeley on the machine. So in summary, I don't seem to be using
either of them, but nevertheless everything seems to be working as
expected apart from the above-mentioned problem. Once I've marked an
email as good, further emails from the same site are recognized as
good.

Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.

Post by Anthony Campbell
Does the pbl/Berkeley issue only apply when the database is originally
compiled, not subsequently?

spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?

My problem with BDB was different: it just started crashing after a
certain time of use.

David A. Lee

2008-10-05 10:09:39 UTC

Post by Victor Sudakov
My problem with BDB was different: it just started crashing after a
certain time of use.

Same here... BDB never gave me different results than PBL; it was just
unstable and would either crash or go into an infinite loop and hang SP.

Since empty DBs typically don't produce false positives (ham marked as
spam), my guess is still that it has to do with a change in the header
tags the ISP is or is not inserting.

Anthony Campbell

2008-10-05 13:17:04 UTC

Post by Victor Sudakov
What do the following commands show?
ldd `which spamprobe`

***@ithaca:~$ ldd `which spamprobe`
linux-gate.so.1 => (0xb7f29000)
libdb-4.6.so => /usr/lib/libdb-4.6.so (0xb7de9000)
libgif.so.4 => /usr/lib/libgif.so.4 (0xb7de1000)
libpng12.so.0 => /usr/lib/libpng12.so.0 (0xb7dbd000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xb7d9e000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7cb0000)
libm.so.6 => /lib/i686/cmov/libm.so.6 (0xb7c8a000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7c7d000)
libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7b22000)
libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb7b09000)
libz.so.1 => /usr/lib/libz.so.1 (0xb7af3000)
/lib/ld-linux.so.2 (0xb7f2a000)

Post by Victor Sudakov
ls -al ~/.spamprobe

***@ithaca:~$ ls -al ~/.spamprobe
total 6824
drwxr-xr-x 2 ac ac 4096 2008-10-01 10:48 ./
drwxr-xr-x 119 ac ac 12288 2008-10-05 13:49 ../
-rw------- 1 ac ac 24576 2008-10-05 13:50 __db.001
-rw------- 1 ac ac 163840 2008-10-05 13:50 __db.002
-rw------- 1 ac ac 270336 2008-10-05 13:50 __db.003
-rw------- 1 ac ac 475136 2008-10-05 13:50 __db.004
-rw------- 1 ac ac 0 2008-10-01 10:48 lock
-rw------- 1 ac ac 6131712 2008-10-05 13:50 sp_words

Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.

It doesn't change. I get this:

Score: 0.9999968
Spam Prob  Count  Good  Spam  Word
0.9999947     10     0    10  seller
0.9999979      5     0    25  leave
0.9999978      4     0    24  transaction
0.9999956      3     0    12  click the
0.9999990      1     0    61  U_token
0.9999987      1     0    42  U_account
0.9999985      1     0    35  b
0.9999980      2     0    27  orders
0.9999980      1     0    26  Hsubject_on
0.9999978      1     0    24  notification
0.9999977      1     0    23  node
0.9999976      1     0    22  pick
0.9999976      1     0    22  please do
0.9999967      1     0    16  U_top
0.9999956      1     0    12  giving
0.9999952      1     0    11  all orders
0.9999941      1     0     9  Hsubject_will
0.9999941      1     0     9  recent
0.9999934      1     0     8  find the
0.9999925      1     0     7  U_b
0.9999925      1     0     7  accept
0.9999925      1     0     7  will help

After telling SP it was good, I get:

Score: 0.0000031
Spam Prob  Count  Good  Spam  Word
0.0000010     10    10     0  seller
0.0000010      4    14     0  marketplace
0.0000010      4     4     0  leave seller
0.0000010      4     4     0  rating
0.0000010      4     4     0  seller feedback
0.0000010      3     3     0  736-5850801-2237730
0.0000010      3     3     0  amazon marketplace
0.0000010      3     3     0  fb
0.0000010      3     3     0  rys
0.0000010      3     3     0  the leave
0.0000010      1     5     0  prompted
0.0000010      1     3     0  be prompted
0.0000010      1     3     0  continually
0.0018975      3    14     1  em
0.0026545      1    10     1  U_g x-locale
0.0026545      1    10     1  U_locale

Post by Anthony Campbell
Does the pbl/Berkeley issue only apply when the database is originally
compiled, not subsequently?

spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?

Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.

As far as I can see, SP thinks that all mail received on this computer
is spam until told otherwise.

Anthony

Post by Victor Sudakov
Spamprobe-users mailing list
https://lists.sourceforge.net/lists/listinfo/spamprobe-users

Chris Ross

2008-10-05 16:03:44 UTC

Post by Victor Sudakov
Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.

Score: 0.9999968
Spam Prob  Count  Good  Spam  Word
0.9999947     10     0    10  seller
0.9999979      5     0    25  leave
0.9999978      4     0    24  transaction

Score: 0.0000031
Spam Prob  Count  Good  Spam  Word
0.0000010     10    10     0  seller
0.0000010      4    14     0  marketplace
0.0000010      4     4     0  leave seller
0.0000010      4     4     0  rating

Post by Victor Sudakov
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?

Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.
As far as I can see, SP thinks that all mail received on this computer
is spam until told otherwise.

More than that, it thinks each of the *terms* is spam until told
otherwise. The normal export/import process would suggest you run
export on the machine you're moving away from, then run import on
the new machine. This will take care of any binary incompatibility
issues. This almost looks like a byte-ordering problem to me. Maybe
a 32-bit vs. 64-bit inconsistency?

What architectures and operating systems are you running on each
machine? Do you have the opportunity to perform an export on the
original machine, and an import (into a clean .spamprobe) on the
destination machine? I suspect that might resolve the situation.

- Chris

Anthony Campbell

2008-10-05 20:14:53 UTC

Post by Chris Ross

Post by Victor Sudakov
Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.

Post by Victor Sudakov
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?

Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.
As far as I can see, SP thinks that all mail received on this computer
is spam until told otherwise.

More than that, it thinks each of the *terms* is spam until told
otherwise. The normal export/import process would suggest you run
export on the machine you're moving away from, then run import on
the new machine. This will take care of any binary incompatibility
issues. This almost looks like a byte-ordering problem to me. Maybe a
32-bit vs. 64-bit inconsistency?
What architectures and operating systems are you running on each
machine? Do you have the opportunity to perform an export on the
original machine, and an import (into a clean .spamprobe) on the
destination machine? I suspect that might resolve the situation.
- Chris

I can do this when I get home but at present I am abroad. This is on a
Thinkpad Z61M running Debian (unstable). The machine at home is also
i386, Athlon processor, running the same software. I've only installed
32-bit stuff.

I will try your suggestion when I get back. It's annoying that my
attempt to compile SP on this machine doesn't work. Not sure why --
something to do with the gcc version I think.

Anthony

Victor Sudakov

2008-10-07 08:29:35 UTC

Post by Victor Sudakov
What do the following commands show?
ldd `which spamprobe`

linux-gate.so.1 => (0xb7f29000)
libdb-4.6.so => /usr/lib/libdb-4.6.so (0xb7de9000)

^^^^^^^^^^^^^^^^^^^^^^^

[dd]

Post by Victor Sudakov
ls -al ~/.spamprobe

total 6824
drwxr-xr-x 2 ac ac 4096 2008-10-01 10:48 ./
drwxr-xr-x 119 ac ac 12288 2008-10-05 13:49 ../
-rw------- 1 ac ac 24576 2008-10-05 13:50 __db.001
-rw------- 1 ac ac 163840 2008-10-05 13:50 __db.002
-rw------- 1 ac ac 270336 2008-10-05 13:50 __db.003
-rw------- 1 ac ac 475136 2008-10-05 13:50 __db.004
-rw------- 1 ac ac 0 2008-10-01 10:48 lock
-rw------- 1 ac ac 6131712 2008-10-05 13:50 sp_words

This must be BDB.
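The conclusion rests on two observations: the binary is linked against libdb-4.6.so, and the database directory contains __db.001 through __db.004, which are Berkeley DB environment region files. A minimal shell sketch of those two checks, for repeating them on another machine; the function names bdb_linked and db_dir_backend are hypothetical, and a directory without __db.* files is reported as "unknown" rather than assumed to be PBL.

```shell
#!/bin/sh
# Hypothetical helpers repeating the two checks used above to identify
# the spamprobe database backend.

# Prints "linked" if the given binary is linked against a Berkeley DB
# library (libdb-*.so), otherwise "not-linked".
bdb_linked() {
    if ldd "$1" 2>/dev/null | grep -q 'libdb-'; then
        echo linked
    else
        echo not-linked
    fi
}

# Prints "bdb" if the directory contains Berkeley DB environment files
# (__db.001, __db.002, ...), otherwise "unknown".
db_dir_backend() {
    for f in "$1"/__db.*; do
        if [ -e "$f" ]; then
            echo bdb
            return 0
        fi
    done
    echo unknown
}
```

With the ldd and ls output shown above, bdb_linked "$(command -v spamprobe)" would print linked and db_dir_backend ~/.spamprobe would print bdb.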

[dd]

Post by Victor Sudakov
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?

Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.

Can you "rm -rf ~/.spamprobe" on your laptop and transfer the database
via export/import?
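The transfer suggested here moves the database as a portable export instead of copying the binary files. Below is a minimal sketch of the receiving side. It assumes that spamprobe export writes to standard output and that spamprobe import reads the exported data from standard input, and it uses spamprobe's -d option to name the database directory. The function name reimport_db is hypothetical. Rather than running rm -rf, the sketch moves the old directory aside, so the copied database stays available for comparison.

```shell
#!/bin/sh
# Hypothetical helper: rebuild a spamprobe database from a portable export
# instead of copying the binary database files between machines.
#
# On the old machine (ASSUMPTION: export writes to stdout):
#     spamprobe export > spamprobe-export.txt
# Copy spamprobe-export.txt to the new machine, then run there:
#     reimport_db spamprobe-export.txt ~/.spamprobe

reimport_db() {
    export_file=$1
    db_dir=$2
    # Refuse to overwrite an earlier backup; mv would otherwise nest the
    # directory inside it.
    if [ -e "$db_dir.old" ]; then
        echo "reimport_db: $db_dir.old already exists" >&2
        return 1
    fi
    # Move the copied (possibly incompatible) database out of the way.
    if [ -e "$db_dir" ]; then
        mv "$db_dir" "$db_dir.old" || return 1
    fi
    mkdir -p "$db_dir" || return 1
    # Load the export into the fresh, empty database directory.
    # ASSUMPTION: "import" reads the exported data from standard input.
    spamprobe -d "$db_dir" import < "$export_file"
}
```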

Anthony Campbell

2008-10-12 08:28:35 UTC

Post by Victor Sudakov
Can you "rm -rf ~/.spamprobe" on your laptop and transfer the database
via export/import?

That is what I did but it made no difference.

David A. Lee

2008-10-02 11:53:09 UTC