Discussion:
[Spamprobe-users] Lots of false positives since transferring database
Anthony Campbell
2008-10-01 12:42:28 UTC
I copied my database from my desktop to a laptop because I am away from
home for 3 weeks and need to download and check my mail.

Spamprobe is now classifying practically everything as spam. It's not a
major problem (I'm simply checking the spam folder and retraining on the
ham) but any idea what could have happened here?

I tried exporting/importing the database but no effect. I think I may
now be using Berkeley instead of pbl - would that make a difference?
(This is because the downloaded version wouldn't compile, for some
reason, so I'm using the packaged version from Debian Sid.)

Anthony
--
Anthony Campbell - ***@acampbell.org.uk
Microsoft-free zone - Using Debian GNU/Linux
http://www.acampbell.org.uk (blog, book reviews,
and sceptical articles)
Anthony Campbell
2008-10-02 17:34:33 UTC
Post by David A. Lee
My only thought is that your DB has tagged headers of emails that
indicate the target system IP or hostname. Now you've moved to a
different computer or network and maybe these new emails are thought to
be spam. Have you dumped the terms from the DB and looked at them manually?
Thanks for this suggestion. You could be right, since it has only
started happening since I came to Greece and am of course using a
different ISP. I can't see anything in the headers that would cause it.
I'll have to see if it is still happening when I get back.



Anthony
Victor Sudakov
2008-10-02 09:50:53 UTC
Post by Anthony Campbell
I copied my database from my desktop to a laptop because I am away from
home for 3 weeks and need to download and check my mail.
Spamprobe is now classifying practically everything as spam. It's not a
major problem (I'm simply checking the spam folder and retraining on the
ham) but any idea what could have happened here?
I tried exporting/importing the database but no effect. I think I may
now be using Berkeley instead of pbl - would that make a difference?
Shun the Berkeley DB if you can. I have had problems with the BDB
backend too, though of a different kind.
--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
sip:***@sibptus.tomsk.ru
Anthony Campbell
2008-10-05 05:54:27 UTC
Post by Victor Sudakov
Post by Anthony Campbell
I copied my database from my desktop to a laptop because I am away from
home for 3 weeks and need to download and check my mail.
Spamprobe is now classifying practically everything as spam. It's not a
major problem (I'm simply checking the spam folder and retraining on the
ham) but any idea what could have happened here?
I tried exporting/importing the database but no effect. I think I may
now be using Berkeley instead of pbl - would that make a difference?
Shun the Berkeley DB if you can. I have had problems with the BDB
backend too, though of a different kind.
--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
The thing is, I don't really know which I'm using (if any)! The database
was made on my desktop with pbl and I imported it to this laptop. But I
couldn't compile spamprobe on this computer so I used the Debian
package. The documentation for this talks about Berkeley, but I don't
have Berkeley on the machine. So in summary, I don't seem to be using
either of them, but nevertheless everything seems to be working as
expected apart from the above-mentioned problem. Once I've marked an
email as good, further emails from the same site are recognized as
good.

Does the pbl/Berkeley issue only apply when the database is originally
compiled, not subsequently?

Anthony
Victor Sudakov
2008-10-05 06:24:54 UTC
Post by Anthony Campbell
Post by Victor Sudakov
Post by Anthony Campbell
I copied my database from my desktop to a laptop because I am away from
home for 3 weeks and need to download and check my mail.
Spamprobe is now classifying practically everything as spam. It's not a
major problem (I'm simply checking the spam folder and retraining on the
ham) but any idea what could have happened here?
I tried exporting/importing the database but no effect. I think I may
now be using Berkeley instead of pbl - would that make a difference?
Shun the Berkeley DB if you can. I have had problems with the BDB
backend too, though of a different kind.
The thing is, I don't really know which I'm using (if any)!
What do the following commands show?
ldd `which spamprobe`
ls -al ~/.spamprobe
Post by Anthony Campbell
The database
was made on my desktop with pbl and I imported it to this laptop. But I
couldn't compile spamprobe on this computer so I used the Debian
package. The documentation for this talks about Berkeley, but I don't
have Berkeley on the machine. So in summary, I don't seem to be using
either of them, but nevertheless everything seems to be working as
expected apart from the above-mentioned problem. Once I've marked an
email as good, further emails from the same site are recognized as
good.
Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.
Post by Anthony Campbell
Does the pbl/Berkeley issue only apply when the database is originally
compiled, not subsequently?
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?

My problem with BDB was different: it just started crashing after a
certain time of use.
--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
sip:***@sibptus.tomsk.ru
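A rough way to read the output of those two commands, offered as a sketch rather than anything SpamProbe provides: a binary that links against `libdb` was most likely built for the Berkeley DB backend, and `__db.NNN` files in `~/.spamprobe` are Berkeley DB environment region files. The function names `links_berkeley_db` and `has_bdb_region_files` are invented for this illustration, and the checks are heuristics only.

```python
import re


def links_berkeley_db(ldd_output: str) -> bool:
    """Return True if any line of `ldd` output names a libdb shared object.

    Heuristic: matches names such as libdb-4.6.so or libdb.so, but not
    unrelated libraries such as libdbus.
    """
    return any(re.match(r"\s*libdb[-_.]", line) for line in ldd_output.splitlines())


def has_bdb_region_files(filenames) -> bool:
    """Return True if the listing contains Berkeley DB region files (__db.001, ...)."""
    return any(re.fullmatch(r"__db\.\d{3}", name) for name in filenames)


if __name__ == "__main__":
    # Hypothetical inputs; paste in the real `ldd` output and directory listing.
    sample_ldd = "libdb-4.6.so => /usr/lib/libdb-4.6.so (0xb7de9000)\n"
    sample_files = ["__db.001", "__db.002", "lock", "sp_words"]
    print("linked against Berkeley DB:", links_berkeley_db(sample_ldd))
    print("region files present:", has_bdb_region_files(sample_files))
```

If both checks come back true, the Berkeley DB backend is almost certainly in use, whatever the database was originally created with.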
David A. Lee
2008-10-05 10:09:39 UTC
Post by Victor Sudakov
My problem with BDB was different: it just started crashing after a
certain time of use.
Same here... BDB never gave me different results than PBL; it was just
unstable and would either crash or go into an infinite loop and hang SP.

Since empty DBs typically don't produce false positives (marking ham as
spam), my guess is still that it has to do with header tags that the ISP
is or is not inserting, which have changed.
Anthony Campbell
2008-10-05 13:17:04 UTC
Post by Victor Sudakov
What do the following commands show?
ldd `which spamprobe`
***@ithaca:~$ ldd `which spamprobe`
linux-gate.so.1 => (0xb7f29000)
libdb-4.6.so => /usr/lib/libdb-4.6.so (0xb7de9000)
libgif.so.4 => /usr/lib/libgif.so.4 (0xb7de1000)
libpng12.so.0 => /usr/lib/libpng12.so.0 (0xb7dbd000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0xb7d9e000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7cb0000)
libm.so.6 => /lib/i686/cmov/libm.so.6 (0xb7c8a000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7c7d000)
libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7b22000)
libpthread.so.0 => /lib/i686/cmov/libpthread.so.0 (0xb7b09000)
libz.so.1 => /usr/lib/libz.so.1 (0xb7af3000)
/lib/ld-linux.so.2 (0xb7f2a000)
Post by Victor Sudakov
ls -al ~/.spamprobe
***@ithaca:~$ ls -al ~/.spamprobe
total 6824
drwxr-xr-x 2 ac ac 4096 2008-10-01 10:48 ./
drwxr-xr-x 119 ac ac 12288 2008-10-05 13:49 ../
-rw------- 1 ac ac 24576 2008-10-05 13:50 __db.001
-rw------- 1 ac ac 163840 2008-10-05 13:50 __db.002
-rw------- 1 ac ac 270336 2008-10-05 13:50 __db.003
-rw------- 1 ac ac 475136 2008-10-05 13:50 __db.004
-rw------- 1 ac ac 0 2008-10-01 10:48 lock
-rw------- 1 ac ac 6131712 2008-10-05 13:50 sp_words
Post by Victor Sudakov
Post by Anthony Campbell
The database
was made on my desktop with pbl and I imported it to this laptop. But I
couldn't compile spamprobe on this computer so I used the Debian
package. The documentation for this talks about Berkeley, but I don't
have Berkeley on the machine. So in summary, I don't seem to be using
either of them, but nevertheless everything seems to be working as
expected apart from the above-mentioned problem. Once I've marked an
email as good, further emails from the same site are recognized as
good.
Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.
It doesn't change. I get this:

Score: 0.9999968
Spam Prob  Count  Good  Spam  Word
0.9999947     10     0    10  seller
0.9999979      5     0    25  leave
0.9999978      4     0    24  transaction
0.9999956      3     0    12  click the
0.9999990      1     0    61  U_token
0.9999987      1     0    42  U_account
0.9999985      1     0    35  b
0.9999980      2     0    27  orders
0.9999980      1     0    26  Hsubject_on
0.9999978      1     0    24  notification
0.9999977      1     0    23  node
0.9999976      1     0    22  pick
0.9999976      1     0    22  please do
0.9999967      1     0    16  U_top
0.9999956      1     0    12  giving
0.9999952      1     0    11  all orders
0.9999941      1     0     9  Hsubject_will
0.9999941      1     0     9  recent
0.9999934      1     0     8  find the
0.9999925      1     0     7  U_b
0.9999925      1     0     7  accept
0.9999925      1     0     7  will help


After telling SP it was good, I get:

Score: 0.0000031
Spam Prob  Count  Good  Spam  Word
0.0000010     10    10     0  seller
0.0000010      4    14     0  marketplace
0.0000010      4     4     0  leave seller
0.0000010      4     4     0  rating
0.0000010      4     4     0  seller feedback
0.0000010      3     3     0  736-5850801-2237730
0.0000010      3     3     0  amazon marketplace
0.0000010      3     3     0  fb
0.0000010      3     3     0  rys
0.0000010      3     3     0  the leave
0.0000010      1     5     0  prompted
0.0000010      1     3     0  be prompted
0.0000010      1     3     0  continually
0.0018975      3    14     1  em
0.0026545      1    10     1  U_g x-locale
0.0026545      1    10     1  U_locale
Post by Victor Sudakov
Post by Anthony Campbell
Does the pbl/Berkeley issue only apply when the database is originally
compiled, not subsequently?
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?
Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.


As far as I can see, SP thinks that all mail received on this computer
is spam until told otherwise.

Anthony
Chris Ross
2008-10-05 16:03:44 UTC
Post by Anthony Campbell
Post by Victor Sudakov
Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.
Score: 0.9999968
Spam Prob Count Good Spam Word
0.9999947 10 0 10 seller
0.9999979 5 0 25 leave
0.9999978 4 0 24 transaction
Score: 0.0000031
Spam Prob Count Good Spam Word
0.0000010 10 10 0 seller
0.0000010 4 14 0 marketplace
0.0000010 4 4 0 leave seller
0.0000010 4 4 0 rating
Post by Victor Sudakov
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?
Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.
As far as I can see, SP thinks that all mail received on this computer
is spam until told otherwise.
More than that, it thinks each of the *terms* is spam until told
otherwise. The normal export/import process would suggest you run
export on the machine you're moving away from, then run the import on
the new machine. This will take care of any binary incompatibility
issues. This almost looks like a byte-ordering problem to me. Maybe
a 32-bit vs. 64-bit inconsistency?

What architectures and operating systems are you running on each
machine? Do you have the opportunity to perform an export on the
original machine, and an import (into a clean .spamprobe) on the
destination machine? I suspect that might resolve the situation.

- Chris
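To check the "each term is spam until told otherwise" observation across a whole summary instead of by eye, the `summarize` output can be parsed. This is only a sketch: it assumes the whitespace-separated five-column layout shown in this thread (probability, count, good, spam, term), and `TermRow` and `parse_summary` are names made up for the example.

```python
from typing import List, NamedTuple


class TermRow(NamedTuple):
    prob: float   # "Spam Prob" column
    count: int    # occurrences in the scored message
    good: int     # times the term was seen in good mail
    spam: int     # times the term was seen in spam
    term: str     # the term itself; may contain a space ("click the")


def parse_summary(text: str) -> List[TermRow]:
    """Parse term rows from summarize-style output, skipping non-data lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 4)  # at most 5 fields, so two-word terms stay whole
        if len(parts) != 5:
            continue  # e.g. the "Score:" line or a blank line
        try:
            rows.append(TermRow(float(parts[0]), int(parts[1]),
                                int(parts[2]), int(parts[3]), parts[4].strip()))
        except ValueError:
            continue  # the column header line
    return rows


if __name__ == "__main__":
    sample = """Score: 0.9999968
Spam Prob Count Good Spam Word
0.9999947 10 0 10 seller
0.9999956 3 0 12 click the
0.9999990 1 0 61 U_token
"""
    rows = parse_summary(sample)
    never_good = [r.term for r in rows if r.good == 0]
    print(f"{len(never_good)} of {len(rows)} terms have never been seen in good mail")
```

If every term in a false positive has a good count of zero, even common words, that points at the good counts not having survived the transfer rather than at any particular header.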
Anthony Campbell
2008-10-05 20:14:53 UTC
Post by Chris Ross
Post by Anthony Campbell
Post by Victor Sudakov
Have you tried "spamprobe -T summarize" on the false positives?
Does the score change if you specify "-H none"?
This may give some hints about what is happening.
Score: 0.9999968
Spam Prob Count Good Spam Word
0.9999947 10 0 10 seller
0.9999979 5 0 25 leave
0.9999978 4 0 24 transaction
Score: 0.0000031
Spam Prob Count Good Spam Word
0.0000010 10 10 0 seller
0.0000010 4 14 0 marketplace
0.0000010 4 4 0 leave seller
0.0000010 4 4 0 rating
Post by Victor Sudakov
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?
Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.
As far as I can see, SP thinks that all mail received on this computer
is spam until told otherwise.
More than that, it thinks each of the *terms* is spam until told
otherwise. The normal export/import process would suggest you run
export on the machine you're moving away from, then run the import on
the new machine. This will take care of any binary incompatibility
issues. This almost looks like a byte-ordering problem to me. Maybe a
32-bit vs. 64-bit inconsistency?
What architectures and operating systems are you running on each
machine? Do you have the opportunity to perform an export on the
original machine, and an import (into a clean .spamprobe) on the
destination machine? I suspect that might resolve the situation.
- Chris
I can do this when I get home but at present I am abroad. This is on a
Thinkpad Z61M running Debian (unstable). The machine at home is also
i386, Athlon processor, running the same software. I've only installed
32-bit stuff.

I will try your suggestion when I get back. It's annoying that my
attempt to compile SP on this machine doesn't work. Not sure why --
something to do with the gcc version I think.

Anthony
Victor Sudakov
2008-10-07 08:29:35 UTC
Post by Anthony Campbell
Post by Victor Sudakov
What do the following commands show?
ldd `which spamprobe`
linux-gate.so.1 => (0xb7f29000)
libdb-4.6.so => /usr/lib/libdb-4.6.so (0xb7de9000)
^^^^^^^^^^^^^^^^^^^^^^^

[dd]
Post by Anthony Campbell
Post by Victor Sudakov
ls -al ~/.spamprobe
total 6824
drwxr-xr-x 2 ac ac 4096 2008-10-01 10:48 ./
drwxr-xr-x 119 ac ac 12288 2008-10-05 13:49 ../
-rw------- 1 ac ac 24576 2008-10-05 13:50 __db.001
-rw------- 1 ac ac 163840 2008-10-05 13:50 __db.002
-rw------- 1 ac ac 270336 2008-10-05 13:50 __db.003
-rw------- 1 ac ac 475136 2008-10-05 13:50 __db.004
-rw------- 1 ac ac 0 2008-10-01 10:48 lock
-rw------- 1 ac ac 6131712 2008-10-05 13:50 sp_words
This must be BDB.

[dd]
Post by Anthony Campbell
Post by Victor Sudakov
spamprobe export/import should be independent of the database type anyway.
Did you create a new empty database before importing?
Not as such. I just made ~/.spamprobe and copied all my spamprobe files
into that directory.
Can you "rm -rf ~/.spamprobe" on your laptop and transfer the database
via export/import?
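For reference, the transfer being suggested could be scripted roughly as below. This is a sketch under assumptions, not a tested recipe: it assumes `spamprobe` is on the PATH of both machines, that `spamprobe export` writes the terms to standard output, that `spamprobe import FILE` reads such a file, and that `-d` selects the database directory. The helper names and the dump file name are hypothetical. The import helper refuses to run against a non-empty directory, so that files copied over from another backend cannot get mixed in.

```python
import subprocess
from pathlib import Path


def export_terms(dump_file: Path, db_dir: Path) -> None:
    """On the OLD machine: write the database contents to a text dump."""
    with dump_file.open("wb") as out:
        subprocess.run(["spamprobe", "-d", str(db_dir), "export"],
                       stdout=out, check=True)


def import_terms(dump_file: Path, db_dir: Path) -> None:
    """On the NEW machine: load the dump into a fresh, empty database directory."""
    if db_dir.exists() and any(db_dir.iterdir()):
        # Stale files (e.g. a copied sp_words) would defeat the point of a clean import.
        raise RuntimeError(f"{db_dir} is not empty; move it aside first")
    db_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(["spamprobe", "-d", str(db_dir), "import", str(dump_file)],
                   check=True)


if __name__ == "__main__":
    # Hypothetical locations; adjust to taste. Run only the half that
    # applies to the machine you are on.
    dump = Path.home() / "sp_export.txt"
    # export_terms(dump, Path.home() / ".spamprobe")   # old machine
    # import_terms(dump, Path.home() / ".spamprobe")   # new machine
```

The dump is plain text, so it can be inspected before importing to confirm that the good counts are actually present.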
Anthony Campbell
2008-10-12 08:28:35 UTC
Post by Victor Sudakov
Can you "rm -rf ~/.spamprobe" on your laptop and transfer the database
via export/import?
That is what I did but it made no difference.
David A. Lee
2008-10-02 11:53:09 UTC
My only thought is that your DB has tagged headers of emails that indicate
the target system IP or hostname. Now you've moved to a different computer
or network and maybe these new emails are thought to be spam. Have you
dumped the terms from the DB and looked at them manually?
Post by Anthony Campbell
I copied my database from my desktop to a laptop because I am away from
home for 3 weeks and need to download and check my mail.
Spamprobe is now classifying practically everything as spam. It's not a
major problem (I'm simply checking the spam folder and retraining on the
ham) but any idea what could have happened here?
I tried exporting/importing the database but no effect. I think I may
now be using Berkeley instead of pbl - would that make a difference?
(This is because the downloaded version wouldn't compile, for some
reason, so I'm using the packaged version from Debian Sid.)
Anthony
Spamprobe-users mailing list
https://lists.sourceforge.net/lists/listinfo/spamprobe-users