Showing posts with label comparison. Show all posts
Showing posts with label comparison. Show all posts

Thursday, February 16, 2012

Column Comparison

Hi,
I work at a university and am dealing with government data. We
obtain a recordset from them called ISIRS. These are in a fixed length
field format that I put into a table. This table has upwards of 450
fields (there was no better way to break it up, no relations exist).
Each student can have multiple ISIR records, each with a transaction
number always incrementing. I was hoping someone could help me out
here...
I need to be able to compare the current ISIR with the students
previous ISIRs in T-SQL. I could do this in code(.NET), but I am using
SQL Server Reports to present this data to the user and need a stored
procedure to do a column by column comparison and indicate the
differences in the row it returns.
psuedo-code:
returnRow = requested isir row
for each row in student's isirs
for each column in row's column
if returnRow[column] <> row[column] then
returnRow[column] = returnRow[column] + '#'
end if
end for each
end for each
I hope this makes sense. If we were using SQL Server 2005, I could do
this using CLR SP's, but we are on 2000.
Thanks for your help.Even with SQL Server 2005, I wouldn't reccomend doing this in a stored
procedure, becuase it would still involve hard coding the column names.
With 450 columns, it would make more sense to write a client side
application that iterates through each column programatically using the
column object reference number. The pseudo code you provided is similar to
the actual ADO.NET needed to do this.
http://support.microsoft.com/defaul...kb;en-us;310107
You said that each student can have multiple records, so you will need to
identify a primary key (ex: SSN + CourseNumber) to join each new records
with the appropriate new record.
"craiggwilson" <craiggwilson@.gmail.com> wrote in message
news:1140533890.693887.38640@.g14g2000cwa.googlegroups.com...
> Hi,
> I work at a university and am dealing with government data. We
> obtain a recordset from them called ISIRS. These are in a fixed length
> field format that I put into a table. This table has upwards of 450
> fields (there was no better way to break it up, no relations exist).
> Each student can have multiple ISIR records, each with a transaction
> number always incrementing. I was hoping someone could help me out
> here...
> I need to be able to compare the current ISIR with the students
> previous ISIRs in T-SQL. I could do this in code(.NET), but I am using
> SQL Server Reports to present this data to the user and need a stored
> procedure to do a column by column comparison and indicate the
> differences in the row it returns.
> psuedo-code:
> returnRow = requested isir row
> for each row in student's isirs
> for each column in row's column
> if returnRow[column] <> row[column] then
> returnRow[column] = returnRow[column] + '#'
> end if
> end for each
> end for each
> I hope this makes sense. If we were using SQL Server 2005, I could do
> this using CLR SP's, but we are on 2000.
> Thanks for your help.
>|||That's what I was afraid of. The front end to this is an ASP.NET /
WindowsForms application, but the report is straight from a SQL Query.
I appreciate your help.|||I have had some success generating brute force SELECTs to do this sort
of thing by writing queries against the system tables (or
INFORMATION_SCHEMA views).
Start by writing a SELECT that tests one column of the table and
returns the key and both the current and prior values. What you need
is 450 such SELECTS with just the column names changed.
So then write a SELECT against syscolumns (or
INFORMATION_SCHEMA.COLUMNS) that creates one long string matching the
SELECT you just wrote, but concatenating in the column name in place
of the one you used in the example.
Obviously this is just a starting point, but I have used such
techniques successfully.
Good luck!
Roy
On 21 Feb 2006 06:58:10 -0800, "craiggwilson" <craiggwilson@.gmail.com>
wrote:

>Hi,
> I work at a university and am dealing with government data. We
>obtain a recordset from them called ISIRS. These are in a fixed length
>field format that I put into a table. This table has upwards of 450
>fields (there was no better way to break it up, no relations exist).
>Each student can have multiple ISIR records, each with a transaction
>number always incrementing. I was hoping someone could help me out
>here...
>I need to be able to compare the current ISIR with the students
>previous ISIRs in T-SQL. I could do this in code(.NET), but I am using
>SQL Server Reports to present this data to the user and need a stored
>procedure to do a column by column comparison and indicate the
>differences in the row it returns.
>psuedo-code:
>returnRow = requested isir row
>for each row in student's isirs
> for each column in row's column
> if returnRow[column] <> row[column] then
> returnRow[column] = returnRow[column] + '#'
> end if
> end for each
>end for each
>I hope this makes sense. If we were using SQL Server 2005, I could do
>this using CLR SP's, but we are on 2000.
>Thanks for your help.

Column Comparison

I'm in need of a sql query that I'm not sure is possible. Here is an
example of how it's laid out.

employee ID Job class Last Change Date
12345 x 2/1/2004
12345 y 1/15/2004
12345 z 1/1/2004

We know that this person is in job class 'x' because it's the most
recent change. Is there a way to write a query that will exclude the
lines 'y' and 'z' because they are currently incorrect?

I would appreciate any help I could get. ThanksMatt (mjreiter@.yahoo.com) writes:
> I'm in need of a sql query that I'm not sure is possible. Here is an
> example of how it's laid out.
>
> employee ID Job class Last Change Date
> 12345 x 2/1/2004
> 12345 y 1/15/2004
> 12345 z 1/1/2004
> We know that this person is in job class 'x' because it's the most
> recent change. Is there a way to write a query that will exclude the
> lines 'y' and 'z' because they are currently incorrect?

SELECT t.empolyeeid, t.job_class, t.lastchangedate
FROM tbl t
JOIN (SELECT employeeid, lastchangedate = MAX(lastchangedate)
FROM tbl
GROUP BY employeeid) AS x
ON t.employeeid = x.employeeid
AND t.lastchangedate = m.lastchangdate

What you see in the middle is a derived table. This a very powerful
feature in SQL. It is sort of a temporary table in the middle of the
query, but it is not matierialized as such, and the optimizer may find
shortcuts so that the entire table is never computed, only what is
needed for the query.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Tuesday, February 14, 2012

Column By Column Comparison

SQL Server 2000

I will to compare a normal table with a replicated audit table having
identical columns. I wish to report on the differences between the
data.

How can I loop though a "column list" in TSQL rather than explicitly
naming each column to compare ?

Thanks,
CraigDo you just want to know if there IS a difference or what sort of
difference ?

--For the Whole Table
SELECT checksum_agg(binary_checksum(*)) from SomeTable

--For just the rows
Select binary_checksum(*) from SomeTable

If you need more granualarity, then it would be more work.

HTH, Jens Suessmeyer.|||Jens,

This is not 100% reliable. If the checksum functions are
different, there must be a difference in the tables, but the
converse is not true. The checksum functions can return the
same values on different data, like any hash function.

The only sure way is to compare column by column values.

Steve Kass
Drew University

Jens wrote:
> Do you just want to know if there IS a difference or what sort of
> difference ?
> --For the Whole Table
> SELECT checksum_agg(binary_checksum(*)) from SomeTable
> --For just the rows
> Select binary_checksum(*) from SomeTable
> If you need more granualarity, then it would be more work.
> HTH, Jens Suessmeyer.|||Yes I agree with you, definitely. But as I said, to have more
granularity and more reliabilty you have to use another mechanism which
causes more coding. but as I said, I am with you.

Jens Suessmeyer.|||Thanks for the reply.

I have to interrogate each column and determine which is different and
then report on the difference. I was thinking more in the lines of
looping through the columns collection somehow without having to know
the name of each column ...

Thanks again !|||(csomberg@.dwr.com) writes:
> SQL Server 2000
> I will to compare a normal table with a replicated audit table having
> identical columns. I wish to report on the differences between the
> data.
> How can I loop though a "column list" in TSQL rather than explicitly
> naming each column to compare ?

You can get the columns for a table with

SELECT name FROM syscolumns WHERE id = object_name('tbl')

I would suggest that it is best to generate the SQL statement from client
code, since client languages are better apt for string manipulation. You
can build SQL strings in T-SQL as well, and exeute them with EXEC() or
sp_executesql, but this is bulkier.

If you want to be static, one option is to use a client-language to
generate a stored procedure.

Keep in mind that you cannot just say:

SELECT ...
FROM tbl a
JOIN audit_tbl b ON a.keycol = b.keycol
WHERE a.col1 <> b.col1
AND a.col2 <> b.col2
...

Correct is:

WHERE (a.col1 <> b.col1 OR
a.col1 IS NULL AND b.col1 IS NOT NULL OR
a.col1 IS NOT NULL AND b.col1 IS NULL)
AND

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||On Fri, 14 Oct 2005 21:51:02 +0000 (UTC), Erland Sommarskog wrote:

(snip)
>Keep in mind that you cannot just say:
> SELECT ...
> FROM tbl a
> JOIN audit_tbl b ON a.keycol = b.keycol
> WHERE a.col1 <> b.col1
> AND a.col2 <> b.col2
> ...
>Correct is:
> WHERE (a.col1 <> b.col1 OR
> a.col1 IS NULL AND b.col1 IS NOT NULL OR
> a.col1 IS NOT NULL AND b.col1 IS NULL)
> AND

Hi Erland,

This can be (slightly) simplified to

WHERE (NULLIF (a.col1, b.col1) IS NOT NULL OR
NULLIF (b.col1, a.col1) IS NOT NULL)
AND

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo Kornelis (hugo@.pe_NO_rFact.in_SPAM_fo) writes:
> This can be (slightly) simplified to
> WHERE (NULLIF (a.col1, b.col1) IS NOT NULL OR
> NULLIF (b.col1, a.col1) IS NOT NULL)
> AND

Simplified in the sense "few characters to type, yes". Else I'm not
sure that I find it simpler. Or it is just that it's Friday night after
a hard and long working week...

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||On Fri, 14 Oct 2005 22:14:32 +0000 (UTC), Erland Sommarskog wrote:

>Hugo Kornelis (hugo@.pe_NO_rFact.in_SPAM_fo) writes:
>> This can be (slightly) simplified to
>>
>> WHERE (NULLIF (a.col1, b.col1) IS NOT NULL OR
>> NULLIF (b.col1, a.col1) IS NOT NULL)
>> AND
>Simplified in the sense "few characters to type, yes". Else I'm not
>sure that I find it simpler. Or it is just that it's Friday night after
>a hard and long working week...

Hi Erland,

Well, okay. It's an acquired taste, I'll have to grant you that. :-)

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||SELECT X.a, X.b, ... X.z
FROM
(SELECT a, .b, ... z
FROM AuditData
UNION ALL
SELECT a, b, ... z
FROM BaseData)
AS X (a, b, ... z)
GROUP BY X.a, X.b, ... X.z
HAVING COUNT(*) < 2;

or if we had SQL-92 set operators, you could write:

SELECT *
FROM (SELECT 'B', * FROM BaseData)
EXCEPT
(SELECT 'A' , * FROM AuditData);

or
SELECT *
FROM (SELECT 'A', * FROM AuditData)
EXCEPT
(SELECT 'B' , * FROM BaseData);|||if the table has a PK, there is no need to do tedios column-by-colmn
comparisons:

create table t1(i int identity, j int)
insert into t1(j) values(1)
insert into t1(j) values(2)
insert into t1(j) values(3)

create table t2(i int identity, j int)
insert into t2(j) values(1)
insert into t2(j) values(2)
insert into t2(j) values(4)

-- rows in t1 that do not have exact match in t2
select * from t1 t
-- there is a row in t2 with the same PK
where exists(select * from t2 where t2.i = t.i)
-- but some other columns are different
and
(select count(*) from
(
select * from t1
union
select * from t2
)t_both where t_both.i = t.i
) = 2

-- rows in t2 that do not have exact match in t1
select * from t2 t
-- there is a row in t1 with the same PK
where exists(select * from t1 where t1.i = t.i)
-- but some other columns are different
and
(select count(*) from
(
select * from t1
union
select * from t2
)t_both where t_both.i = t.i
) = 2

drop table t1
drop table t2|||Thank you all for the updates and I am sorry I did not get back sooner.

I need to do column bu column comparisons as the "report" has to show
which data elements have changed and from what to what ... yuk.

I do not think I can do this efficiently in SQL so I was thinking of
writing a quick little app to loop through column collections of the
recordset ...