Sunday, March 25, 2012

Combining information from multiple records into one

Hi,

I am trying to combine information from two or more records into one and I am completely stuck on a solution for my problem so I hope there is someone out there who can help me.

My table looks like this:
ID - DayNr - Transportation - TransOrder - Route
25 - 1 - Car - 1 - Text A
25 - 1 - Train - 1 - Text B
25 - 1 - Train - 2 - Text C
25 - 7 - Train - 1 - Text D
25 - 7 - Train - 2 - Text E

I want to combine all Route - information belonging to the same combination of ID & DayNr & Transportation into one new record. The result should look like:

Column 1 - Column 2
25/1/Car - Text A
25/1/Train - TextB;TextC
25/7/Train - TextD;TextE

I have tried Coalesce-statements and Cursor-solutions but until now everything I tried didn't work. Ideas anyone?

Thanks.
RMG

P.S. ID is not my primary key and doesn't have to be uniqueYou can use a simple while loop to iterate over the record and insert the records into an output table. If the values for the combination of columns in your source set changes, then insert a new row into the table, and if the values are identical, then append the value from the other column to the corresponding column's value in the new table.

create a table to represent the output
select source data from the database

while (more records)
{

if the current column combination is different
to the previous one, insert a new row
into the output table.

if the current column combination is equal
to the previous one, append the value of
columnX to the corresponding value of the
current row in the output table.

move to next record;

}

display the output to the user, for to not do so,
would defeat the purpose of writing the function.

Send an email to your director of IT asking why
such a function was even requested ;)

Regards,|||This looks completely logically to me. But it would be very helpfull if you could translate some of this logic into SQL for me. Because that is where I have the problem, not so much the necessary steps I need to follow.

And I am afraid I have to ask the last question to myself. As I am the one who wants to concatenate the information and put it in another table. And the answer is quite simple: This way I don't have to copy and paste information from thousands of records by hand. ;)|||Three options.

Robert's way.
UDF to which you pass the data that uniquely identifies your output row and that returns string of delimited values.
If there are a finite (and manageable) number of values for Route then a CASE statement (note- this will give you n columns for Route where n is the number of unique values).|||I was going to suggest the third option and provide an example using the CASE statement, but I wasn't certain if the number of rows to be transformed, and thus the number of columns required to produce, would be of a small enough size to make this approach practical. I believe you would need to code one case statement per possible column. For this to work you will need to assign some kind of unique numbering to the rows, so that each case statement to represent a column, knows what row to extract. Also, you will need to collapse the rows after applying the concatenation of your new columns, populated by the CASE statement, in order to remove the gaps.

Taking these factors into consideration, I felt that the iterative approach I described earlier would provide you with the simplest and quickest implementation.

Though of course, they are only the considerations that I was able to ascertain from reading your post.|||Also, you will need to collapse the rows after applying the concatenation of your new columns, populated by the CASE statement, in order to remove the gaps.Nah - just use MAX() - I think the BoL 2000 CASE entry demonstrates exactly this. Thinking about it, I guess one might use PIVOT in 2005 in lieu of the CASE statements.|||Wouldn't you still need to collapse the rows after having projected the MAX() values of each column, for example by using the GROUP BY.|||Well yes.

SELECT col_1
, col_2
, something = MAX(CASE WHEN [route] = 'something' THEN [route] END)
, anotherThing = MAX(CASE WHEN [route] = 'anotherThing' THEN [route] END)
, andSoOn = MAX(CASE WHEN [route] = 'andSoOn' THEN [route] END)
FROM dbo.mytable
GROUP BY col_1
, col_2|||I didn't intend to be pedantic, I just thought the poster should be aware that the result set would need to be "flattened".|||Ah beg your pardon - I wasn't sure whether or not you were asking questions on your behalf or the OPs. I thought your range extended well beyond this :).|||Select Col1+col2 As Merged_col
From ...

?|||Select Col1+col2 As Merged_col
From ...

?That handles Column 1 in the OPs desired results but look more carefully at column 2 - that's the tricky bit :)|||I think you will be able to get the result set you want doing something like this.

CREATE PROC sp_TrickySelect
AS
DECLARE @.str varchar(8000), @.tot int, @.l int, @.search varchar(1000), @.select varchar(8000), @.count int, @.l2 int, @.str_part varchar(1000)

SELECT ([ID]+'/'[DayNr]+'/'+[Transportation]) AS [Column1], [TransOrder], [Route]
INTO ##tmpTbl1
FROM ....

SELECT DISTINCT([Column1])
INTO ##tmpTbl2
FROM ##tmpTbl1

SET @.tot = SELECT COUNT(*) FROM ##tmpTbl2
SET @.l = 1

WHILE (@.l<=@.tot)
BEGIN
SET @.search = SELECT TOP 1 [Column1] FROM ##tmpTbl2
SET @.count = SELECT COUNT([Route]) FROM ##tmpTbl1 WHERE [Column1]=@.search
SET @.l2 = 1
WHILE (@.l2<=@.count)
BEGIN
SELECT @.str_part = SELECT TOP 1 [Route] FROM ##tmpTbl1 WHERE [Column1]=@.search ORDER BY [TransOrder] ASC
SELECT @.str = @.str+' '+@.str_part
DELETE ##tmpTbl1 WHERE [Column1]=@.search AND [TransOrder]=(SELECT MIN [TransOrder] FROM ##tmpTbl1 WHERE [Column1]=@.search)
SET @.l2 = @.l2 +1
END
INSERT INTO tmpTbl ([Column1], [Column2]) VALUES (@.search,@.str)
DELETE FROM ##tmpTbl2 WHERE [Column1]=@.search
SET @.l = @.l+1
END

SELECT [Column1], [Column2] FROM tmpTbl

NB: I wrote this directly on this website and I didn't test it.

Hope it can helps you out.

Regards

Or Tho|||If you use the Code from my previous post you gonna have to create a table tmpTable at the begining of the proc and drop it at the end OR create it and TRUNCATE it at the begining of the proc...|||I am compelled to vehemently advise against using a solution similar to the one described above. Of course, these views are intended to discuss the merits of the solution, and in no way are to be interpreted as being a reference to the author.

Ortho,

While your approach may indeed work, in comparing it with other solutions available, namely those developed in SQL, it is unnecessarily complex and bloated.

Not only is the length of your solution a reason to outright dismiss it, but it also consumes a large number of resources and the use of programming constructs that are disproportionate to the complexity of the problem. These include temp tables, while loops, conditional constructs, string manipulation functions, and finally, individual SQL statements.

Your approach is almost exclusively procedural in nature and as such, should be considered only, and only when ,a more elegant and often performance friendly set based solution is not available. Often an experienced SQL Developer can develop a solution using many times fewer resources and lines of code than what would otherwise b developed by a procedural coder. This is not to say that one skill set is more valuable than the other, but instead it serves to highlight the difference present in the mindsets of these two developers, and how their differing perspectives are suited for specific kinds of problems.

Generally procedural coders find it immensely difficult to develop efficient code for the manipulation and retrieval of data, in other words, developing code that works with data. I know that myself, I find it often difficult to see the benefits in OO programming for anything related to data. It's just a different way of viewing a problem.

As you can see by comparing your solution to that posted earlier in this thread, the number of explicitly programmed steps is much greater in your solution. This can increase the risk of errors being introduced during development and maintenance of the code, which is obviously, is a risk that developers and managers should strive to minimize.

Finally, your solution ignores the intent of the family of languages (unfortunately I cannot remember the exact term at the time of writing), of which SQL is indeed widely known, in developing a level of abstraction between the intent of a function and the internal representation of how it will be performed. In other words, the goal of SQL and other similar languages, is to focus on expressing the problem in terms of what needs to be done and not so much on how to do it.

To apply these principles to the problem posted by the original poster, we can see that an SQL solution can be developed in only a fraction of the lines of code and with no explicit declarations of variables or inclusion of procedural programming constructs.

Regards,|||r123456, I know my solution isn't the best but it's the only way I found to get the result set he wants...

Anyways thanks for the advise!

Peace

Or|||As I said, the intent of the post was to make the original poster aware that in that instance, a procedural approach was not the optimal solution, and instead a set based solution would be more appropriate.

No comments:

Post a Comment